Macromedia ColdFusion MX 7 Manual

Chapter 9: Indexing Collections with Verity Spider
Web standard support
Verity Spider supports key web standards used by Internet and intranet sites. Standard HREF 
links and frame pointers are recognized, so Verity Spider can navigate through them. 
Redirected pages are followed so that the real underlying document is indexed. Verity Spider 
adheres to the robots exclusion standard specified in robots.txt files, so that administrators can 
keep visits to remote websites friendly. The HTTP Basic Authentication mechanism is supported 
so that password-protected sites can be indexed.
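The robots exclusion check described above can be illustrated with a minimal Python sketch. This is not Verity Spider code; it uses the standard library's urllib.robotparser, and the user-agent name, the sample rules, and the URLs are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real spider would download this file
# from the root of each site before fetching any other URL.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# "example-spider" is an illustrative user-agent name, not a Verity identifier.
allowed = parser.can_fetch("example-spider", "http://www.example.com/docs/index.html")
blocked = parser.can_fetch("example-spider", "http://www.example.com/private/report.html")

print(allowed)  # True: /docs/ is not excluded
print(blocked)  # False: /private/ is excluded for all user agents
```

A spider that honors the standard skips any URL for which the check returns False.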
Restart capability
When an indexing job fails, or for some reason Verity Spider cannot index a significant number of 
URLs or a particular type of URL, you can restart the indexing job to update the collection. Only 
those URLs that were not successfully indexed previously are processed.
State maintenance through a persistent store
Verity Spider stores the state of gathered and indexed URLs in a persistent store, which lets it 
track progress for the purposes of gracefully and efficiently restarting halted indexing jobs.
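The idea of restarting from a persistent store can be sketched as follows. The manual does not describe the store's actual format, so everything here is an assumption: the SQLite table, the status values ("gathered", "indexed", "failed"), and the function names are hypothetical and only illustrate how saved state lets a restarted job skip URLs that were already indexed.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical URL state store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_state ("
        "url TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def record(conn, url, status):
    """Save the latest state of a URL, overwriting any earlier state."""
    conn.execute(
        "INSERT OR REPLACE INTO url_state (url, status) VALUES (?, ?)",
        (url, status),
    )
    conn.commit()


def pending(conn):
    """Return the URLs a restarted job still has to process."""
    rows = conn.execute(
        "SELECT url FROM url_state WHERE status != 'indexed' ORDER BY url"
    )
    return [row[0] for row in rows]


store = open_store()
record(store, "http://www.example.com/a.html", "indexed")
record(store, "http://www.example.com/b.html", "gathered")
record(store, "http://www.example.com/c.html", "failed")

# Only b.html and c.html are left for the restarted job.
todo = pending(store)
print(todo)
```

Passing a file path instead of ":memory:" would make the state survive a halted process, which is the point of a persistent store.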
Performance
Spidering performance is greatly improved over previous versions, because of low memory 
requirements, flow control, and the help of multithreading and efficient Domain Name System 
(DNS) lookups.
Flow control
When indexing websites, Verity Spider distributes requests to web servers in a round-robin 
manner: one URL is fetched from each web server in turn. With flow control, indexing of a 
faster website can finish before that of a slower one, and Verity Spider optimizes indexing on 
every web server.
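Round-robin distribution can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the Verity Spider scheduler: the host names, the per-host URL lists, and the function name round_robin are invented for the example.

```python
from collections import deque


def round_robin(queues):
    """Yield (host, url) pairs, taking one URL from each host in turn."""
    # Hosts with nothing queued are skipped from the start.
    active = deque((host, deque(urls)) for host, urls in queues.items() if urls)
    while active:
        host, urls = active.popleft()
        yield host, urls.popleft()
        if urls:
            # The host still has work, so it goes to the back of the line.
            active.append((host, urls))


order = list(round_robin({
    "a.example.com": ["/1", "/2", "/3"],
    "b.example.com": ["/1"],
}))
print(order)
```

In this run b.example.com is finished after its single request, while a.example.com continues to be served, which mirrors how one site can complete before another.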
Verity Spider adjusts the number of connections per server depending on the download 
bandwidth. When the download bandwidth from a web server falls below a certain value, Verity 
Spider automatically scales back the number of connections to that web server. There will always 
be at least one connection to a web server. When the download bandwidth increases to an 
acceptable level, Verity Spider reallocates connections (per the value of the -connections option, 
which is 4 by default). You can turn off flow control with the -noflowctrl option.
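The scaling rule can be sketched in a few lines of Python. The manual does not state the bandwidth threshold or how many connections are dropped or restored at a time, so the low_water_kbps parameter and the one-connection step are assumptions; only the floor of one connection and the default ceiling of 4 (the -connections default) come from the text.

```python
def adjust_connections(current, bandwidth_kbps, low_water_kbps, max_connections=4):
    """Return the new connection count for one web server.

    low_water_kbps is a hypothetical threshold; stepping by one connection
    at a time is also an assumption made for this sketch.
    """
    if bandwidth_kbps < low_water_kbps:
        # Scale back, but always keep at least one connection open.
        return max(1, current - 1)
    # Bandwidth is acceptable: reallocate up to the configured maximum.
    return min(max_connections, current + 1)


# A slow server loses one connection per check until only one remains.
slow = adjust_connections(current=4, bandwidth_kbps=10, low_water_kbps=50)
floor = adjust_connections(current=1, bandwidth_kbps=10, low_water_kbps=50)

# Once bandwidth recovers, connections are handed back up to the ceiling.
recovered = adjust_connections(current=3, bandwidth_kbps=100, low_water_kbps=50)

print(slow, floor, recovered)  # 3 1 4
```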
Multithreading
Verity Spider separates the gathering and indexing jobs into multiple threads for concurrency. 
Additionally, Verity Spider can create concurrent connections to web servers for fetching 
documents, and run concurrent indexing threads for maximum utilization. This translates to an 
overall improvement in throughput.
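The separation of gathering and indexing into concurrent threads can be sketched as a small producer and consumer pipeline. This is a generic illustration, not Verity Spider's implementation: the thread counts, the run_pipeline function, and the stand-in fetch and index callables are all assumptions.

```python
import queue
import threading


def run_pipeline(urls, fetch, index, fetchers=4, indexers=2):
    """Fetch URLs and index the results on separate groups of threads."""
    url_q = queue.Queue()
    doc_q = queue.Queue()
    indexed = []
    lock = threading.Lock()
    for url in urls:
        url_q.put(url)

    def gather():
        # Each gathering thread pulls URLs until none are left.
        while True:
            try:
                url = url_q.get_nowait()
            except queue.Empty:
                return
            doc_q.put((url, fetch(url)))

    def build_index():
        # Each indexing thread consumes documents until it sees the stop marker.
        while True:
            item = doc_q.get()
            if item is None:
                return
            url, body = item
            entry = index(url, body)
            with lock:
                indexed.append(entry)

    gatherers = [threading.Thread(target=gather) for _ in range(fetchers)]
    workers = [threading.Thread(target=build_index) for _ in range(indexers)]
    for thread in gatherers + workers:
        thread.start()
    for thread in gatherers:
        thread.join()
    for _ in workers:
        doc_q.put(None)  # one stop marker per indexing thread
    for thread in workers:
        thread.join()
    return indexed


# Stand-ins for real network fetching and real indexing.
def fake_fetch(url):
    return "body of " + url


def fake_index(url, body):
    return (url, len(body))


result = sorted(run_pipeline(["/a", "/b", "/c"], fake_fetch, fake_index))
print(result)
```

Indexing starts as soon as the first document arrives rather than waiting for all fetches to finish, which is where the throughput gain comes from.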