Macromedia ColdFusion MX 7 Manual

Chapter 9:  Indexing Collections with Verity Spider
If a halt occurs during indexing, the batch of documents whose size is set by the -submitsize option is lost. There is no transactional rollback for indexing, and the documents in that batch have already been removed from the indexing queue. When you rerun the indexing task, Verity Spider can continue only with the URLs and documents that are still enqueued.
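To make the loss concrete, the following toy model dequeues work in batches of -submitsize items and simulates a halt while one batch is in flight. It is a sketch under stated assumptions, not Verity Spider's implementation: the function name `crawl_with_halt` and its parameters are hypothetical, and "indexing" is reduced to appending to a list.

```python
from collections import deque
from typing import List, Tuple


def crawl_with_halt(urls: List[str], submitsize: int,
                    halt_on_batch: int) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical toy model of batch indexing with no rollback.

    Returns (indexed, lost, still_queued)."""
    queue = deque(urls)
    indexed: List[str] = []
    batch_no = 0
    while queue:
        # Dequeue one batch; from this point these items are no longer enqueued.
        batch = [queue.popleft() for _ in range(min(submitsize, len(queue)))]
        batch_no += 1
        if batch_no == halt_on_batch:
            # Halt while this batch is in flight: it is neither indexed
            # nor put back on the queue, because there is no rollback.
            return indexed, batch, list(queue)
        indexed.extend(batch)
    return indexed, [], []


docs = [f"doc{i}" for i in range(10)]
indexed, lost, still_queued = crawl_with_halt(docs, submitsize=3, halt_on_batch=2)
print(indexed)       # ['doc0', 'doc1', 'doc2']
print(lost)          # ['doc3', 'doc4', 'doc5']
print(still_queued)  # ['doc6', 'doc7', 'doc8', 'doc9']
```

In this model a rerun would start from `still_queued` only, so `doc3` through `doc5` are never indexed unless they are enqueued again.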
-temp
Syntax
-temp path
Specifies the directory for temporary files (disk cache). By default, the temp directory is under the job directory (optionally specified with the -jobpath option).
If you do not specify a value for this option, Verity Spider creates a /spider/temp directory within 
the collection. For multiple-collection tasks, the first collection specified is used.
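The fallback rule in the previous paragraph can be expressed as a small path-resolution function. This is a minimal sketch assuming that each collection is identified by its directory path; `resolve_temp_dir` is a hypothetical helper that mirrors the documented behavior and is not part of Verity Spider.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_temp_dir(collections: Sequence[str], temp: Optional[str] = None) -> Path:
    """Hypothetical helper: return the -temp value if one was given;
    otherwise return <first collection>/spider/temp."""
    if temp is not None:
        return Path(temp)
    if not collections:
        raise ValueError("at least one collection must be specified")
    # For multiple-collection tasks, only the first collection is used.
    return Path(collections[0]) / "spider" / "temp"


print(resolve_temp_dir(["/colls/docs", "/colls/news"]))  # /colls/docs/spider/temp
```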
Note: Make sure the location you specify has enough free disk space to hold the documents that are downloaded and kept there before indexing. The documents are deleted from the hard disk after they are indexed.
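Before starting a large job, you might check the free space available to the temp location. The sketch below uses Python's standard `shutil.disk_usage`. The function name `has_room_for_temp`, the caller-supplied `expected_bytes` estimate, and the 1.5x safety margin are all assumptions for illustration, not values from Verity Spider.

```python
import shutil
from pathlib import Path


def has_room_for_temp(temp_dir: str, expected_bytes: int,
                      safety_factor: float = 1.5) -> bool:
    """Hypothetical pre-flight check: True if the filesystem holding
    temp_dir has at least expected_bytes * safety_factor bytes free."""
    path = Path(temp_dir)
    # The temp directory may not exist yet, so walk up to the nearest
    # existing ancestor; disk_usage requires an existing path.
    while not path.exists() and path.parent != path:
        path = path.parent
    free = shutil.disk_usage(path).free
    return free >= expected_bytes * safety_factor


# Example: do we have room for roughly 500 MB of downloaded documents?
print(has_room_for_temp(".", 500 * 1024 * 1024))
```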
See also 
, for specifying the location of all indexing job directories and files, one of 
which is the temp directory.
Networking options
The following sections describe the Verity Spider networking options.
-agentname
Type: Web crawling only 
Syntax
-agentname string
Specifies the value of the agent name field that is sent as part of the HTTP request. Because web servers can be configured to return different versions of the same page depending on the requesting agent, you can use the -agentname option to impersonate a browser client.
Use double quotation marks if the name contains a space. Use the -cmdfile option if the agent name you want to use contains forbidden characters, such as slashes or backslashes.
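The quoting rules above can be captured in a small helper that formats the option and reports whether the value should go through -cmdfile instead. This is a sketch: `agentname_option` is hypothetical, and the `FORBIDDEN` set contains only the two characters the text names, so the real set of forbidden characters may be larger.

```python
from typing import Tuple

# Assumption: only the characters named in the text; the real list may be longer.
FORBIDDEN = ("/", "\\")


def agentname_option(name: str) -> Tuple[str, bool]:
    """Hypothetical helper: return (option_text, needs_cmdfile)."""
    needs_cmdfile = any(ch in name for ch in FORBIDDEN)
    # Wrap the name in double quotation marks when it contains a space.
    value = f'"{name}"' if " " in name else name
    return f"-agentname {value}", needs_cmdfile


print(agentname_option("Mozilla 4.0"))   # ('-agentname "Mozilla 4.0"', False)
print(agentname_option("Mozilla/5.0"))   # ('-agentname Mozilla/5.0', True)
```

When `needs_cmdfile` is true, the option would be placed in a command file and passed with -cmdfile rather than typed on the command line.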
-connections
Syntax
-connections num_connections
Specifies the maximum number of simultaneous socket connections to make to websites for indexing. Each connection runs in a separate thread.
The default value is 6.
Note: Verity Spider's dynamic flow control makes full use of all available connections when indexing websites. If you are indexing multiple sites, you might want to increase this number. However, increasing the number of connections does not always improve throughput, because performance also depends on factors such as your network connection and the capabilities of the remote hosts.
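A simplified bottleneck model shows why more connections help with many sites but not with one. It assumes that the overall fetch rate is the smallest of three limits: what the connections can pull, what your link allows, and what the remote hosts can serve. The function `effective_rate` and all the numbers are hypothetical and are not Verity Spider measurements.

```python
def effective_rate(connections: int, per_connection_rate: float,
                   bandwidth_limit: float, per_host_limit: float,
                   hosts: int) -> float:
    """Hypothetical model: documents/second is capped by the slowest of
    the connection pool, the local network link, and the remote hosts."""
    return min(connections * per_connection_rate,
               bandwidth_limit,
               hosts * per_host_limit)


# One slow host: doubling connections from the default 6 changes nothing.
print(effective_rate(6, 2.0, 100.0, 5.0, hosts=1))   # 5.0
print(effective_rate(12, 2.0, 100.0, 5.0, hosts=1))  # 5.0
# Three hosts: more connections help until the hosts become the limit.
print(effective_rate(6, 2.0, 100.0, 5.0, hosts=3))   # 12.0
print(effective_rate(12, 2.0, 100.0, 5.0, hosts=3))  # 15.0
```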