Xerox DocuShare Support & Software Leaflet
Best Practices for Content Indexing
Document Content Indexing
• Determine whether the site requires indexing the text content of documents. Disabling content indexing
will dramatically improve indexing performance and reduce resource utilization.
• Document metadata is always indexed. Disabling content indexing does not affect metadata
indexing.
• Consider disabling content indexing if searching is targeted at metadata properties such as Title,
Subject, Keywords, Author, and Description.
• Go to Administration Menu | Services and Components | Index to disable or enable content
indexing globally for the site. Content indexing is enabled by default.
MIME Types Configuration
• Edit the MIME Types table at Administration Menu | Site Management | MIME Types to
control indexing by MIME type.
• Set MIME Types Assignment Method to the File Extension algorithm. Do not use the File
Content algorithm unless uploaded document filenames do not include a file type extension
such as .doc, .pdf, or .txt.
• Exclude from indexing any document file types that do not require full-text search, such as
spreadsheets, binary files, and images. Click Edit beside a MIME Type to exclude that document
type from being indexed.
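The two ideas above (assigning a MIME type from the filename extension, then skipping content indexing for excluded types) can be sketched in a few lines of Python. This is only an illustration of the concept, not DocuShare's implementation: the extension table, the exclusion set, and both function names are hypothetical, and the real settings live in the MIME Types table in the administration UI.

```python
import os

# Hypothetical extension table, standing in for the site's MIME Types table.
EXTENSION_TO_MIME = {
    ".doc": "application/msword",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".xls": "application/vnd.ms-excel",
    ".png": "image/png",
}

# Hypothetical set of MIME types excluded from content indexing.
EXCLUDED_FROM_INDEXING = {
    "application/vnd.ms-excel",   # spreadsheets
    "image/png",                  # images
    "application/octet-stream",   # unrecognized or binary files
}


def mime_type_by_extension(filename):
    """Assign a MIME type from the filename extension alone."""
    _, extension = os.path.splitext(filename)
    # Unknown or missing extensions fall back to a generic binary type.
    return EXTENSION_TO_MIME.get(extension.lower(), "application/octet-stream")


def should_index_content(filename):
    """Return True if the document's text content should be indexed."""
    return mime_type_by_extension(filename) not in EXCLUDED_FROM_INDEXING


if __name__ == "__main__":
    for name in ("report.pdf", "budget.xls", "README"):
        print(name, mime_type_by_extension(name), should_index_content(name))
```

The sketch also shows why the File Extension method depends on filenames carrying an extension: a file named README gets only the generic fallback type, which is the case where a content-based method would be needed.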
IDOL Server Configuration
• Adjust indexing options to balance indexing performance against the types of documents to be
content indexed and the requirements of the organization and end users. To do so,
use the following steps to either run the appropriate idoltool command or manually edit the IDOL
Server configuration file located at DSHOME\IDOLServer\IDOL\AutonomyIDOLServer.cfg.
• Reduce memory usage for unstemmed terms
  • The UnstemmedTermTree parameter, located in the [Server] section of the
    AutonomyIDOLServer.cfg file, controls whether wildcard matching is performed before stemming.
  • A value of true performs wildcard matching before stemming takes place. With true,
    content.exe memory usage is higher, particularly when the server indexes large quantities
    of files that contain many numbers, such as Excel files.
  • With a value of false, the server does not store the unstemmed terms internally for spelling
    correction or pre-stem wildcard/fuzzy matching. With false, content.exe memory usage is lower,
    which reduces the chance of an out-of-memory error during indexing.
  • To change the value of UnstemmedTermTree, run: idoltool.bat -s setconfig idol
    Server.UnstemmedTermTree <value>. The default value is true. For best wildcard search
    results, keep the value set to true.
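Before changing UnstemmedTermTree, it can help to confirm what the configuration file currently says. The following Python sketch reads the parameter from the [Server] section of a configuration text. It assumes the file uses INI-style Name=Value lines under [Section] headers; the sample text, the [Paths] section, and the helper name read_server_setting are made up for illustration and are not taken from a real AutonomyIDOLServer.cfg.

```python
def read_server_setting(cfg_text, name, default=None):
    """Return the value of `name` in the [Server] section, or `default`."""
    in_server = False
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        # Track which section the current line belongs to.
        if line.startswith("[") and line.endswith("]"):
            in_server = line[1:-1].strip().lower() == "server"
            continue
        if not in_server or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip().lower() == name.lower():
            return value.strip()
    return default


# Made-up sample text; a real file contains many more sections and parameters.
SAMPLE_CFG = """\
[Paths]
ExamplePath=./example

[Server]
UnstemmedTermTree=false
"""

# The leaflet states that the default value is true.
value = read_server_setting(SAMPLE_CFG, "UnstemmedTermTree", default="true")
wildcard_before_stemming = value.lower() == "true"
print(f"UnstemmedTermTree={value}")
```

To check a live installation, the same function could be given the contents of the configuration file read from disk. The sketch only reads the value; changes are still made with idoltool or by editing the file as described above.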