Xerox DocuShare Support & Software Leaflet

Best Practices for Content Indexing
Document Content Indexing
Determine whether the site requires indexing of the text content of documents. Disabling content
indexing dramatically improves indexing performance and reduces resource utilization.
Document metadata is always indexed. Disabling content indexing does not affect metadata 
indexing.
Consider disabling content indexing if searches target metadata properties such as Title,
Subject, Keywords, Author, and Description.
Go to Administration Menu | Services and Components | Index to disable or enable content
indexing globally for the site. Content indexing is enabled by default.
MIME Types Configuration
Edit the MIME Types table at Administration Menu | Site Management | MIME Types to
control indexing by MIME type.
Set MIME Types Assignment Method to the File Extension algorithm. Use the File Content
algorithm only if uploaded document filenames do not include a file type extension
such as .doc, .pdf, or .txt.
Exclude from indexing any document file types that do not require full-text search, such as
spreadsheets, binary files, and images. Click Edit beside a MIME type and exclude that document
type from being indexed.
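The effect of extension-based assignment combined with per-type exclusion can be illustrated with a minimal Python sketch. This is not how DocuShare implements the feature; it uses Python's standard mimetypes module as a stand-in, and the exclusion list, the function names, and the handling of unknown types are all hypothetical choices made for illustration.

```python
import mimetypes
from typing import Optional

# Hypothetical exclusion list: MIME types a site might exclude from
# content indexing (spreadsheets, images, generic binary files).
EXCLUDED_FROM_INDEXING = {
    "application/vnd.ms-excel",
    "image/png",
    "image/jpeg",
    "application/octet-stream",
}


def mime_from_extension(filename: str) -> Optional[str]:
    """Guess a MIME type from the filename extension only."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


def should_index_content(filename: str) -> bool:
    """Return True if the file's text content would be indexed.

    Assumption of this sketch: a file whose type cannot be determined
    from its extension is skipped. A real site would instead need
    content-based detection for such files.
    """
    mime_type = mime_from_extension(filename)
    if mime_type is None:
        return False
    return mime_type not in EXCLUDED_FROM_INDEXING
```

A filename without an extension, such as "report", yields no MIME type in this sketch, which is the situation in which the File Content algorithm becomes necessary.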
IDOL Server Configuration
Adjust indexing options to balance indexing performance against the types of documents that must
be content indexed, in line with the requirements of the organization and end users. To do so,
either run the appropriate idoltool command or manually edit the IDOL Server configuration file,
located at DSHOME\IDOLServer\IDOL\AutonomyIDOLServer.cfg
Reduce memory usage for unstemmed terms
Located in the [Server] section of the AutonomyIDOLServer.cfg file, the
UnstemmedTermTree parameter controls whether wildcard matching is performed before stemming.
A value of true performs wildcard matching before stemming takes place. With true,
content.exe memory usage is higher, particularly when the server indexes large quantities
of files that contain many numbers, such as Excel files.
A value of false does not store the unstemmed terms internally for spelling correction or 
pre-stem wildcard/fuzzy matching. With false, content.exe memory usage is lower. This 
reduces the chance of getting an out-of-memory error during indexing.
To change the value of UnstemmedTermTree, run:
idoltool.bat -s setconfig idol Server.UnstemmedTermTree <value>
The default value is true. For best wildcard search results, keep the value set to true.
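For sites that prefer the manual edit over idoltool, the change can be sketched as a small Python helper. This is a minimal sketch that assumes the configuration file uses plain [Section] headers and Key=Value lines; the function name set_cfg_value and the sample Port and Paths entries are hypothetical. Back up the real file before changing it.

```python
def set_cfg_value(cfg_text: str, section: str, key: str, value: str) -> str:
    """Return cfg_text with key=value set inside [section].

    Assumes an INI-style layout: [Section] headers and Key=Value lines.
    If the key is missing from the section, it is appended to it.
    """
    out = []
    in_section = False
    done = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without finding the key: add it.
            if in_section and not done:
                out.append(f"{key}={value}")
                done = True
            in_section = stripped[1:-1].lower() == section.lower()
        elif in_section and not done and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name.lower() == key.lower():
                line = f"{key}={value}"
                done = True
        out.append(line)
    # The target section was the last one in the file and lacked the key.
    if in_section and not done:
        out.append(f"{key}={value}")
        done = True
    if not done:
        raise ValueError(f"section [{section}] not found")
    return "\n".join(out) + "\n"


# Hypothetical sample content, not the real AutonomyIDOLServer.cfg.
sample = "[Server]\nPort=9000\nUnstemmedTermTree=true\n\n[Paths]\nDir=x\n"
updated = set_cfg_value(sample, "Server", "UnstemmedTermTree", "false")
```

The equivalent idoltool form, using the command given above with the value filled in, is idoltool.bat -s setconfig idol Server.UnstemmedTermTree false.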