Xerox DocuShare Support & Software Information Sheet
Best Practices for Content Indexing
Recommended for Sites Containing Over 500,000 Documents
The intent of this document is to provide guidance for improving DocuShare 6.6.1 search indexing
performance. These instructions should be included in the planning of any new or upgraded 6.6.1
server.
Note:
Refer to the DocuShare 6.6.1 Command Line Utilities Guide for more information on running the
commands called out in this document.
Upgrading from releases prior to 6.5
Note:
Perform these procedures after upgrading to 6.6.1 and before running dsindex index_all.
Consider the needs and usage of the site, then run any of the following commands after upgrading
to 6.6.1.
• DSHOME\bin\countFileNumber
Run this command to gather metrics on the number of documents and file sizes currently on
the site. This command identifies large files that will take longer to content index.
• DSHOME\bin\fixupMimeTypes
Run this command to find files that may have been incorrectly identified with a text file
MIME type.
This command reports document and rendition objects where the MIME type assigned by
the file guesser does not match the MIME type implied by the file extension. Earlier versions
of DocuShare set the default MIME type configuration to the File Content algorithm (file type
guesser) instead of the File Extension algorithm. The fixupMimeTypes command identifies
documents such as *.dat files (containing many rows of numeric strings, commas, and other
special characters) that may have been identified as .txt files in an earlier version of DocuShare.
• DSHOME\bin\fixupMimeTypes -f
Run this command to fix the incorrectly assigned document MIME type to match the file
extension. Documents will be indexed according to the MIME type file extensions in the
MIME Types table.
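The mismatch that fixupMimeTypes reports, and the correction that the -f option applies, can be illustrated with a minimal Python sketch. This is not DocuShare code and does not call any DocuShare API. The table contents, the function names, and the sample file names are all hypothetical and serve only to show the idea of comparing an assigned MIME type against the one implied by the file extension.

```python
from pathlib import PurePath

# Hypothetical excerpt of an extension-to-MIME-type table. The real
# DocuShare MIME Types table is site-specific and is not reproduced here.
EXTENSION_MIME_TYPES = {
    ".txt": "text/plain",
    ".dat": "application/octet-stream",  # assumed mapping, for illustration
    ".pdf": "application/pdf",
}


def find_mime_mismatches(documents, table=EXTENSION_MIME_TYPES):
    """Return (filename, assigned, expected) for each mismatched document.

    documents is an iterable of (filename, assigned_mime_type) pairs.
    Files whose extension is not in the table are skipped.
    """
    mismatches = []
    for filename, assigned in documents:
        extension = PurePath(filename).suffix.lower()
        expected = table.get(extension)
        if expected is not None and assigned != expected:
            mismatches.append((filename, assigned, expected))
    return mismatches


def fix_mime_types(documents, table=EXTENSION_MIME_TYPES):
    """Return documents with each MIME type reset to match its extension.

    Files whose extension is not in the table keep their assigned type.
    """
    fixed = []
    for filename, assigned in documents:
        extension = PurePath(filename).suffix.lower()
        fixed.append((filename, table.get(extension, assigned)))
    return fixed


if __name__ == "__main__":
    sample = [
        ("readings.dat", "text/plain"),    # content guesser saw plain text
        ("notes.txt", "text/plain"),       # already consistent
        ("manual.pdf", "application/pdf"),
    ]
    for name, assigned, expected in find_mime_mismatches(sample):
        print(f"{name}: assigned {assigned}, extension implies {expected}")
```

In this sketch the report step and the fix step are separate functions, mirroring the way the report can be reviewed before running the command with -f.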