Nuance ScanSoft OmniPage Pro 14 User Manual

Chapter 4
Languages
The program can read over 110 languages in three alphabets: Latin, 
Greek and Cyrillic. See the list in the OCR panel of the Options dialog 
box. It shows which languages have dictionary support. A listing is also 
provided on the ScanSoft web site.
In addition to user dictionaries, specialized dictionaries for certain 
professions (currently medical, legal and financial) are available for some 
languages. See the list and make selections in the OCR panel of the 
Options dialog box.
Training
Training is the process of changing the OCR solutions assigned to 
character shapes in the image. It is useful for uniformly degraded 
documents or when an unusual typeface is used throughout a document. 
Training will be less useful for texts with random distortions. Here is an 
example, based on the letter “g”, which can be printed in different ways:
[Figure: four printed examples of the letter “g”]
The first two examples do not need training, because both shapes are 
normal for the letter “g” and the program can handle them. The third 
example could benefit from training because the shape of “g” is unusual, 
and all instances of “g” in the text are likely to look like this. The fourth 
example is not good for training, because the first “g” is poorly printed, 
and this shape is unlikely to appear again in the document.
The program identifies the language of recognized texts and displays it in the status 
bar. This language marking is exported with the document. Use Set Language... in 
the Tools menu to change the language marking for selected text. This does not 
change the recognition language(s).