Service Instructions for Nuance omnipage pro 6

Basic OmniPage OCR Technologies
Understanding OCR 252
each character can be infinitely tuned and re-tuned as new fonts or new 
problems come up.
If there is a problem with “c”s and “e”s, those two experts are tuned 
further until that one problem is resolved. To recognize a foreign 
language that has an “ä” as well as an “a,” another expert is added to 
identify the new character. This expert approach to recognition is what 
allows OmniPage to recognize more languages (13) than any other OCR 
package. It is also why OmniPage produces so remarkably few substitution 
errors.
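The idea of one expert per character can be illustrated with a small Python sketch. It is not Caere's implementation: the `ExpertPanel` class, the feature names (`bowl`, `dots`), and the scoring rules are all hypothetical, chosen only to show how adding an “ä” expert extends recognition without touching the existing “a” expert.

```python
from typing import Callable, Dict, Optional

# Hypothetical glyph description: a few named shape features (assumption).
Glyph = Dict[str, float]
# An expert scores one glyph for one character, returning a confidence in [0, 1].
Expert = Callable[[Glyph], float]


class ExpertPanel:
    """Holds one independent expert per character (illustrative only)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.experts: Dict[str, Expert] = {}
        self.threshold = threshold

    def add_expert(self, char: str, expert: Expert) -> None:
        # Adding or re-tuning one expert leaves all the others untouched.
        self.experts[char] = expert

    def recognize(self, glyph: Glyph) -> Optional[str]:
        # Return the character whose expert is most confident,
        # or None if no expert exceeds the threshold.
        best_char, best_score = None, self.threshold
        for char, expert in self.experts.items():
            score = expert(glyph)
            if score > best_score:
                best_char, best_score = char, score
        return best_char


def expert_a(g: Glyph) -> float:
    # Toy rule: an "a" has a bowl and no dots above it.
    return 1.0 if g.get("bowl") and not g.get("dots") else 0.0


def expert_a_umlaut(g: Glyph) -> float:
    # Toy rule: an "ä" has a bowl and exactly two dots above it.
    return 1.0 if g.get("bowl") and g.get("dots") == 2 else 0.0


panel = ExpertPanel()
panel.add_expert("a", expert_a)

plain = {"bowl": 1, "dots": 0}
umlaut = {"bowl": 1, "dots": 2}

before = panel.recognize(umlaut)         # no expert claims this glyph yet
panel.add_expert("ä", expert_a_umlaut)   # support the new character
after = panel.recognize(umlaut)
```

In this sketch the “ä” glyph is unrecognized until its own expert is registered, and the plain “a” is still recognized by the original expert afterwards.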
The inherent accuracy of the algorithm has always been the most 
important parameter at Caere. Experts provide that; however, they have 
two downsides. One is that they are remarkably difficult and 
time-intensive to program. Such an approach would be incredibly accurate 
for Kanji, but programming the 5000 character experts needed would take 
several hundred man-years! Machine-learned database probability pools or 
neural nets are the most practical approaches for such a language.
Self-Learning OCR
The other downside of experts is that they are computationally 
intensive, and therefore somewhat slow. One of Caere’s pending patents 
covers an accelerating, self-learning routine that means each unique 
character only has to be recognized once. From then on, the system 
identifies it as another “a” or “b” without having to reanalyze it each 
time with the experts. This accelerator technique makes OmniPage 
actually speed up as it reads a document. This technology, operating in 
true 32-bit mode, makes AnyFont the fastest omnifont OCR algorithm in 
the world, with speeds of up to 4000 words per minute attainable on 
faster PCs.
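The recognize-once behavior described above resembles a cache placed in front of a slow recognizer. The Python sketch below illustrates that general idea only; the patented routine itself is not described in this manual. The `LearningRecognizer` class, the bitmap representation, and the exact-match lookup are assumptions made for illustration, and a real system would presumably need a tolerant shape match rather than exact equality.

```python
from typing import Callable, Dict, Tuple

# Hypothetical shape representation: rows of '#' and '.' pixels (assumption).
Bitmap = Tuple[str, ...]


class LearningRecognizer:
    """Remembers every shape the slow experts have already identified."""

    def __init__(self, slow_recognize: Callable[[Bitmap], str]) -> None:
        self.slow_recognize = slow_recognize
        self.learned: Dict[Bitmap, str] = {}
        self.expert_calls = 0  # how often the slow path was actually needed

    def recognize(self, shape: Bitmap) -> str:
        # Fast path: a shape seen before is identified by lookup alone.
        if shape in self.learned:
            return self.learned[shape]
        # Slow path: consult the experts once, then remember the answer.
        self.expert_calls += 1
        char = self.slow_recognize(shape)
        self.learned[shape] = char
        return char


def toy_experts(shape: Bitmap) -> str:
    # Stand-in for the expensive expert analysis (hypothetical rule).
    return "a" if shape[0] == ".##." else "b"


A = (".##.", "#..#")
B = ("#...", "###.")

rec = LearningRecognizer(toy_experts)
text = "".join(rec.recognize(s) for s in [A, B, A, A, B])
```

Five shapes are read here but the experts run only twice, which is why throughput in such a scheme improves as more of the document's typeface has been seen.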
Sometimes none of the experts is able to identify a character. This can 
happen with broken or overlapping characters. AnyFont’s second pass 
handles these cases. It can be seen on the screen when the light blue 
areas of the document image are repainted a darker blue. The characters, 
or pieces of characters, that get past the experts are put in a separate 
buffer to be dealt with later. A series of very sophisticated routines 
then comes into play for splitting, combining, fragment analysis, 
fattening, thinning, and context checking. The quality and sophistication 
of these second-pass routines provide greater recognition accuracy, even 
for very difficult problem characters and parts of characters. A third 
pass allows the Language Analyst to further refine accuracy.
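The deferred-buffer structure of the second pass can be sketched as follows. This Python example covers only one of the listed routines, splitting, and uses short strings as a hypothetical stand-in for character shapes. The `two_pass` function, the `known` table, and the `~` reject marker are all assumptions for illustration, not AnyFont's actual routines.

```python
from typing import Callable, List, Optional

# A recognizer returns a character, or None when no expert claims the blob.
Recognizer = Callable[[str], Optional[str]]


def two_pass(blobs: List[str], recognize: Recognizer) -> List[str]:
    result: List[Optional[str]] = []
    deferred: List[int] = []  # buffer of blob positions the experts rejected

    # First pass: accept whatever the experts can identify directly.
    for i, blob in enumerate(blobs):
        char = recognize(blob)
        result.append(char)
        if char is None:
            deferred.append(i)

    # Second pass: try to split each deferred blob into two known shapes.
    for i in deferred:
        blob = blobs[i]
        for cut in range(1, len(blob)):
            left, right = recognize(blob[:cut]), recognize(blob[cut:])
            if left is not None and right is not None:
                result[i] = left + right
                break
        else:
            result[i] = "~"  # hypothetical marker for an unresolved blob

    return [r if r is not None else "~" for r in result]


# Toy shape table: each string stands in for a scanned shape (assumption).
known = {"|)": "b", "(|": "d", "o": "o"}

# "(|o" models a "d" and an "o" that touch and arrive as a single blob.
output = two_pass(["|)", "(|o", "??"], known.get)
```

The touching pair is resolved by the split at the second cut position, while the blob that no split can explain is left marked for a later stage, such as the context checking the text mentions.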