ABBYY FineReader XIX, RNW 422060070001025KN Leaflet

Product codes
422060070001025KN
ABBYY FineReader XIX is a special version of the award-winning FineReader optical character recognition (OCR) software
for recognising “fraktur” or “black letter” texts from the period between 1800 and 1938.  It is designed to convert scans of
old documents, books, and papers into text for the purpose of digital archiving and publishing, and it is the first omnifont
OCR software for Fraktur.
The Challenge:  Digitising Old Texts
Until recently, the limitations of technology and the
unique characteristics of text set in a variety of
old-fashioned fonts and scripts made it difficult to
automate the capture of this information by computer.
Sophisticated OCR dictionaries, the language models
used for analysing and verifying text from this time
period, did not exist. Computer systems capable of
reading old texts required many hours of systematic
training to recognise fonts and characters that are no
longer used in modern printing.
Black letter fonts, also known as “Gebrochene Schriften”
or broken scripts, emerged as early as the 12th century
and evolved over the years into a variety of derivations
and font types. The Fraktur typeface, dominant in
Germany, was created on behalf of the German Emperor
Maximilian and soon became popular in many parts of
Europe. Characteristic peculiarities of the type include
the elongated s and ligatures, that is, “joined” letters
for certain letter combinations. Because Fraktur was so
widely used, understanding it is essential for studying
texts and developing recognition technologies for the
period between 1800 and 1938.
ABBYY FineReader XIX is the first omnifont OCR for
Fraktur, giving users a solution for scanning and
converting old documents with minimal training and
dictionary work.  This was achieved by combining
advanced recognition technology with dedicated
linguistic study.
OCR systems work by analysing a text image and making
a hypothesis about which letter or word an image
represents.  The hypotheses are analysed in context and
verified against sophisticated OCR dictionaries made up
of language models (LMs).  Language models are
computer databases that describe the vocabulary of a
language.  The problem is that modern OCR systems do
not have LMs for older fonts and older spellings.
The solution for Fraktur text recognition was achieved
through the development of OCR dictionaries specifically
for this time period.  Special language models were
created for five European languages.
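The hypothesis-and-dictionary step described above can be illustrated with a
small Python sketch. It is not ABBYY's implementation: the function name
best_word, the confidence values, the scoring rule (product of confidences
plus a fixed bonus for dictionary words) and the one-word lexicon are all
assumptions chosen for illustration. The example uses the long s (ſ), which
is easily confused with f in Fraktur print.

```python
from itertools import product

LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S, the "elongated s"

def best_word(char_hypotheses, lexicon, lexicon_bonus=0.5):
    """Pick the most plausible word from per-character hypotheses.

    char_hypotheses: one list per character position, each holding
        (candidate_letter, confidence) pairs from an image classifier.
    lexicon: set of words known to the (hypothetical) language model.
    lexicon_bonus: illustrative bonus added when a candidate word
        is found in the lexicon.
    """
    best, best_score = None, float("-inf")
    # Enumerate every combination of per-position candidates.
    for combo in product(*char_hypotheses):
        word = "".join(letter for letter, _ in combo)
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        if word in lexicon:
            score += lexicon_bonus  # dictionary verification
        if score > best_score:
            best, best_score = word, score
    return best

# Assumed classifier output for a six-letter word image: positions 3
# and 4 are ambiguous between "f" and the long s.
hypotheses = [
    [("W", 0.9)],
    [("a", 0.9)],
    [("f", 0.6), (LONG_S, 0.4)],
    [("f", 0.55), (LONG_S, 0.45)],
    [("e", 0.9)],
    [("r", 0.9)],
]

historic_lexicon = {"Wa" + LONG_S + LONG_S + "er"}  # "Waſſer" (water)

with_dictionary = best_word(hypotheses, historic_lexicon)
without_dictionary = best_word(hypotheses, set())
```

Without a period dictionary the image evidence alone favours the misreading
with two f characters; with the historic spelling in the lexicon, the
long-s reading is selected instead.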
The Fraktur language models were created with the help
of ABBYY's partner, ATAPY Software.  During the
development process, 10 different dictionaries and more
than 105 books published between 1808 and 1930 were
analysed.  Linguists reviewed the word stock, identified
words that had fallen out of use as the languages
evolved, and determined the correct paradigm
assignments for aligning the language models with the
grammar usage of the time period.  More than 500,000
word entries were manually compared with existing
FineReader dictionaries.  Grammatical paradigms and
word evolutions were reviewed, and 159 historic grammar
paradigms that were missing from the contemporary
language models were added.  The language models were
then compiled and tested on a control set of documents
featuring old text.
To recognise the Fraktur style fonts, ABBYY development
teams created special classifiers, or alphabets, capable of
recognising the Fraktur symbols.  As part of this effort,
the teams collected a symbol image base with an average
of 2,500 samples for each symbol, developed a new
alphabet pattern, and assembled a sample test base
representing 31,000 pages of text from different sources.
Using the sample text, the recognition engine was
“fine-tuned” to work with the subtle features of the
Fraktur alphabet (such as the ligatures, or connected
letters). The new alphabet was then added to the
FineReader system and interface and tested extensively.
The Solution:  First Omnifont OCR for Fraktur
ABBYY FineReader XIX was also developed with the
needs of universities and research centres in mind.  The
product was developed in cooperation with the
worldwide METAe Project.  METAe is a consortium of
libraries and digitisation companies from across Europe
that are working together to create the METAe Engine, a
software package specifically designed for organising the
workflow of archiving and converting historical
materials such as books, journals, magazines and
newspapers. ABBYY FineReader XIX will provide a key
component for archiving some of Europe's most priceless
historical documents.  Partners in the METAe project
include the University of Innsbruck (Austria), the
University of Florence (Italy), the Bibliothèque
nationale de France, the National Library of Norway, the
Friedrich-Ebert-Foundation (Germany), CCS Compact
Computer Systeme (Germany), and Cornell University
Library (USA).
Created in cooperation with major archiving institutions
First Omnifont OCR for Fraktur and Old European Language Recognition