ABBYY FormReader User Guide

What is a form? 
Questionnaires, social security forms, polling slips, and warranty cards are all different types of form, used to collect 
different types of information. How do forms differ from other types of documents?  
1.  A form has a set number of fields. 
2.  The content of a field is always determined by the field itself, for example by its name: a “Last Name” field 
contains only last names (if completed correctly), a “Date” field contains only dates, etc. 
3.  During form processing, only the field contents are of interest; all remaining form elements are 
disregarded. 
Gathering information can be a long and weary process, involving the input of hundreds if not thousands of forms.  
ABBYY FormReader, however, makes life much easier, allowing the whole process to be automated. The inputting 
process then consists of the following stages: 
1.  Application setup – the form to be processed is specified.  
A form template is created within the program. It contains the geometrical locations of the fields and 
specifies the type of information each field is to contain, along with other field parameters. 
2.  Form processing. 
Completed forms are scanned and recognized (i.e. field images are converted into text) by the application. 
An existing template is used to identify form field positions and the type of information contained within 
them. Recognition results are subsequently verified and exported to a file or database. 
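
The template described in stage 1 can be pictured as a small data structure. The following is only an illustrative sketch in Python and not FormReader's actual template format: the class names, the field type labels ("text", "date"), the use of millimetres, and the pixel_box helper are all assumptions made for this example. It shows how a field location stored in the template could be converted into a pixel region on a form scanned at a given resolution.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class Field:
    """One template field: its name, expected content type, and location."""
    name: str
    kind: str          # hypothetical type label, e.g. "text" or "date"
    x_mm: float        # distance of the left edge from the page's left edge
    y_mm: float        # distance of the top edge from the page's top edge
    width_mm: float
    height_mm: float


@dataclass(frozen=True)
class FormTemplate:
    """A named collection of fields (hypothetical structure)."""
    name: str
    fields: tuple


def pixel_box(field, dpi):
    """Return (left, top, right, bottom) in pixels for a scan at `dpi`."""
    def to_px(mm):
        return round(mm * dpi / MM_PER_INCH)

    return (
        to_px(field.x_mm),
        to_px(field.y_mm),
        to_px(field.x_mm + field.width_mm),
        to_px(field.y_mm + field.height_mm),
    )


template = FormTemplate(
    name="warranty card",
    fields=(
        Field("Last Name", "text", 25.4, 50.8, 101.6, 12.7),
        Field("Date", "date", 25.4, 76.2, 50.8, 12.7),
    ),
)

# Pixel regions that stage 2 would cut out of a 300 dpi scan.
boxes = {f.name: pixel_box(f, dpi=300) for f in template.fields}
```

Storing locations in physical units rather than pixels keeps the template independent of the scanning resolution; this is a design choice of the sketch, not a statement about the program.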
Easy? In theory, yes; in practice, no, because not all forms used to gather information are suitable for automated input.  
The aim of this guide is to explain exactly which requirements a form must meet if it is to be suitable for automated 
processing, and to show you how to create your own forms using Microsoft Visio 2000, Microsoft Word 2000, and 
Corel Draw. 
What is a machine-readable form? 
Two principal tasks are carried out during form recognition: 
1.  Locating fields
This is by no means an easy task, as the scanned form image may be distorted in various ways, e.g. stretched, 
skewed, or rotated. In order for these distortions to be corrected, the form must contain what are termed 
reference points. For more information on reference points and other form elements, see: “Elements of 
machine-readable forms” (page 6). 
2.  Separating field contents from field borders  
The information entered in the fields must be clearly separated from other form elements: field borders, 
background, service text, and explanatory text. In order for the application to do this correctly, the form must meet 
certain requirements; these requirements define several form types. For more information on form types, see: 
“Types of machine-readable forms” (page 6). 
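
To illustrate why reference points make the first task (locating fields) possible, here is a minimal sketch in Python. It assumes that three reference points have already been found on the scanned image and that the distortion (stretching, skewing, rotation, shifting) can be described by an affine transformation. This guide does not describe how FormReader itself corrects distortions, so the function names and the method are assumptions made for this example only.

```python
def solve_affine(template_pts, scanned_pts):
    """Estimate the affine map from template to scan coordinates.

    Takes three reference points as they appear in the template and the
    same three points as found on the scan. Returns (a, b, c, d, e, f)
    such that x' = a*x + b*y + c and y' = d*x + e*y + f.
    """
    (x1, y1), (x2, y2), (x3, y3) = template_pts
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("reference points must not lie on one line")

    def coeffs(v1, v2, v3):
        # Cramer's rule for the 3x3 system [x y 1] * (p, q, r) = v.
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = coeffs(*(p[0] for p in scanned_pts))
    d, e, f = coeffs(*(p[1] for p in scanned_pts))
    return a, b, c, d, e, f


def map_point(affine, point):
    """Map a template coordinate to its expected position on the scan."""
    a, b, c, d, e, f = affine
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Reference points in the template, and where they were found on a scan
# that is shifted by (10, 20) and stretched by a factor of 2.
template_refs = [(0, 0), (100, 0), (0, 100)]
scanned_refs = [(10, 20), (210, 20), (10, 220)]

affine = solve_affine(template_refs, scanned_refs)
field_corner_on_scan = map_point(affine, (50, 50))
```

Once the transformation is known, every field location stored in the template can be mapped onto the distorted image in the same way. Three non-collinear points are the minimum needed for an affine fit; a form without such points gives the application nothing to measure the distortion against.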
In order for the above two tasks to be carried out successfully, the forms must correspond to the form pattern 
exactly, i.e. forms of the same type must be printed using the same source document (pattern) so that the location of 
all form elements is identical on each one. If this is not the case, i.e. the location of fields on different copies of the 
form varies, the application will be unable to “find” the fields and, consequently, unable to recognize them. Copies 
of the form will match the source document (pattern) only if the forms are printed professionally. For more 
information regarding print quality, see: “Print quality requirements” (page 15). 
If the application is able to identify the field locations and separate the field contents from the field borders, the form 
in question is deemed to be machine-readable. From now on such forms are simply referred to as forms. 
Form completion methods 
A form may be completed in one of the following ways: 
1) By hand (“handprint” completion). Letters, digits and all other characters are written separately, with each 
character having its own individual character space.  
2) Using a dot-matrix printer. 
3) Using a typewriter.  
4) Typographically. This refers to the use of inkjet and laser (not dot-matrix) printers with a resolution of no less than 
300 dpi. 
5) Using a combination of the above.