Adobe Acrobat 7.0.5 SDK User Manual

Creating PDF Documents
NOTE:
Server use of the Distiller software is not allowed. The End User License Agreement 
allows for use only on a single system. Access to, and use of, the Distiller software 
over a network is prohibited. The only exception is installation of the software. You 
are permitted to keep a copy of the software on a server so that users who have a 
license for the software can download and install it. A separate product, Acrobat 
Distiller Server, can be purchased or licensed from Adobe for server use. 
Creating Tagged PDF Documents
PDF files are well known for representing the physical layout of a document; that is, the 
page markings that comprise the page contents. In addition, PDF versions 1.3 and beyond 
provide a mechanism for describing logical structure in PDF files. This includes information 
such as the organization of the document into chapters and sections, as well as figures, 
tables, and footnotes.
PDF 1.4 and Acrobat 5 introduced tagged PDF, which is a particular use of structured PDF 
that allows page content to be extracted and used for various purposes, including:
• Reflow of text and graphics
• Conversion to file formats such as HTML and XML
• Access for the visually impaired
PDF Logical Structure
PDF logical structure is layered on top of a document’s page contents using a special 
markup language. HTML and XML use a similar layout for logical structure: text enclosed in 
a hierarchy of tags. In HTML, each component is wrapped with a set of tags that define its 
structure. For example, the text of a top-level header begins with an <h1> tag and ends with 
an </h1> tag. PDF provides similar constructs with its marked content operators.
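To make the comparison with HTML tags concrete, the following minimal sketch builds a page content fragment in which a heading is bracketed by the marked content operators BDC and EMC. It is an illustration only, not SDK code: the helper name marked_content, the font resource name F1, the font size, and the text position are all assumptions chosen for the example.

```python
def marked_content(tag: str, mcid: int, text: str) -> str:
    """Return a content-stream fragment that shows `text` inside a
    marked-content sequence (hypothetical helper, for illustration only)."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        # BDC opens the sequence, much like an opening <h1> tag in HTML.
        f"/{tag} <</MCID {mcid}>> BDC\n"
        # Assumed font resource /F1 at 24 pt, positioned at (72, 720).
        f"  BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET\n"
        # EMC closes the sequence, much like a closing </h1> tag.
        "EMC"
    )


heading = marked_content("H1", 0, "Creating Tagged PDF Documents")
print(heading)
```

The MCID entry gives the sequence an identifier on its page, which is what allows a structure element to refer back to this piece of content.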
In fact, HTML logical structure can be preserved in a PDF document. The Web Capture 
feature introduced in Acrobat 4.0 converts HTML to PDF. Such a PDF file may optionally 
contain structure information from the HTML data. Acrobat can generate bookmarks from 
this structure data.
The Structure Tree
Logical structure is independent of, though related to, the page content (that is, the actual 
marks on the page made by the marking operators).
In a PDF document, logical structure is represented by a tree of elements called a structure 
tree. There are pointers from the logical structure to the page contents, and vice versa. The 
structure tree provides additional capability to navigate, search, and extract data from PDF 
documents. By accessing a PDF document via its structure tree, for instance, you can obtain 
logically ordered content independently of the drawing order of the page contents.
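The difference between drawing order and logical order can be sketched with a small in-memory model. This is a simplified illustration under stated assumptions, not the Acrobat API: the StructElem class, the logical_text function, and the sample page data are hypothetical, and each leaf is reduced to a (page index, marked-content ID) pair standing in for a pointer from the structure tree to the page contents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# A reference from the structure tree into page content:
# (page index, marked-content ID on that page).
ContentRef = Tuple[int, int]


@dataclass
class StructElem:
    """Hypothetical, simplified structure element."""
    tag: str
    kids: List[Union["StructElem", ContentRef]] = field(default_factory=list)


def logical_text(elem: StructElem, page_content: Dict[ContentRef, str]) -> List[str]:
    """Collect text by walking the structure tree depth-first."""
    out: List[str] = []
    for kid in elem.kids:
        if isinstance(kid, StructElem):
            out.extend(logical_text(kid, page_content))
        else:
            out.append(page_content[kid])  # follow the pointer to the page
    return out


# Page content in drawing order: the footnote happens to be drawn first.
page_content: Dict[ContentRef, str] = {
    (0, 0): "Footnote text",
    (0, 1): "Body paragraph",
    (0, 2): "Chapter heading",
}

# Structure tree in logical order: heading, paragraph, then footnote.
tree = StructElem("Document", [
    StructElem("Sect", [
        StructElem("H1", [(0, 2)]),
        StructElem("P", [(0, 1)]),
        StructElem("Note", [(0, 0)]),
    ]),
])

drawing_order = list(page_content.values())
logical_order = logical_text(tree, page_content)
print(drawing_order)
print(logical_order)
```

Walking the tree yields the heading, the paragraph, and then the footnote, even though the page draws them in the opposite order.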