Adobe Acrobat 7.0.5 SDK User Manual

Creating PDF Documents


NOTE

Server use of the Distiller software is not allowed. The End User License Agreement
allows for use only on a single system. Access to, and use of, the Distiller software
over a network is prohibited. The only exception is installation of the software. You
are permitted to keep a copy of the software on a server so that users who have a
license for the software can download and install it. A separate product, Acrobat
Distiller Server, can be purchased or licensed from Adobe for server use.

Creating Tagged PDF Documents

PDF files are well known for representing the physical layout of a document; that is, the
page markings that comprise the page contents. In addition, PDF versions 1.3 and beyond
provide a mechanism for describing logical structure in PDF files. This includes information
such as the organization of the document into chapters and sections, as well as figures,
tables, and footnotes.

PDF 1.4 and Acrobat 5 introduced tagged PDF, which is a particular use of structured PDF
that allows page content to be extracted and used for various purposes, including:

● Reflow of text and graphics
● Conversion to file formats such as HTML and XML
● Access for the visually impaired (see Chapter 14, “Accessibility”)

PDF Logical Structure

PDF logical structure is layered on top of a document’s page contents using a special
markup language. HTML and XML use a similar layout for logical structure: text enclosed in
a hierarchy of tags. In HTML, each component is wrapped with a set of tags that define its
structure. For example, the text of a top-level header begins with an <h1> tag and ends with
a </h1> tag. PDF provides similar constructs with its marked content operators.
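
As a rough illustration of the parallel with HTML, the following Python sketch builds the text of a content stream fragment in which a heading is bracketed by the BDC and EMC marked content operators. The helper name marked_content, the font resource name F1, and the text position are assumptions made for this example; they are not part of the Acrobat SDK. A complete tagged PDF file would also need a structure tree that refers to the marked content identifier (MCID).

```python
def marked_content(tag: str, mcid: int, text: str) -> str:
    """Wrap one line of text in a marked-content sequence (BDC ... EMC).

    Hypothetical helper for illustration only; it returns plain text and
    does not produce a complete PDF file.
    """
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "\n".join([
        f"/{tag} <</MCID {mcid}>> BDC",  # opens the sequence, comparable to <h1>
        "BT",                            # begin text object
        "/F1 24 Tf",                     # font resource name F1 is an assumption
        "72 720 Td",                     # text position chosen arbitrarily
        f"({escaped}) Tj",               # show the heading text
        "ET",                            # end text object
        "EMC",                           # closes the sequence, comparable to </h1>
    ])


fragment = marked_content("H1", 0, "Creating Tagged PDF Documents")
print(fragment)
```

The tag (here H1) names the kind of element, and the MCID gives the structure tree a way to refer back to this piece of page content.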

In fact, HTML logical structure can be preserved in a PDF document. The Web Capture
feature, introduced in Acrobat 4.0, converts HTML to PDF. The resulting PDF files may
optionally contain structure information from the HTML data, and Acrobat can generate
bookmarks from this structure data.

The Structure Tree

Logical structure is independent of, though related to, the page content (that is, the actual
marks on the page made by the marking operators).

In a PDF document, logical structure is represented by a tree of elements called a structure
tree. There are pointers from the logical structure to the page contents, and vice versa. The
structure tree provides additional capability to navigate, search, and extract data from PDF
documents. By accessing a PDF document via its structure tree, for instance, you can obtain
logically ordered content independently of the drawing order of the page contents.
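
The difference between drawing order and logical order can be sketched with a toy model in Python. The StructElem class, the sample text, and the use of a dictionary keyed by marked content identifier (MCID) are assumptions made for this example. They only imitate the idea of a structure tree pointing at page content and do not reflect the Acrobat SDK data types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class StructElem:
    """Hypothetical structure element: a tag plus child elements or MCIDs."""
    tag: str
    kids: List[Union["StructElem", int]] = field(default_factory=list)


# Page content in drawing order, as (MCID, text) pairs. In this made-up page,
# the sidebar is painted first and the title last.
page_marks: List[Tuple[int, str]] = [
    (0, "Sidebar note"),
    (1, "Body paragraph"),
    (2, "Chapter title"),
]
content_by_mcid: Dict[int, str] = dict(page_marks)

# Structure tree in logical order: title, then body, then sidebar.
tree = StructElem("Document", [
    StructElem("H1", [2]),
    StructElem("P", [1]),
    StructElem("P", [0]),
])


def logical_text(elem: StructElem, content: Dict[int, str]) -> List[str]:
    """Walk the tree depth-first and collect text in logical order."""
    out: List[str] = []
    for kid in elem.kids:
        if isinstance(kid, int):
            out.append(content[kid])       # leaf: follow the MCID to the page content
        else:
            out.extend(logical_text(kid, content))
    return out


drawn = [text for _, text in page_marks]
logical = logical_text(tree, content_by_mcid)
print(drawn)
print(logical)
```

Walking the tree yields the title first even though it was drawn last, which is the kind of reordering that reflow and accessibility tools depend on.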