PDF Explained · April 2, 2026 · 5 min read

What Are PDF Objects? Understanding the PDF Object Model

Everything in a PDF (text, images, fonts, pages) is built from PDF objects. Learn the eight basic object types, how they combine to form documents, and what this means for PDF tools.

A PDF file is fundamentally a collection of objects. Every element of a PDF, whether a page, a font, an image, a form field, or a bookmark, is represented as one or more PDF objects. These objects are linked together by references into a structure whose backbone is a tree rooted at the document catalog. Understanding the object model demystifies how PDFs work and why operations such as compression, editing, and repair behave the way they do.

The Eight PDF Object Types

  • Boolean: true or false
  • Integer and Real: numeric values like page dimensions, coordinates, color components
  • String: text in parentheses (Hello) or hex <48656C6C6F>; used for text content and identifiers
  • Name: an identifier starting with / — e.g., /Type, /Page, /Font; all dictionary keys are names
  • Array: an ordered list of objects in square brackets: [0 0 595 842] (a MediaBox)
  • Dictionary: a collection of key-value pairs in double angle brackets: << /Type /Page /MediaBox [0 0 595 842] >>
  • Stream: a dictionary followed by a sequence of bytes between the keywords stream and endstream; used for page content, font programs, image data, and compressed object streams. Streams are always indirect objects
  • Null: null, representing absence of a value
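
To make the syntax above concrete, here is a minimal Python sketch that renders ordinary Python values as PDF object syntax. The function serialize and the Name wrapper class are hypothetical helpers written for this illustration, not part of any PDF library. Streams are left out because they can only appear as indirect objects, and name escaping (#xx sequences) is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A PDF name object, stored without its leading slash (hypothetical helper type)."""
    value: str


def serialize(obj) -> str:
    """Render a Python value as PDF object syntax (simplified sketch)."""
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # test bool before int: bool is a subclass of int
        return "true" if obj else "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):
        # PDF reals are plain decimals; exponent notation such as 1e-07 is not allowed.
        text = f"{obj:.6f}".rstrip("0").rstrip(".")
        return "0" if text == "-0" else text
    if isinstance(obj, Name):
        return "/" + obj.value  # assumes the name needs no #xx escaping
    if isinstance(obj, str):
        # Literal string: escaping backslashes and all parentheses is always safe.
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(obj, bytes):
        return "<" + obj.hex().upper() + ">"  # hexadecimal string
    if isinstance(obj, list):
        return "[" + " ".join(serialize(item) for item in obj) + "]"
    if isinstance(obj, dict):
        # Dictionary keys are always names.
        entries = " ".join(f"/{key} {serialize(value)}" for key, value in obj.items())
        return f"<< {entries} >>"
    raise TypeError(f"no PDF equivalent for {type(obj).__name__}")


print(serialize({"Type": Name("Page"), "MediaBox": [0, 0, 595, 842]}))
# prints << /Type /Page /MediaBox [0 0 595 842] >>
```

A separate Name class is used because Python has only one string type, while PDF distinguishes names (/Page) from strings ((Page)).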

Indirect Objects and Cross-References

Objects can be direct (written inline inside another object) or indirect (standalone and labeled with an object number and a generation number). An indirect object is defined as 5 0 obj ... endobj and referenced from elsewhere as 5 0 R. Each indirect object has an entry in the cross-reference (xref) table recording its byte offset, so a reader can locate any object by seeking straight to that offset instead of parsing the file from the beginning. Most significant objects (pages, fonts, images) are indirect, which is what lets a reader open page 500 of a 1000-page document almost instantly.
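
The direct-seek lookup can be sketched in a few lines of Python. The helpers build_body and lookup are hypothetical names for this illustration; the sketch assumes objects are numbered 1 to n with generation 0 and that the xref table has a single subsection, and it writes no trailer, so the output is not a complete PDF. It relies on the fact that every classic xref entry is exactly 20 bytes long, so the entry for object n sits at a computable position.

```python
def build_body(objects: list[bytes]) -> tuple[bytes, int]:
    """Write objects numbered 1..n as indirect objects, followed by an xref table.

    Returns the file bytes and the byte offset where the xref table starts.
    No trailer is written, so the result is not a complete PDF.
    """
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f\r\n"  # entry 0 is always free
    for offset in offsets:
        # Each entry is 20 bytes: 10-digit offset, 5-digit generation, type, 2-byte EOL.
        out += b"%010d 00000 n\r\n" % offset
    return bytes(out), xref_start


def lookup(pdf: bytes, xref_start: int, num: int) -> bytes:
    """Fetch indirect object `num` by seeking, without scanning the file body."""
    # Skip the "xref" line and the "0 N" subsection header line.
    entries_start = pdf.index(b"\n", pdf.index(b"\n", xref_start) + 1) + 1
    entry = pdf[entries_start + 20 * num : entries_start + 20 * (num + 1)]
    offset = int(entry[:10])
    # Naive: assumes "endobj" does not occur inside the object's own data.
    end = pdf.index(b"endobj", offset) + len(b"endobj")
    return pdf[offset:end]


objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",  # object 1 refers to object 2 with "2 0 R"
    b"<< /Type /Pages /Kids [] /Count 0 >>",
]
pdf, xref_start = build_body(objects)
print(lookup(pdf, xref_start, 2))
```

The cost of lookup does not depend on how many objects precede the target, which is the property the xref table exists to provide.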

How Objects Build a Document

The document structure is a tree of dictionaries: Catalog → Pages → Page → Resources → Font/XObject/ColorSpace → actual content. The Catalog is the root object, identified by the /Root entry in the file trailer. Its /Pages entry points to the root of the page tree, a Pages dictionary whose /Kids array references Page dictionaries (or, in large documents, intermediate Pages nodes). Each Page dictionary has a /Contents entry pointing to a content stream (or an array of streams) and a /Resources dictionary naming the fonts, images, and other resources used on that page. Content streams contain the actual drawing instructions, written as PDF operators such as Tj, which shows a string of text.
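
The Catalog → Pages → Page chain can be seen end to end in a complete one-page file. The following is a minimal Python sketch, assuming the built-in Helvetica font and text that encodes as Latin-1; the function minimal_pdf is a hypothetical name, and a real writer would also handle font embedding, text encoding, and compression.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF: Catalog -> Pages -> Page -> Resources/Font plus a content stream."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: begin text, select font F1 at 24 pt, move, show the string, end text.
    content = ("BT /F1 24 Tf 72 770 Td (%s) Tj ET" % escaped).encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",  # 1: root, named by the trailer's /Root
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",  # 2: root of the page tree
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842]"  # 3: the page (A4)
        b" /Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",  # 4: a standard font
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",  # 5: drawing operators
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f\r\n"
    for offset in offsets:
        out += b"%010d 00000 n\r\n" % offset
    # The trailer names the Catalog; startxref tells readers where the xref table begins.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_start,
    )
    return bytes(out)
```

Saving the result, for example with Path("hello.pdf").write_bytes(minimal_pdf("Hello")) from pathlib, should give a file that common viewers open as a single A4 page. Following the references by hand (trailer /Root to object 1, /Pages to object 2, /Kids to object 3, /Contents to object 5) traces exactly the tree described above.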

Why the Object Model Matters for Tools

Understanding objects explains PDF tool behavior:

  • Incremental updates: when a tool such as Acrobat saves changes incrementally, it appends new versions of the modified objects and a new xref section to the end of the file. The old objects remain in the file but are superseded, which is why a PDF grows with repeated edits.
  • Compression: PDF 1.5 and later can pack many non-stream objects into a single compressed object stream, which sharply reduces the per-object overhead of small objects such as simple dictionaries and arrays.
  • Repair: when the xref table is corrupted, a reader can scan the file for N G obj headers (for example 12 0 obj) to reconstruct the map of object offsets.
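
The repair scan, and the way incremental updates supersede older objects, can be sketched together in Python. The function rebuild_xref is a hypothetical helper that uses a regular expression over the raw bytes. Real repair code is more careful, for instance about "obj" patterns that happen to appear inside binary stream data, and a real incremental update would also append a new xref section and a trailer with a /Prev entry, which this sketch omits.

```python
import re

# "N G obj" header: object number, generation number, then the keyword.
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(data: bytes) -> dict[int, tuple[int, int]]:
    """Recover object number -> (generation, byte offset) by brute-force scanning."""
    table: dict[int, tuple[int, int]] = {}
    for match in OBJ_HEADER.finditer(data):
        num, gen = int(match.group(1)), int(match.group(2))
        # A later definition overwrites an earlier one, because an incremental
        # update appends the new version of an object after the old one.
        table[num] = (gen, match.start())
    return table


# A two-object file, then an appended update that redefines object 1.
original = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
)
update = b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /PageMode /UseOutlines >>\nendobj\n"
print(rebuild_xref(original + update))
```

Keeping the last match for each object number mirrors how readers resolve incremental updates: the newest definition wins, while the stale one still occupies bytes in the file.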
