What Is a PDF Cross-Reference Table? xref Explained
The PDF cross-reference table (xref) is the index that lets PDF readers jump to any object instantly. Learn how it works, why a corrupted xref causes errors, and how xref streams differ.
The cross-reference table (xref table) is the index at or near the end of a PDF file that maps every object number to its byte offset within the file. Without the xref table, a PDF reader would have to parse the entire file linearly to find each object. With it, the reader looks up an object number, gets a byte offset, and seeks straight to that position, so the lookup takes O(1) time regardless of file size. This is why large PDFs open quickly and jumping to an arbitrary page is fast.
Structure of the xref Table
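The traditional xref table consists of: (1) the keyword xref, (2) one or more subsections, each starting with a line that gives the first object number and the object count, followed by (3) fixed-width 20-byte entries, one per object: a 10-digit field, a 5-digit generation number, and a status flag (n for in-use, f for free/deleted), separated by single spaces and ended by a two-byte end-of-line. For an in-use entry the 10-digit field is the object's byte offset; for a free entry it is the number of the next free object. After the xref table comes the trailer dictionary, which gives /Size (one more than the highest object number in the file), /Root (a reference to the document catalog), and, in files that have been updated, /Prev (the byte offset of the previous xref section). The file ends with the keyword startxref, the byte offset of the most recent xref section on the following line, and the %%EOF marker.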
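The layout described above can be read with a few lines of Python. This is a minimal sketch, not a full PDF parser: the function names `find_startxref` and `parse_xref_table` and the `SAMPLE` bytes are made up for illustration, and the code assumes a well-formed traditional table with exactly 20 bytes per entry.

```python
import re

# Subsection header: "<first object number> <count>" on its own line.
SUBSECTION = re.compile(rb"\s*(\d+)[ ]+(\d+)[ ]*(?:\r\n|\r|\n)")


def find_startxref(data: bytes) -> int:
    """Return the byte offset that follows the last 'startxref' keyword."""
    idx = data.rfind(b"startxref")
    if idx < 0:
        raise ValueError("startxref keyword not found")
    m = re.match(rb"startxref\s+(\d+)", data[idx:])
    if m is None:
        raise ValueError("no offset after startxref")
    return int(m.group(1))


def parse_xref_table(data: bytes, xref_offset: int) -> dict:
    """Parse one traditional xref section.

    Returns {object_number: (field1, generation, flag)} where field1 is the
    byte offset for 'n' entries and the next free object number for 'f'.
    """
    if data[xref_offset:xref_offset + 4] != b"xref":
        raise ValueError("no xref keyword at the given offset")
    pos = xref_offset + 4
    entries = {}
    while True:
        m = SUBSECTION.match(data, pos)
        if m is None:  # reached 'trailer' (or anything that is not a header)
            break
        first, count = int(m.group(1)), int(m.group(2))
        pos = m.end()
        for objnum in range(first, first + count):
            entry = data[pos:pos + 20]  # fixed-width 20-byte record
            if len(entry) < 20:
                raise ValueError("truncated xref entry")
            field1 = int(entry[0:10])
            generation = int(entry[11:16])
            flag = entry[17:18].decode("ascii")
            entries[objnum] = (field1, generation, flag)
            pos += 20
    return entries


# Hypothetical fragment: only the xref section and trailer, placed at offset 0.
SAMPLE = (
    b"xref\n"
    b"0 3\n"
    b"0000000000 65535 f \n"
    b"0000000015 00000 n \n"
    b"0000000074 00000 n \n"
    b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
    b"startxref\n0\n%%EOF\n"
)

TABLE = parse_xref_table(SAMPLE, find_startxref(SAMPLE))
```

Here `TABLE[1]` is `(15, 0, "n")`: object 1 is in use and starts at byte 15. A real reader would go on to parse the trailer dictionary and follow /Prev, which this sketch leaves out.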
Incremental Updates and Multiple xref Tables
When an editor such as Acrobat saves changes to a PDF incrementally instead of rewriting the entire file, it appends the new and updated objects to the end of the file, followed by a new xref section covering only the changed objects and a new trailer. The new trailer's /Prev entry holds the byte offset of the previous xref section. A PDF modified many times may therefore contain a chain of xref sections; the reader builds the complete object map by following the chain from newest to oldest, with newer entries overriding older ones. This is why PDFs grow over time: old object data is not removed, just superseded.
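The newest-wins merge can be sketched as follows. The data model is an assumption made for illustration: each xref section is represented as a tuple of (entries, prev_offset) keyed by the section's own byte offset, and `build_object_map` is a hypothetical helper name, not part of any library.

```python
def build_object_map(sections: dict, start_offset: int) -> dict:
    """Walk the /Prev chain from the newest xref section to the oldest.

    sections maps an xref section's byte offset to (entries, prev_offset),
    where entries maps object number -> byte offset and prev_offset is
    None for the original (oldest) section.
    """
    merged = {}
    seen = set()
    offset = start_offset
    while offset is not None:
        if offset in seen:  # guard against a corrupted, circular chain
            raise ValueError("cycle in /Prev chain")
        seen.add(offset)
        entries, prev = sections[offset]
        for objnum, entry in entries.items():
            # Newer sections are visited first, so keep the first entry seen.
            merged.setdefault(objnum, entry)
        offset = prev
    return merged


# Hypothetical file: original save plus one incremental update.
SECTIONS = {
    900: ({1: 15, 2: 74, 3: 120}, None),  # original xref section
    2400: ({2: 1800, 4: 2100}, 900),      # update: object 2 replaced, 4 added
}

OBJECT_MAP = build_object_map(SECTIONS, 2400)
```

In this example object 2 resolves to offset 1800 from the update, while the old copy at offset 74 stays in the file but is no longer reachable through the map.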
xref Streams (PDF 1.5+)
PDF 1.5 introduced xref streams as an alternative to the traditional xref table. An xref stream is a regular PDF stream object with /Type /XRef whose data holds the cross-reference entries in a compact binary form; its dictionary also carries the entries that would otherwise live in the trailer. There are two main benefits. First, the xref data can be Flate-compressed, whereas a traditional xref table is plain text and always takes 20 bytes per entry. Second, objects stored inside object streams (groups of compressed objects) can only be located through xref stream entries. Modern PDFs with many objects can save significant space by using xref streams. The trade-off: very old readers don't support xref streams, though this is no longer a practical concern.
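To make the binary form concrete, here is a rough sketch of decoding xref stream data in Python. It assumes the stream uses plain Flate compression with no predictor (many real files add a PNG predictor through /DecodeParms, which this code does not undo), and that the /W and /Index arrays have already been read from the stream dictionary. The function name `decode_xref_stream` is hypothetical.

```python
import zlib


def decode_xref_stream(raw: bytes, w: list, index: list) -> dict:
    """Decode Flate-compressed xref stream data.

    w     -- the /W array: byte widths of the three fields in each entry
    index -- the /Index array: pairs of (first object number, count)

    Returns {object_number: (type, field2, field3)} where type is
    0 = free, 1 = in use at a byte offset, 2 = stored in an object stream.
    """
    data = zlib.decompress(raw)
    w1, w2, w3 = w
    row = w1 + w2 + w3
    entries = {}
    pos = 0
    for first, count in zip(index[0::2], index[1::2]):
        for objnum in range(first, first + count):
            rec = data[pos:pos + row]
            if len(rec) < row:
                raise ValueError("truncated xref stream data")
            pos += row
            # A zero-width type field means the entry type defaults to 1.
            kind = int.from_bytes(rec[:w1], "big") if w1 else 1
            field2 = int.from_bytes(rec[w1:w1 + w2], "big")
            field3 = int.from_bytes(rec[w1 + w2:], "big")
            entries[objnum] = (kind, field2, field3)
    return entries


# Hypothetical data for three objects with /W [1 2 1] and /Index [0 3].
ROWS = bytes([0, 0, 0, 255]) + bytes([1, 0, 15, 0]) + bytes([2, 0, 5, 0])
XREF = decode_xref_stream(zlib.compress(ROWS), [1, 2, 1], [0, 3])
```

In this example each entry takes 4 bytes before compression, compared with 20 bytes in a traditional table, and the type 2 entry for object 2 says it lives at index 0 inside object stream 5.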
xref Corruption and Recovery
If the xref table is corrupted (by a truncated download, a disk error, or an interrupted save), the PDF reader cannot locate objects and reports an error. There are several recovery options. Most PDF readers attempt to rebuild the xref when it fails by scanning the file for object headers of the form N G obj, where N is the object number and G the generation number (Adobe Reader calls this "repairing" the document). The qpdf --check command can diagnose xref issues. Ghostscript can often read and re-output PDFs with damaged xref tables. If a PDF opens but content is missing, partial xref corruption may be the cause.
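The scanning approach can be illustrated with a short regular expression. This is a simplified sketch of the idea, not how any particular reader implements repair: it ignores objects inside object streams and can be fooled by byte sequences inside stream data that happen to look like an object header. The names `OBJ_HEADER`, `rebuild_xref`, and `DAMAGED` are invented for the example.

```python
import re

# "<object number> <generation> obj", not preceded by another digit.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(data: bytes) -> dict:
    """Scan the whole file and map object number -> (byte offset, generation).

    When an object number appears more than once (as after an incremental
    update), the occurrence nearest the end of the file wins.
    """
    table = {}
    for m in OBJ_HEADER.finditer(data):
        objnum = int(m.group(1))
        generation = int(m.group(2))
        table[objnum] = (m.start(1), generation)  # later matches overwrite
    return table


# Hypothetical damaged file: objects are intact but there is no xref at all.
DAMAGED = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< >>\nendobj\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
)

REBUILT = rebuild_xref(DAMAGED)
```

For the sample above, `REBUILT` contains objects 1 and 2, with object 1 pointing at its second, later definition. The cost is a full linear pass over the file, which is exactly what a healthy xref table lets the reader avoid.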


