PDF Explained | April 2, 2026 | 4 min read

What Is XMP Metadata in PDF?

XMP (Extensible Metadata Platform) is Adobe's XML-based metadata standard embedded in PDFs and images. Learn what XMP stores, how it differs from DocInfo, and how to read it.

XMP (Extensible Metadata Platform) is an ISO standard (ISO 16684-1), originally developed by Adobe, for embedding metadata into digital media files such as PDFs, images, audio, and video in a machine-readable XML format. In a PDF, the XMP packet lives in a metadata stream that the document catalog references through its /Metadata entry. The stream can sit anywhere in the file, not necessarily near the beginning. XMP provides a richer, more extensible alternative to the older DocInfo dictionary.

XMP vs DocInfo Dictionary

The older DocInfo (document information) dictionary defines only a small set of standard entries: Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate, and Trapped. XMP can store all of these plus: revision history (who edited when), rights management information, color profile provenance, camera and scanning metadata (for PDFs from image workflows), custom organization-specific metadata in private namespaces, and Dublin Core descriptive metadata. When both DocInfo and XMP are present (as they usually are in modern PDFs), the two should agree. PDF 2.0 deprecates most of the DocInfo dictionary in favor of XMP, so XMP is generally treated as authoritative when they conflict, although many tools still display DocInfo values.
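
To make the overlap concrete, here is a minimal sketch of the conventional DocInfo-to-XMP property mapping (the one PDF/A-1 uses for its consistency check) together with a small conflict finder. The helper name find_conflicts and the plain-dict inputs are assumptions for illustration; a real tool would first decode PDF strings and normalize date formats.

```python
# Conventional DocInfo -> XMP property mapping (as used by PDF/A-1).
DOCINFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}


def find_conflicts(docinfo: dict, xmp: dict) -> dict:
    """Return {docinfo_key: (docinfo_value, xmp_value)} where both exist and differ.

    Hypothetical helper: both inputs are plain dicts of already-decoded strings.
    Values are compared as given, so normalize dates before calling.
    """
    conflicts = {}
    for info_key, xmp_key in DOCINFO_TO_XMP.items():
        if info_key in docinfo and xmp_key in xmp:
            if docinfo[info_key] != xmp[xmp_key]:
                conflicts[info_key] = (docinfo[info_key], xmp[xmp_key])
    return conflicts


# Made-up sample values, not taken from any real file.
docinfo = {"Title": "Draft report", "Author": "A. Writer"}
xmp = {"dc:title": "Final report", "dc:creator": "A. Writer"}
print(find_conflicts(docinfo, xmp))  # {'Title': ('Draft report', 'Final report')}
```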

XMP Namespaces in PDFs

XMP uses XML namespaces to organize metadata by category. Common namespaces in PDFs include: dc: (Dublin Core — title, creator, description, rights, language), xmp: (basic XMP — creation date, modify date, creator tool, metadata date), pdf: (PDF-specific — keywords, PDF version, producer), xmpMM: (media management — document ID, instance ID, history), and pdfuaid: (PDF/UA — accessibility conformance claims). Tools like ExifTool or the XMP Toolkit SDK can read and write all namespaces.
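
As a rough illustration of how namespaces show up in practice, the sketch below parses a small hand-written XMP packet with Python's standard xml.etree.ElementTree and reads one property from each of three namespaces. The sample values are invented, and the code assumes properties are serialized as child elements; XMP also allows simple properties to appear as attributes on rdf:Description, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI, as registered for these XMP schemas.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

# Hand-written sample packet body; the values are invented.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt></dc:title>
   <xmp:CreatorTool>ExampleWriter 1.0</xmp:CreatorTool>
   <pdf:Producer>ExamplePDF Library</pdf:Producer>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

root = ET.fromstring(SAMPLE_XMP)

# dc:title is a language alternative, so the text sits inside rdf:Alt/rdf:li.
title = root.findtext(".//dc:title/rdf:Alt/rdf:li", namespaces=NS)
creator_tool = root.findtext(".//xmp:CreatorTool", namespaces=NS)
producer = root.findtext(".//pdf:Producer", namespaces=NS)

print(title, creator_tool, producer, sep=" | ")
```

The prefixes themselves (dc:, xmp:, pdf:) are only shorthand; what identifies a namespace is its URI, which is why the lookup table maps each prefix to one.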

XMP and PDF/A

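PDF/A requires XMP metadata, and specifically requires the PDF/A version and conformance level to be declared in the XMP via the pdfaid: namespace, using the pdfaid:part and pdfaid:conformance properties (for example, part 2 and conformance B for PDF/A-2b). Without these XMP fields, a document cannot claim PDF/A conformance regardless of its other properties. The whitespace padding that usually precedes the closing <?xpacket end="w"?> marker is a recommendation of the XMP specification rather than a PDF/A rule: it leaves room for tools to update metadata in place without rewriting the file. PDF/A-1 does add restrictions of its own, notably that the metadata stream must be stored unfiltered and that DocInfo entries must match their XMP equivalents.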
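
A minimal sketch of reading the PDF/A declaration from an XMP packet follows, using only the standard library. The function name pdfa_claim and the sample packet are assumptions for illustration. Note that this only reports what the file claims; checking actual conformance needs a validator such as veraPDF.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID = "http://www.aiim.org/pdfa/ns/id/"


def pdfa_claim(xmp_xml: str):
    """Return (part, conformance) declared in the XMP, or None if no part is declared."""
    root = ET.fromstring(xmp_xml)
    part = conformance = None
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Properties may be serialized as attributes or as child elements.
        part = (part or desc.get(f"{{{PDFAID}}}part")
                or desc.findtext(f"{{{PDFAID}}}part"))
        conformance = (conformance or desc.get(f"{{{PDFAID}}}conformance")
                       or desc.findtext(f"{{{PDFAID}}}conformance"))
    return (part, conformance) if part else None


# Hand-written sample declaring PDF/A-2b.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>2</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(pdfa_claim(SAMPLE))  # ('2', 'B')
```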

Reading XMP Metadata

XMP is human-readable XML. In a PDF, it's stored as a stream object with /Type /Metadata and /Subtype /XML. You can extract it by: opening the PDF in a text editor and searching for <?xpacket (this works only when the metadata stream is uncompressed and unencrypted); using ExifTool (exiftool -xmp -b file.pdf); using Adobe Bridge or Acrobat's File Info dialog; or using Python with the pypdf library, whose PdfReader exposes the parsed packet through its xmp_metadata property. The packet's position in the file varies from one producer to another, so search the whole file rather than just the first few KB.
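
The text-editor approach can be automated with a few lines of standard-library Python. The sketch below scans raw bytes for anything between an xpacket header and its trailer. The function name find_xmp_packets and the demo bytes are assumptions for illustration, and the approach finds only packets stored uncompressed; for compressed metadata streams, a PDF library is the safer route.

```python
import re

# Matches from the xpacket header through the xpacket trailer, inclusive.
XPACKET_RE = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=[^>]*\?>", re.DOTALL)


def find_xmp_packets(pdf_bytes: bytes) -> list[str]:
    """Return every uncompressed XMP packet found in the raw bytes."""
    return [m.group(0).decode("utf-8", errors="replace")
            for m in XPACKET_RE.finditer(pdf_bytes)]


# Not a valid PDF: just enough bytes to exercise the search.
demo = (b"%PDF-1.7\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
        b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
        b"<x:xmpmeta xmlns:x='adobe:ns:meta/'/>"
        b'<?xpacket end="w"?>\nendstream\nendobj\n')

packets = find_xmp_packets(demo)
print(len(packets))
print(packets[0])
```

For a real file, pass the file's raw bytes instead of the demo, for example pathlib.Path("file.pdf").read_bytes(). A PDF can contain more than one packet, since embedded images and other objects may carry their own XMP, which is why the function returns a list.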
