What Is PDF Redaction? The Difference Between Real Redaction and a Black Box
True PDF redaction permanently removes sensitive text and images from a PDF. Learn why drawing a black rectangle is not redaction, and how to redact content properly.
PDF redaction is the permanent, irreversible removal of sensitive content — text, images, metadata — from a PDF document, replacing it with black (or white) rectangles. True redaction eliminates the underlying data. Drawing a black box over content is not redaction — the text remains in the file and is trivially recoverable. This distinction has caused real-world security breaches when organizations thought they had redacted sensitive information but actually hadn't.
Why Black Boxes Are Not Redaction
When you draw a black rectangle over text in a PDF editor, you're adding a graphical object on top of the text. The original text objects are still present in the content stream; the black box merely covers them visually. Anyone can select all the text and paste it into a text editor (the covered text is copied along with everything else), run a text extraction tool on the file, delete or move the rectangle in a PDF editor to reveal the text, or examine the raw PDF content stream. Several high-profile government redaction failures worked exactly this way.
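To make this concrete, here is a minimal sketch using a deliberately simplified, hand-written content stream (an assumption for illustration, not taken from a real file). The helper `extract_shown_text` is hypothetical and only handles literal strings passed to the `Tj` operator; real PDFs use more operators, encodings, and compression.

```python
import re

# Toy page content stream: the text is drawn first, then an opaque black
# rectangle is painted over the same area. Nothing removes the text.
content_stream = b"""
BT
/F1 12 Tf
72 700 Td
(SSN: 123-45-6789) Tj
ET
0 0 0 rg
70 695 140 16 re
f
"""

def extract_shown_text(stream: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator."""
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]

# The rectangle is a separate drawing instruction, so extraction ignores it.
print(extract_shown_text(content_stream))  # ['SSN: 123-45-6789']
```

The rectangle (`re` followed by `f`) comes after the text in the stream, which is why it hides the text on screen while leaving the string itself untouched.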
How True Redaction Works
Proper PDF redaction tools perform these operations:
- Mark content to be redacted (by selecting text, area, or pattern-matching)
- Apply the redaction: physically delete the marked content objects from the content stream, replacing them with filled rectangles that contain no underlying data
- Remove the text glyphs and their positioning operators from the content stream (font resources such as ToUnicode mappings live outside the content stream and are shared with the text that remains)
- Strip document metadata that might contain versions of the deleted text
- Flatten any transparent layers that might contain redacted content
After proper redaction, the underlying text cannot be recovered because it no longer exists in the file.
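The apply step can be sketched on a simplified, hand-written content stream. This is a toy under stated assumptions, not how any particular tool implements it: `apply_redaction` is a hypothetical helper, and it only understands `BT ... ET` text objects positioned with a single absolute `x y Td`. Real tools must also handle text matrices, partial glyph runs, images, and compressed streams.

```python
import re

def apply_redaction(stream: str, rect: tuple[float, float, float, float]) -> str:
    """Delete text objects positioned inside rect, then paint a black box.

    rect is (x, y, width, height) in page units.
    """
    x, y, w, h = rect

    def keep_or_drop(match: re.Match) -> str:
        block = match.group(0)
        pos = re.search(r"(-?[\d.]+)\s+(-?[\d.]+)\s+Td", block)
        if pos:
            tx, ty = float(pos.group(1)), float(pos.group(2))
            if x <= tx <= x + w and y <= ty <= y + h:
                return ""  # remove the glyphs and their positioning together
        return block

    cleaned = re.sub(r"BT\b.*?\bET", keep_or_drop, stream, flags=re.S)
    # The replacement rectangle has nothing underneath it.
    box = f"\n0 0 0 rg\n{x:g} {y:g} {w:g} {h:g} re\nf\n"
    return cleaned + box

# Hypothetical two-line page: only the second line falls inside the rectangle.
page = """BT
/F1 12 Tf
72 720 Td
(Name: on file) Tj
ET
BT
/F1 12 Tf
72 700 Td
(SSN: 123-45-6789) Tj
ET
"""

redacted = apply_redaction(page, (70, 695, 140, 16))
print("123-45-6789" in redacted)    # False
print("Name: on file" in redacted)  # True
```

The point of the sketch is the order of operations: the text object is deleted first, and the rectangle is added only as a visual marker for where content used to be.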
Metadata Redaction
Text on the page is only one part of what needs redacting. PDFs can also carry sensitive information in metadata and other hidden content: the document title, author name, and subject fields in DocInfo; XMP metadata including revision history and custom properties; embedded document properties from the original application (e.g., Word track changes saved in the PDF); and comments or annotations that contain sensitive information. Proper redaction workflows strip this metadata and hidden content after content redaction.
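As a rough illustration, a quick audit for leftover metadata might look like the sketch below. It assumes the metadata is stored uncompressed and unencrypted as literal strings, which is not guaranteed; `metadata_traces` and `INFO_KEYS` are hypothetical names, and an empty result does not prove the file is clean.

```python
import re

# Common document information dictionary keys to look for.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords", b"Creator", b"Producer")

def metadata_traces(pdf_bytes: bytes) -> dict[str, str]:
    """Report DocInfo-style fields and XMP packets visible in the raw bytes."""
    found: dict[str, str] = {}
    for key in INFO_KEYS:
        # Matches entries written as literal strings, e.g. /Author (Jane Doe)
        m = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", pdf_bytes)
        if m:
            found[key.decode()] = m.group(1).decode("latin-1")
    if b"<x:xmpmeta" in pdf_bytes:
        found["XMP"] = "packet present"
    return found

# Hypothetical fragment of a file whose page text was redacted but whose
# information dictionary was left alone.
sample = b"<< /Title (Q3 layoffs draft) /Author (Jane Doe) >>"
print(metadata_traces(sample))  # {'Title': 'Q3 layoffs draft', 'Author': 'Jane Doe'}
```

A check like this is useful as a smoke test after sanitizing; it complements, rather than replaces, the sanitize option in a proper redaction tool.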
Tools for Proper Redaction
Adobe Acrobat Pro has a dedicated Redact toolset that performs content removal properly (under Tools → Redact). It marks regions for redaction and then "applies" them, which modifies the content stream. It also has an option to sanitize the document after redaction, stripping metadata and other hidden content. For legal and government use, verify that your tool actually removes content rather than just covering it — test by selecting text over redacted areas after saving.
Testing Your Redaction
After saving a "redacted" PDF, always test it: try to select and copy text in the redacted areas; run a text extraction tool over the file and search the output for strings that should have been removed (opening the file in a plain text editor is a weaker check, because content streams are usually compressed and the text will not be readable there even when it is still present); use File → Properties to check that document metadata has been cleared. If any of these tests reveals the supposedly redacted content, you have coverage, not redaction, and need to redo the process using a proper redaction tool.
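Because PDF content streams are commonly Flate-compressed, a plain search of the file bytes can miss text that is still present. The sketch below, using only the Python standard library, searches both the raw bytes and any stream it can inflate. `find_leaks` is a hypothetical helper, and it assumes unencrypted files with text stored as plain literal strings; text using hex strings or custom font encodings would need a real PDF library to detect.

```python
import re
import zlib

def find_leaks(pdf_bytes: bytes, secrets: list[str]) -> set[str]:
    """Return the secrets found in the raw bytes or in any inflatable stream."""
    haystacks = [pdf_bytes]
    for m in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, flags=re.S):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            haystacks.append(zlib.decompressobj().decompress(m.group(1)))
        except zlib.error:
            pass  # not Flate data; the raw bytes are already being searched
    return {s for s in secrets if any(s.encode("latin-1") in h for h in haystacks)}

# Hypothetical file fragment: the secret sits inside a compressed stream.
hidden = zlib.compress(b"BT (SSN: 123-45-6789) Tj ET")
fake_pdf = (
    b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + hidden
    + b"\nendstream\nendobj\n"
)

print(find_leaks(fake_pdf, ["123-45-6789", "Jane Doe"]))  # {'123-45-6789'}
```

If a check like this reports a string that was supposed to be removed, the file was covered rather than redacted.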


