
PDF Metadata: What Hidden Information Are You Sharing?

Discover what hidden data lurks in PDF metadata. Learn what information PDFs reveal about you, how to view it, and how to remove sensitive metadata before sharing.


Introduction: The Hidden Data in Every PDF

You carefully review your PDF before sharing it, ensuring the visible content is perfect. But what about the invisible data embedded in the file? PDF metadata can reveal far more than you realize, from your name and location to your editing history and software versions.

⚠️ Real-World Impact

In 2013, reporters used PDF metadata to reveal that a supposedly independent report was actually authored by a company with a vested interest. The "Author" field gave it away.

What is PDF Metadata?

Metadata is "data about data": information describing the PDF file itself rather than its visible content. Think of it as the file's DNA, invisible to casual viewers but containing a wealth of information.

Types of PDF Metadata

📋 Document Information

  • Title
  • Author
  • Subject
  • Keywords
  • Creator application
  • Producer (PDF converter)

⏰ Timestamps

  • Creation date & time
  • Modification date & time
  • Last printed date
  • Timezone information

💻 Technical Details

  • PDF version
  • Page count
  • File size
  • Encryption settings
  • Embedded fonts list
  • Color profile

🔍 Extended Metadata (XMP)

  • GPS coordinates (from photos)
  • Camera or scanner model (from photos and scans)
  • Document history
  • Custom properties
  • Copyright information
  • License details
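XMP metadata is stored as an XML packet inside the PDF, and it is often left uncompressed so that other tools can find it. As a rough illustration, the standard-library Python sketch below pulls the first `<x:xmpmeta>` packet out of the raw bytes and flattens its properties. The function names `extract_xmp` and `xmp_properties` are hypothetical. The sketch assumes the packet is stored uncompressed, so it will miss XMP that sits inside compressed streams.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def extract_xmp(pdf_bytes: bytes):
    """Return the parsed <x:xmpmeta> element, or None if no plain-text packet exists."""
    # Naive byte search: only finds packets that are not compressed.
    match = re.search(rb"<x:xmpmeta\b.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    return ET.fromstring(match.group(0)) if match else None

def xmp_properties(root) -> dict:
    """Flatten rdf:Description properties into {"{namespace}name": value}."""
    props = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Shorthand form: simple properties stored as XML attributes.
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                props[name] = value
        # Element form: plain text, or an rdf:Seq/Bag/Alt list of rdf:li items.
        for child in desc:
            items = [" ".join(t.strip() for t in li.itertext() if t.strip())
                     for li in child.iter(f"{{{RDF}}}li")]
            props[child.tag] = items if items else (child.text or "").strip()
    return props
```

Keys come back in `{namespace}name` form. For example, the author list appears under `{http://purl.org/dc/elements/1.1/}creator` and the authoring application appears under `{http://ns.adobe.com/xap/1.0/}CreatorTool`.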

What Can Metadata Reveal About You?

1. Personal Identity

Example Metadata:

Author: John Smith

Company: Acme Corporation

Creator: Microsoft Word for Microsoft 365

What it reveals: Your full name, your employer, and the software you write with.

2. Location Data

PDFs created from photos or scanned documents can contain GPS data:

GPS Latitude: 37.7749° N

GPS Longitude: 122.4194° W

(San Francisco, CA, precise to ~10 meters)
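The "~10 meters" figure follows from the precision of the coordinates. They have four decimal places, so the smallest step they can express is 0.0001°. The calculation below uses the standard approximations of about 111 km per degree of latitude and 111.32 km × cos(latitude) per degree of longitude:

```latex
\begin{aligned}
\Delta_{\text{N-S}} &\approx 10^{-4}\,{}^\circ \times 111\ \text{km}/{}^\circ \approx 11\ \text{m} \\
\Delta_{\text{E-W}} &\approx 10^{-4}\,{}^\circ \times 111.32\ \text{km}/{}^\circ \times \cos(37.7749^\circ) \approx 8.8\ \text{m}
\end{aligned}
```

Each additional decimal place shrinks that uncertainty tenfold.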

3. Editing History

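Some PDF editors save changes as incremental updates, appending each revision to the end of the file instead of rewriting it. The earlier versions then remain inside the file and can expose: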

  • Previous versions of text
  • Deleted paragraphs
  • Original prices (before discount)
  • Redacted information (if improperly done)

🚨 Real Example:

In 2008, a government PDF accidentally revealed classified information that had been "deleted" but was still present in the document's metadata and revision history.

4. Software & System Information

Typical Software Fingerprint:

  • Creator: Adobe InDesign 2024 (19.4)
  • Producer: Adobe PDF Library 17.0
  • Operating System: macOS 10.15.7
  • Printer: Canon imageCLASS MF743Cdw

What attackers learn: Software versions for targeted exploits, hardware for phishing, OS for social engineering.

5. Work Patterns & Timestamps

Timestamp | What It Reveals
Created: 2025-01-05 23:47:12 PST | You work late (or are in a different time zone)
Modified: 2025-01-06 02:15:33 PST | You revised at 2 AM (tight deadline?)
50+ modification timestamps | The document went through many revisions
Timezone: UTC+8 | Your approximate location (Asia-Pacific)
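Inside the file, these timestamps are stored as PDF date strings such as D:20250105234712-08'00'. The trailing offset is what exposes your time zone. The sketch below parses that format with Python's standard library. `parse_pdf_date` is a hypothetical helper, and it is deliberately lenient about the optional apostrophes that different PDF producers emit.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string, e.g. "D:20250105234712-08'00'" (ISO 32000-1, section 7.9.4).
# Every field after the year is optional; the offset sign is "+", "-", or "Z" (UTC).
_PDF_DATE = re.compile(
    r"(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?"
)

def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string to a datetime (timezone-aware if an offset is present)."""
    m = _PDF_DATE.match(value.strip())
    if not m:
        raise ValueError(f"not a PDF date: {value!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = m.groups()
    tz = None
    if sign == "Z":
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    # Missing fields default to the start of the period (month 1, day 1, 00:00:00).
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0),
                    tzinfo=tz)

created = parse_pdf_date("D:20250105234712-08'00'")
print(created.isoformat())  # 2025-01-05T23:47:12-08:00
```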

How to View PDF Metadata

Method 1: Adobe Acrobat Reader (Free)

  1. Open the PDF in Adobe Acrobat Reader
  2. Click File → Properties (or press Ctrl+D / Cmd+D)
  3. Review tabs:
    • Description: Author, title, subject, keywords
    • Security: Encryption and permissions
    • Fonts: Embedded font information
    • Initial View: Default display settings
    • Custom: User-defined metadata fields

Method 2: Preview (Mac)

  1. Open PDF in Preview
  2. Click Tools → Show Inspector (or press Cmd+I)
  3. Click the (i) tab for document info
  4. Look for "More Info" dropdown for extended metadata

Method 3: File Properties (Windows)

  1. Right-click the PDF file
  2. Select Properties
  3. Click the Details tab
  4. Scroll through metadata fields

Method 4: Command Line (Advanced)

Using exiftool (cross-platform):

# Install exiftool
# Mac: brew install exiftool
# Linux: sudo apt install libimage-exiftool-perl

# View all metadata
exiftool document.pdf

# Include duplicate tags, labeled by group (e.g. PDF Info vs. XMP)
exiftool -a -G1 document.pdf

# View specific fields
exiftool -Author -Creator document.pdf
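If exiftool isn't available, you can get a rough look at the classic document information fields with a few lines of standard-library Python. This is a naive sketch, and the helper name `scan_info` is hypothetical. It only finds values written as plain literal strings outside compressed streams, so an empty result means "not found by this scan," not "clean." Because it collects every occurrence, it also surfaces older values that incremental updates left behind.

```python
import re

# Keys of the standard document information dictionary (ISO 32000-1, 14.3.3).
INFO_KEYS = ["Title", "Author", "Subject", "Keywords", "Creator",
             "Producer", "CreationDate", "ModDate"]

# Matches "/Key (literal string)", allowing backslash escapes inside the string.
# Naive: misses hex strings <...>, nested parentheses, and compressed object streams.
_INFO_ENTRY = re.compile(
    rb"/(" + b"|".join(k.encode() for k in INFO_KEYS) + rb")\s*\(((?:\\.|[^\\)])*)\)"
)

def scan_info(pdf_bytes: bytes) -> dict[str, list[str]]:
    """Collect every value seen for each Info key, in file order."""
    found: dict[str, list[str]] = {}
    for key, raw in _INFO_ENTRY.findall(pdf_bytes):
        # Latin-1 maps every byte to a character; escape sequences are left as-is.
        found.setdefault(key.decode(), []).append(raw.decode("latin-1"))
    return found
```

You would call it on the raw file contents, for example `scan_info(open("document.pdf", "rb").read())`. A key that appears more than once, such as an Author list of `["John Smith", ""]`, usually means an earlier value is still stored in the file.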

Method 5: Online Metadata Viewers

⚠️ Security Warning

Avoid uploading sensitive PDFs to online metadata viewers. If you must use one, test with a non-confidential file first. Remember: once uploaded, you've shared that metadata with the service.

How to Remove or Sanitize PDF Metadata

Adobe Acrobat Pro (Commercial)

  1. Open the PDF in Adobe Acrobat Pro
  2. To edit fields by hand:
    • Go to File → Properties
    • Edit or clear fields individually
  3. Or use the "Remove Hidden Information" tool:
    • Go to Tools → Redact
    • Click Remove Hidden Information (or Sanitize Document to remove everything in one step)
    • Check what to remove
    • Click Remove

exiftool (Free, Command-line)

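# Remove ALL metadata (overwrites document.pdf; the original is kept as document.pdf_original)
exiftool -all:all= document.pdf

# Remove specific fields
exiftool -Author= -Creator= -Producer= document.pdf

# Clean and save as new file
exiftool -all:all= -o clean.pdf document.pdf

# exiftool edits PDFs by appending an incremental update, so the old values
# are still inside the file and can be recovered. Rewrite the file with qpdf
# so that only objects still in use are kept:
qpdf --linearize clean.pdf final.pdf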

PDF-XChange Editor (Free Version Available)

  1. Open the PDF in PDF-XChange Editor
  2. Go to File → Document Properties
  3. Clear individual fields
  4. Click OK and save

QPDF (Free, Command-line)

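# Rewrite the file, keeping only objects still in use. This drops data
# orphaned by earlier incremental updates, but it does not clear the
# Info dictionary or XMP metadata by itself.
qpdf --linearize --object-streams=generate \
  --stream-data=compress \
  input.pdf output.pdf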

Print to PDF (Quick & Dirty Method)

Quick Fix (with caveats):

  1. Open PDF in any viewer
  2. Print to PDF (File → Print → Save as PDF / Microsoft Print to PDF)
  3. The new PDF will have minimal metadata (creation date, producer)

⚠️ Drawback: May reduce quality, lose bookmarks, and remove form fields.

What Metadata Should You Keep?

Field | Keep? | Reasoning
Title | Yes | Helpful for identification
Author | Maybe | Use a generic name (e.g., "HR Department") instead of a personal name
Subject | Yes | Useful for searches
Keywords | Yes | Improves searchability
Creation Date | Maybe | Keep for archival, remove for anonymity
Modification Date | No | Reveals work patterns and editing frequency
Creator (Software) | No | Reveals software versions, a potential security risk
Producer | No | Technical detail, not needed
GPS Coordinates | Never | Major privacy risk
Company Name | Maybe | Keep for official docs, remove for confidential sharing
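If you clean files in bulk, the table translates naturally into an allowlist. The sketch below is hypothetical: the names `KEEP`, `MAYBE`, and `filter_metadata` are illustrative, and the function operates on a plain dict of field names. Writing the result back into a PDF would still need exiftool or a PDF library.

```python
# The table above as an allowlist. "Company" is a custom key that some
# converters add; it is not part of the standard Info dictionary.
KEEP = {"Title", "Subject", "Keywords"}
MAYBE = {"Author", "CreationDate", "Company"}
# Anything not listed (ModDate, Creator, Producer, GPS, other custom fields) is dropped.

def filter_metadata(info, allow_maybe=(), author_override=None):
    """Return only the fields the policy keeps; "Maybe" fields must be opted in."""
    allowed = KEEP | (MAYBE & set(allow_maybe))
    kept = {key: value for key, value in info.items() if key in allowed}
    if author_override is not None:
        # e.g. replace a personal name with a generic "HR Department"
        kept["Author"] = author_override
    return kept
```

Making "Maybe" fields opt-in means a forgotten decision errs toward removal rather than disclosure.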

Industry-Specific Metadata Concerns

Legal & Compliance

  • Risk: Attorney names, firm information, document versions revealing strategy changes
  • Best practice: Strip all metadata before sharing with opposing counsel or filing with courts

Healthcare (HIPAA)

  • Risk: Patient data in metadata, GPS from medical imaging scans
  • Best practice: Use HIPAA-compliant PDF tools that auto-sanitize metadata

Finance

  • Risk: Analyst names, internal system names, editing history of financial models
  • Best practice: Remove all technical metadata, keep only title and creation date

Government & Defense

  • Risk: Classification markings, author clearances, originating systems
  • Best practice: Use certified sanitization tools that meet government standards

Journalism & Activism

  • Risk: Source identity, location data, document provenance
  • Best practice: Complete metadata removal, use tools like MAT2 or Dangerzone

Advanced: Hidden Data Beyond Metadata

1. Deleted or "Invisible" Content

Sometimes content is "hidden" by covering it with white rectangles rather than truly deleting it:

  • Selecting all text (Ctrl+A) and pasting it elsewhere can reveal hidden text
  • Text-extraction tools read the text layer directly, so text under a white box comes out intact
  • Proper redaction requires using dedicated redaction tools, not just black rectangles

2. Embedded Files

PDFs can contain hidden attachments:

  • Source Word/Excel documents
  • Audio or video files
  • Other PDFs
  • Executable files (malware risk!)

How to Check:

Adobe Acrobat: View → Show/Hide → Navigation Panes → Attachments

3. JavaScript & Actions

PDFs can contain JavaScript that runs when opened:

  • Tracking codes that "phone home"
  • Forms that submit data
  • Automatic printing or email actions

4. Layers & Optional Content

Design PDFs may have hidden layers:

  • Draft text below final version
  • Comments and annotations
  • Alternative images or graphics
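Attachments, JavaScript, actions, and layers all declare themselves through specific PDF name tokens. Triage tools such as pdfid work by counting those tokens. The sketch below counts a few of them in the raw bytes. The marker list and the `scan_markers` name are illustrative assumptions. The scan cannot see inside compressed object streams or names obfuscated with #xx escapes, so a zero count is not proof of a clean file.

```python
import re

# Name tokens that often mark active or hidden content (an illustrative selection).
MARKERS = {
    "/EmbeddedFile": "embedded file (attachment)",
    "/JavaScript": "JavaScript",
    "/JS": "JavaScript code",
    "/OpenAction": "action that runs when the file opens",
    "/AA": "event-triggered actions",
    "/Launch": "launch-application action",
    "/SubmitForm": "form submission",
    "/OCProperties": "optional content (layers)",
}

# A PDF name ends at whitespace or a delimiter: ( ) < > [ ] { } / %
_END_OF_NAME = rb"(?![^\s()<>\[\]{}/%])"

def scan_markers(pdf_bytes: bytes) -> dict[str, int]:
    """Count each marker that appears as a whole name token in the raw bytes."""
    counts = {}
    for name in MARKERS:
        hits = len(re.findall(re.escape(name.encode()) + _END_OF_NAME, pdf_bytes))
        if hits:
            counts[name] = hits
    return counts
```

The end-of-name check keeps `/JS` from matching longer names such as `/JSX`.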

Best Practices for Metadata Privacy

✅ Before Sharing Any PDF Externally

  1. Review metadata using methods above
  2. Remove or sanitize sensitive fields
  3. Check for hidden content
  4. Use "Print to PDF" for maximum sanitization (if acceptable)
  5. Test the cleaned file before sending
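For step 5, one quick check is to search the cleaned file's raw bytes for the strings you wanted gone, such as your name, email address, or company. The hypothetical `find_leaks` helper below tries both UTF-8 and UTF-16BE, since PDFs store Unicode text strings in UTF-16BE. A miss only means the string was not found verbatim. The text can still hide in compressed streams or escaped strings, so pair this check with one of the viewing methods above.

```python
def find_leaks(pdf_bytes: bytes, needles: list[str]) -> list[tuple[str, str]]:
    """Report (string, encoding) pairs for sensitive strings still present in the file."""
    hits = []
    for needle in needles:
        # UTF-8 covers plain ASCII literal strings and XMP; UTF-16BE covers
        # Unicode text strings in the Info dictionary.
        for encoding in ("utf-8", "utf-16-be"):
            if needle and needle.encode(encoding) in pdf_bytes:
                hits.append((needle, encoding))
    return hits
```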

💡 Configure PDF Creation Defaults

Set your software to minimize metadata:

  • Microsoft Word: File → Options → Trust Center → Trust Center Settings → Privacy Options; when saving as PDF, click Options and clear "Document properties"
  • Adobe Acrobat: Preferences → Documents → Remove hidden information

🔒 For Maximum Privacy

  • Use PDF tools that process files locally (like PDF Wonder Kit) rather than uploading to servers
  • Strip all metadata before sharing
  • Flatten all layers and transparency
  • Remove embedded files and attachments
  • Disable JavaScript if not needed

Conclusion: Metadata Matters

PDF metadata is like a digital fingerprint: it can reveal far more than you intend. Before sharing any PDF, especially sensitive documents, take a moment to review and clean its metadata.

Remember:

  • Visible content ≠ All content: Metadata is hidden but readable
  • Prevention is easier than cleanup: Configure software defaults
  • Different risks for different industries: Assess your specific needs
  • Tools exist to help: Use metadata viewers and sanitizers
  • Client-side processing is safer: Don't upload sensitive files to clean them

Process PDFs Without Uploading

PDF Wonder Kit processes all PDFs directly in your browser. No uploads means no metadata ever leaves your device: true privacy by design.

