PDF Metadata: What Hidden Information Are You Sharing?
Discover what hidden data lurks in PDF metadata. Learn what information PDFs reveal about you, how to view it, and how to remove sensitive metadata before sharing.
Introduction: The Hidden Data in Every PDF
You carefully review your PDF before sharing it, ensuring the visible content is perfect. But what about the invisible data embedded in the file? PDF metadata can reveal far more than you realize, from your name and location to your editing history and software versions.
Real-World Impact
In 2013, reporters used PDF metadata to reveal that a supposedly independent report was actually authored by a company with a vested interest. The "Author" field gave it away.
What is PDF Metadata?
Metadata is "data about data": information describing the PDF file itself rather than its visible content. Think of it as the file's DNA: invisible to casual viewers but containing a wealth of information.
Types of PDF Metadata
Document Information
- Title
- Author
- Subject
- Keywords
- Creator application
- Producer (PDF converter)
Timestamps
- Creation date & time
- Modification date & time
- Last printed date
- Timezone information
Technical Details
- PDF version
- Page count
- File size
- Encryption settings
- Embedded fonts list
- Color profile
Extended Metadata (XMP)
- GPS coordinates (from photos)
- Camera model (from scans)
- Document history
- Custom properties
- Copyright information
- License details
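XMP is stored as an XML packet inside the PDF, and it is often left uncompressed so that other tools can find it. As a rough sketch, standard shell tools can print it. The function name `extract_xmp` is illustrative, and the approach assumes the packet is wrapped in `<x:xmpmeta>` tags and is not compressed or encrypted:

```shell
# Print any XMP packet that is stored as plain text in a PDF.
# Assumption: the packet starts at <x:xmpmeta and ends at </x:xmpmeta>.
extract_xmp() {
    # LC_ALL=C makes sed treat the file as raw bytes rather than text
    LC_ALL=C sed -n '/<x:xmpmeta/,/<\/x:xmpmeta>/p' "$1"
}
```

Running `extract_xmp document.pdf` and finding nothing does not prove the file is clean; a dedicated tool such as exiftool reads more storage variants.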
What Can Metadata Reveal About You?
1. Personal Identity
Example Metadata:
Author: John Smith
Company: Acme Corporation
Creator: Microsoft Word 2021 (Licensed to john.smith@acme.com)
What it reveals: Your full name, employer, email address, and the fact you have a licensed copy of Word.
2. Location Data
PDFs created from photos or scanned documents can contain GPS data:
GPS Latitude: 37.7749° N
GPS Longitude: 122.4194° W
(San Francisco, CA; precise to ~10 meters)
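The "~10 meters" figure follows from how many decimal places the coordinates carry. The sketch below works through that arithmetic, assuming a spherical-Earth approximation of 111,320 m per degree of latitude; the function name `coord_resolution_m` is made up for illustration:

```shell
# Approximate ground distance covered by one step in the last decimal place.
# Assumption: 111,320 m per degree of latitude (spherical-Earth estimate).
coord_resolution_m() {
    # $1 = latitude in decimal degrees, $2 = number of decimal places
    LC_ALL=C awk -v lat="$1" -v places="$2" 'BEGIN {
        pi = atan2(0, -1)
        step = 10 ^ (-places)              # e.g. 0.0001 for 4 places
        north_south = step * 111320        # metres per step of latitude
        east_west = north_south * cos(lat * pi / 180)
        printf "%.1f %.1f\n", north_south, east_west
    }'
}

coord_resolution_m 37.7749 4   # prints: 11.1 8.8
```

So four decimal places pin the location down to roughly 11 m north-south and 9 m east-west at San Francisco's latitude.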
3. Editing History
Some PDF creators embed a complete revision history, revealing deleted content:
- Previous versions of text
- Deleted paragraphs
- Original prices (before discount)
- Redacted information (if improperly done)
Real Example:
In 2008, a government PDF accidentally revealed classified information that had been "deleted" but was still present in the document's metadata and revision history.
4. Software & System Information
Typical Software Fingerprint:
- Creator: Adobe InDesign 2024 (19.4)
- Producer: Adobe PDF Library 17.0
- Operating System: Mac OS X 10.15.7
- Printer: Canon imageCLASS MF743Cdw
What attackers learn: Software versions for targeted exploits, hardware for phishing, OS for social engineering.
5. Work Patterns & Timestamps
| Timestamp | What It Reveals |
|---|---|
| Created: 2025-01-05 23:47:12 PST | You work late (or are in a different timezone) |
| Modified: 2025-01-06 02:15:33 PST | You revised at 2 AM (tight deadline?) |
| 50+ modification timestamps | Document went through many revisions |
| Timezone: UTC+8 | Your approximate location (Asia-Pacific) |
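Inside the file, these timestamps are stored as PDF date strings of the form `D:YYYYMMDDHHmmSS` followed by a UTC offset such as `-08'00'`, which is where the timezone hint comes from. A minimal sketch that converts the full form to ISO 8601 (the function name `pdf_date_to_iso` is illustrative, and shortened date forms are not handled):

```shell
# Convert a PDF date string to ISO 8601.
# Handles the full form D:YYYYMMDDHHmmSS followed by Z or +HH'mm' / -HH'mm'.
pdf_date_to_iso() {
    printf '%s\n' "$1" | sed -E \
        -e "s/^D:([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})/\1-\2-\3T\4:\5:\6/" \
        -e "s/([+-][0-9]{2})'([0-9]{2})'?\$/\1:\2/"
}

pdf_date_to_iso "D:20250106021533-08'00'"   # prints: 2025-01-06T02:15:33-08:00
```

The offset survives the conversion, so anyone reading the field learns both the local time of the edit and the author's UTC offset.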
How to View PDF Metadata
Method 1: Adobe Acrobat Reader (Free)
- Open the PDF in Adobe Acrobat Reader
- Click File → Properties (or press Ctrl+D / Cmd+D)
- Review tabs:
  - Description: Author, title, subject, keywords
  - Security: Encryption and permissions
  - Fonts: Embedded font information
  - Initial View: Default display settings
  - Custom: User-defined metadata fields
Method 2: Preview (Mac)
- Open PDF in Preview
- Click Tools → Show Inspector (or press Cmd+I)
- Click the (i) tab for document info
- Look for "More Info" dropdown for extended metadata
Method 3: File Properties (Windows)
- Right-click the PDF file
- Select Properties
- Click the Details tab
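- Scroll through metadata fields (for PDFs, Windows may list only basic file attributes unless a PDF reader has installed a property handler)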
Method 4: Command Line (Advanced)
Using exiftool (cross-platform):
# Install exiftool
# Mac: brew install exiftool
# Linux: sudo apt install libimage-exiftool-perl
# View all metadata
exiftool document.pdf
# View specific field
exiftool -Author -Creator document.pdf
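If exiftool is not available, a rough fallback is to search the raw bytes for Info-dictionary entries. This is only a sketch: the function name `pdf_info_fields` is illustrative, and it misses entries stored in compressed object streams, written as hex strings, or containing nested parentheses.

```shell
# List common Info-dictionary entries that appear as plain text in a PDF.
# Limitation: only matches simple literal strings of the form /Key (value).
pdf_info_fields() {
    LC_ALL=C grep -a -o -E \
        '/(Title|Author|Subject|Keywords|Creator|Producer|CreationDate|ModDate) *\([^)]*\)' \
        "$1"
}
```

Calling `pdf_info_fields document.pdf` prints one match per line, for example `/Author (John Smith)`. Treat an empty result as "nothing found this way", not as proof that the fields are absent.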
Method 5: Online Metadata Viewers
Security Warning
Avoid uploading sensitive PDFs to online metadata viewers. If you must use one, test with a non-confidential file first. Remember: once uploaded, you've shared that metadata with the service.
How to Remove or Sanitize PDF Metadata
Adobe Acrobat Pro (Commercial)
- Open PDF in Adobe Acrobat Pro
- Go to File → Properties
- Edit or clear fields individually, OR
- Use the Redact toolset to strip hidden data:
  - Go to Tools → Redact
  - Click Remove Hidden Information (or Sanitize Document to remove all hidden data in one pass)
  - Check what to remove
  - Click Remove
exiftool (Free, Command-line)
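# Remove ALL metadata (exiftool keeps a backup as document.pdf_original)
exiftool -all:all= document.pdf
# Remove specific fields
exiftool -Author= -Creator= -Producer= document.pdf
# Clean and save as new file
exiftool -all:all= -o clean.pdf document.pdf
# Note: exiftool edits PDFs with an incremental update, so the old metadata
# stays in the file and can be recovered. Rewrite the result with qpdf
# (see the QPDF section) to discard it for good.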
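ExifTool's documentation notes that its PDF edits are appended as an incremental update, which means the earlier metadata is still physically in the file. One quick heuristic for spotting appended revisions is to count end-of-file markers. The function name `count_pdf_revisions` is illustrative, and the count is only a hint:

```shell
# Count end-of-file markers in a PDF. Each incremental save appends a new
# section ending in %%EOF, so a count above 1 suggests earlier revisions
# (including "removed" metadata) may still be in the file.
# Caveat: linearized PDFs legitimately contain two markers.
count_pdf_revisions() {
    LC_ALL=C grep -a -c '%%EOF' "$1"
}
```

Comparing `count_pdf_revisions document.pdf` before and after an exiftool edit typically shows the count going up by one.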
PDF-XChange Editor (Free Version Available)
- Open PDF in PDF-XChange Editor
- Go to File → Document Properties
- Clear individual fields
- Click OK and save
QPDF (Free, Command-line)
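# Rewrite the file, dropping unreferenced objects such as metadata that
# exiftool has already detached. qpdf on its own keeps the Info dictionary
# and XMP, so strip those first.
qpdf --linearize --object-streams=generate \
  --stream-data=compress \
  input.pdf output.pdf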
Print to PDF (Quick & Dirty Method)
Quick Fix (with caveats):
- Open PDF in any viewer
- Print to PDF (File → Print → Save as PDF / Microsoft Print to PDF)
- The new PDF will have minimal metadata (creation date, producer)
Drawback: May reduce quality, lose bookmarks, and remove form fields.
What Metadata Should You Keep?
| Field | Keep? | Reasoning |
|---|---|---|
| Title | Yes | Helpful for identification |
| Author | Maybe | Use generic (e.g., "HR Department") instead of personal name |
| Subject | Yes | Useful for searches |
| Keywords | Yes | Improves searchability |
| Creation Date | Maybe | Keep for archival, remove for anonymity |
| Modification Date | No | Reveals work patterns and editing frequency |
| Creator (Software) | No | Reveals software versions, potential security risks |
| Producer | No | Technical detail, not needed |
| GPS Coordinates | Never | Major privacy risk |
| Company Name | Maybe | Keep for official docs, remove for confidential sharing |
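The "No" rows of the table can be turned into exiftool arguments. This sketch assumes ExifTool's tag names for the PDF Info dictionary (`ModifyDate` for the modification date, plus `Creator` and `Producer`); the helper name `strip_args` is made up for illustration:

```shell
# Build exiftool arguments that clear the fields marked "No" in the table.
# Assumption: ModifyDate, Creator and Producer are the ExifTool tag names
# for the modification date, creator application and producer fields.
strip_args() {
    for tag in ModifyDate Creator Producer; do
        printf '%s' "-$tag= "
    done
    printf '\n'
}

# Usage (requires exiftool):
#   exiftool $(strip_args) -o cleaned.pdf document.pdf
```

Keeping the list in one place makes it easy to add or drop a field once you have decided how to handle the "Maybe" rows.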
Industry-Specific Metadata Concerns
Legal & Compliance
- Risk: Attorney names, firm information, document versions revealing strategy changes
- Best practice: Strip all metadata before sharing with opposing counsel or filing with courts
Healthcare (HIPAA)
- Risk: Patient data in metadata, GPS from medical imaging scans
- Best practice: Use HIPAA-compliant PDF tools that auto-sanitize metadata
Finance
- Risk: Analyst names, internal system names, editing history of financial models
- Best practice: Remove all technical metadata, keep only title and creation date
Government & Defense
- Risk: Classification markings, author clearances, originating systems
- Best practice: Use certified sanitization tools that meet government standards
Journalism & Activism
- Risk: Source identity, location data, document provenance
- Best practice: Complete metadata removal, use tools like MAT2 or Dangerzone
Advanced: Hidden Data Beyond Metadata
1. Deleted or "Invisible" Content
Sometimes content is "hidden" by covering it with white rectangles rather than truly deleting it:
- Select All (Ctrl+A) can reveal hidden text
- Text-extraction tools and copy-paste can read text under white boxes
- Proper redaction requires using dedicated redaction tools, not just black rectangles
2. Embedded Files
PDFs can contain hidden attachments:
- Source Word/Excel documents
- Audio or video files
- Other PDFs
- Executable files (malware risk!)
How to Check:
Adobe Acrobat: View → Show/Hide → Navigation Panes → Attachments
3. JavaScript & Actions
PDFs can contain JavaScript that runs when opened:
- Tracking codes that "phone home"
- Forms that submit data
- Automatic printing or email actions
4. Layers & Optional Content
Design PDFs may have hidden layers:
- Draft text below final version
- Comments and annotations
- Alternative images or graphics
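Embedded files, JavaScript, automatic actions, and layers each leave recognizable name tokens in the PDF structure. The sketch below counts lines mentioning a few of them (`/EmbeddedFile`, `/JavaScript`, `/OpenAction`, `/Launch`, `/OCProperties`). The function name `scan_pdf_tokens` is illustrative, and tokens inside compressed object streams will not be seen:

```shell
# Report how many lines of a PDF mention name tokens tied to hidden or
# active content. A zero count is a hint, not a guarantee.
scan_pdf_tokens() {
    for token in /EmbeddedFile /JavaScript /OpenAction /Launch /OCProperties; do
        # grep -c exits non-zero when the count is 0, hence "|| true"
        count=$(LC_ALL=C grep -a -c -F -- "$token" "$1" || true)
        printf '%s %s\n' "$token" "$count"
    done
}
```

Any non-zero count is a reason to open the file in a full PDF editor and inspect the attachment, JavaScript, or layer panels before sharing.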
Best Practices for Metadata Privacy
Before Sharing Any PDF Externally
- Review metadata using methods above
- Remove or sanitize sensitive fields
- Check for hidden content
- Use "Print to PDF" for maximum sanitization (if acceptable)
- Test the cleaned file before sending
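One simple way to test a cleaned file is to search its raw bytes for strings you know were in the original metadata. This is a sketch only: the function name `assert_no_leak` is illustrative, and strings stored in compressed streams or as UTF-16 text will not be caught.

```shell
# Exit 0 if none of the given strings appear in the file's raw bytes,
# 1 otherwise. The match is case-insensitive and literal (no regex).
assert_no_leak() {
    file=$1
    shift
    for needle in "$@"; do
        if LC_ALL=C grep -a -q -i -F -- "$needle" "$file"; then
            printf 'still present: %s\n' "$needle" >&2
            return 1
        fi
    done
    return 0
}

# Usage: assert_no_leak cleaned.pdf "John Smith" "acme.com"
```

Pair this with a second look in a metadata viewer, since a byte search can only confirm that a leak exists, not that none does.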
Configure PDF Creation Defaults
Set your software to minimize metadata:
- Microsoft Word: File → Options → Trust Center → Trust Center Settings → Privacy Options
- Adobe Products: Preferences → Documents → Remove hidden information
- macOS: System Preferences → Security & Privacy → Analytics → Disable (this limits diagnostic data sent to Apple; it does not change what is written into PDFs)
For Maximum Privacy
- Use PDF tools that process files locally (like PDF Wonder Kit) rather than uploading to servers
- Strip all metadata before sharing
- Flatten all layers and transparency
- Remove embedded files and attachments
- Disable JavaScript if not needed
Conclusion: Metadata Matters
PDF metadata is like a digital fingerprint: it can reveal far more than you intend. Before sharing any PDF, especially sensitive documents, take a moment to review and clean its metadata.
Remember:
- Visible content ≠ All content: Metadata is hidden but readable
- Prevention is easier than cleanup: Configure software defaults
- Different risks for different industries: Assess your specific needs
- Tools exist to help: Use metadata viewers and sanitizers
- Client-side processing is safer: Don't upload sensitive files to clean them
Process PDFs Without Uploading
PDF Wonder Kit processes all PDFs directly in your browser. No uploads means no metadata ever leaves your device: true privacy by design.