OPSEC Scrub: The Silent Proxy for Metadata Sanitization

A four-layer architecture that operates at your network edge, stripping 47+ metadata fields from every outbound file — automatically, silently, and with a complete audit trail.

200+ File Formats
47+ Metadata Fields
Sub-5ms Latency
UK-Sovereign Processing

Four-Layer Silent Proxy Architecture

Each layer is independently scalable and auditable. The entire pipeline completes in under 50ms for most document types.

Layer 01

Silent Proxy Layer

Network Edge Interception

Deployed as a transparent network proxy, OPSEC Scrub intercepts all outbound file transfers at the perimeter. No endpoint agents required. No user-facing prompts. Supports HTTP, HTTPS, SMTP, FTP, and cloud storage APIs.

Zero-touch deployment
Protocol-agnostic interception
Sub-5ms latency overhead
TLS inspection with certificate pinning

Layer 02

AI Classification Engine

Document Intelligence

A fine-tuned transformer model classifies each file by type and sensitivity level before sanitization begins. Supports 200+ file formats including PDF, DOCX, XLSX, PPTX, JPEG, PNG, MP4, and ZIP archives.

200+ file format support
Sensitivity classification
PII detection (NER model)
Malware pre-screening
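
As a rough illustration of the file-type step only, the sketch below identifies a few of the formats named above from their leading bytes. It is a plain signature check, not the transformer model described here, and the names `MAGIC_SIGNATURES` and `sniff_format` are hypothetical. A check like this could plausibly run before a heavier classifier so that obvious cases are routed cheaply.

```python
from typing import Optional

# Leading byte signatures for a few of the formats named above.
# This table is an illustrative assumption, not the product's format list.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),  # also the container for DOCX, XLSX, and PPTX
]


def sniff_format(data: bytes) -> Optional[str]:
    """Return a coarse format label from the leading bytes, or None if unknown."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return None
```

A ZIP result would still need a second pass over the archive contents to tell a DOCX from an ordinary archive.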

Layer 03

Sanitization Engine

47+ Field Removal

The core sanitization pipeline strips 47+ metadata fields using format-specific parsers. Each parser is maintained against the latest format specifications to ensure complete coverage as standards evolve.

Format-native parsing
Deep structure traversal
Embedded object recursion
Binary-level verification
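
To make "format-native parsing" concrete, here is a minimal sketch for a single format, PNG. It walks the chunk structure and drops ancillary chunks that commonly carry text, timestamps, or EXIF data. The drop list and the function names are assumptions for illustration; this is not OPSEC Scrub's parser.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunks that can carry text, timestamps, or EXIF data.
# This drop list is an illustrative assumption.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"tIME", b"eXIf"}


def strip_png_metadata(data: bytes) -> bytes:
    """Copy a PNG chunk by chunk, omitting chunks in METADATA_CHUNKS."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out = bytearray(PNG_SIGNATURE)
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 12 + length  # 4 length + 4 type + payload + 4 CRC
        if end > len(data):
            raise ValueError("truncated chunk")
        if chunk_type not in METADATA_CHUNKS:
            out += data[pos:end]  # kept chunks are copied byte for byte
        pos = end
        if chunk_type == b"IEND":
            break
    return bytes(out)


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk with its CRC (helper for constructing test input)."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)
```

Because each chunk carries its own CRC, removing whole chunks leaves the remaining ones valid without recomputing anything.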

Layer 04

Audit Log System

Immutable Evidence Trail

Every sanitization event generates a cryptographically signed log entry stored in an append-only database. Logs include: file hash (before/after), fields removed, timestamp, user context, and destination.

Cryptographic signing (SHA-256 digests)
Append-only storage
SIEM integration (Splunk, QRadar)
Exportable compliance reports
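
One common way to get a tamper-evident, append-only trail is to chain entries together so that each one commits to the one before it. The sketch below assumes that approach and uses HMAC-SHA256 as a stand-in for whatever signing scheme the product uses. The key, the field names, and both function names are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real deployment would hold keys in an HSM or KMS.
SIGNING_KEY = b"demo-key-not-for-production"


def _sign(body: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list, event: dict, key: bytes = SIGNING_KEY) -> dict:
    """Append an event whose signature also covers the previous signature."""
    prev_signature = log[-1]["signature"] if log else ""
    body = {"event": event, "prev": prev_signature}
    entry = {**body, "signature": _sign(body, key)}
    log.append(entry)
    return entry


def verify_log(log: list, key: bytes = SIGNING_KEY) -> bool:
    """Recompute every signature and check that the chain links are intact."""
    prev_signature = ""
    for entry in log:
        if entry["prev"] != prev_signature:
            return False
        body = {"event": entry["event"], "prev": entry["prev"]}
        if not hmac.compare_digest(_sign(body, key), entry["signature"]):
            return False
        prev_signature = entry["signature"]
    return True
```

Editing or deleting any earlier entry breaks verification of everything after it. HMAC is a shared-key construction, so a scheme meant for third-party verification would use an asymmetric signature instead.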

Machine learning that reads what humans miss.

Standard metadata strippers miss embedded PII in document body text, image captions, and custom XML properties. Our NER (Named Entity Recognition) model identifies and redacts personal data that rule-based systems cannot detect.

The model is trained on UK-specific data patterns including NHS numbers, NI numbers, UK postcodes, and Companies House identifiers.

Named Entity Recognition: names, addresses, phone numbers, emails
UK-Specific Pattern Matching: NI numbers, NHS IDs, postcodes, CRNs
Context-Aware Redaction: preserves document meaning while removing PII
Confidence Scoring: each redaction logged with confidence percentage
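
The UK-specific pattern matching can be illustrated with a small rule-based sketch. It is not the NER model: the regular expressions below are simplified shapes for NI numbers and postcodes, not the full official rules. The NHS number check uses the standard modulus 11 check digit. The placeholder strings and function names are hypothetical.

```python
import re

# Simplified shapes only; real NI number and postcode rules have more exclusions.
NI_NUMBER = re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b")
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b")
NHS_CANDIDATE = re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b")


def is_valid_nhs_number(candidate: str) -> bool:
    """Apply the modulus 11 check digit test to a 10-digit NHS number."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 10:
        return False
    # Weights 10 down to 2 apply to the first nine digits.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    return check != 10 and check == digits[9]


def redact(text: str) -> str:
    """Replace matched identifiers with placeholder labels."""
    text = NI_NUMBER.sub("[NI NUMBER]", text)
    text = POSTCODE.sub("[POSTCODE]", text)
    return NHS_CANDIDATE.sub(
        lambda m: "[NHS NUMBER]" if is_valid_nhs_number(m.group()) else m.group(),
        text,
    )
```

Checking the NHS check digit before redacting keeps ordinary 10-digit numbers, such as reference codes, from being removed by mistake.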

47+ Metadata Fields Stripped

Every field that could identify your people, your infrastructure, or your processes.

GPS coordinates (EXIF)
Author name & initials
Last modified by
Document revision history
Creation timestamp
Software version
Device name & model
Operating system
Company name (properties)
Manager field
Custom XML properties
Embedded thumbnails
Tracked changes
Hidden comments
Linked data sources
Template path
Macro references
Digital watermarks
Printer settings
Font embedding data
Language identifiers
User SID references
Network path references
Email addresses
Phone numbers (embedded)
Postal codes in metadata
IP addresses
Serial numbers
License keys
Build paths
Database connection strings
API endpoint references
Internal URLs
Colour profiles with device IDs
Camera make/model
Lens data
Flash settings
ISO/shutter metadata
Geotag altitude
Subject distance
White balance settings
Metering mode
Exposure programme
Focal length
Aperture value
Brightness value
Light source
Scene type
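
As an illustration for two of the fields listed above, author name and "last modified by", the sketch below blanks them in an Office Open XML package (DOCX, XLSX, PPTX), where they live in `docProps/core.xml`. It covers only those two fields and only that one part of the package. The function name and the choice to blank rather than delete the elements are assumptions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
CORE_PROPS_PATH = "docProps/core.xml"
# Only two of the listed fields are handled in this sketch.
IDENTIFYING_FIELDS = ["dc:creator", "cp:lastModifiedBy"]


def scrub_core_properties(package: bytes) -> bytes:
    """Return a copy of an OOXML package with author fields blanked."""
    for prefix, uri in NAMESPACES.items():
        ET.register_namespace(prefix, uri)  # keep readable prefixes on output
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            content = src.read(item.filename)
            if item.filename == CORE_PROPS_PATH:
                root = ET.fromstring(content)
                for field in IDENTIFYING_FIELDS:
                    for element in root.findall(field, NAMESPACES):
                        element.text = ""
                content = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item.filename, content)  # other parts pass through unchanged
    return out.getvalue()
```

Fields such as tracked changes, hidden comments, and custom XML properties sit in other parts of the package and would each need their own handling.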

200+ File Formats Supported

Documents

PDF, DOCX, XLSX, PPTX, ODT, RTF, TXT

Images

JPEG, PNG, TIFF, HEIC, WebP, BMP, GIF

Archives

ZIP, RAR, 7Z, TAR, GZ

Media

MP4, MOV, AVI, MP3, WAV

Code/Data

JSON, XML, CSV, HTML, SVG

Ready to see it in action?

Upload a sample file and receive a free Metadata Risk Audit. No account required.