Automatically Detect and Redact Personally Identifiable Information From Documents Using Health Data Avatar
- Maria Sergeeva

- Feb 10
- 3 min read
Handling sensitive documents shouldn’t require hours of manual work, or Googling “data anonymisation”, “AI-powered file redaction” or “redact personal information in a file for free” only to discover that most tools are sold only as enterprise-grade solutions. Being able to prepare personal files for AI while protecting privacy as far as possible is essential. That’s why we’ve launched personal data redaction as part of Health Data Avatar (HDA), and why it has quickly become our early testers’ favourite feature.
With one click, you can now automatically detect personally identifiable information (PII) in your selected file, regardless of format, and choose what to remove and what to keep, so you stay in control.
Why PII Redaction Is Still Broken
Whether you’re dealing with medical records, insurance letters, legal documents, or cross-border paperwork, most files contain sensitive identifiers such as:
Full name
Date of birth
Address
Phone number
Email
NHS / national ID / social security / account numbers
Policy, patient, or reference numbers
Traditionally, data redaction software that handles this properly is sold as a costly enterprise feature. Individuals and small organisations are often left with poor alternatives: manual blacking-out, PDF hacks, or tools that permanently destroy information with no audit trail.
That’s deeply inefficient and increasingly unnecessary.
One-Click File Redaction Inside Health Data Avatar
With HDA, removing personal information from files is now simple, reversible, and precise.
Once your documents are uploaded and securely stored in your Health Data Avatar account, you can:
Automatically detect PII from documents
Apply redaction across individual files or your entire document set
Export files in markup format, preserving structure and context, which makes them ideal for LLMs
Choose exactly which identifiers are sanitised
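To make the detect-then-redact flow above more concrete, here is a minimal Python sketch. It is not HDA's actual implementation: the `PATTERNS` table, the `detect_pii` and `redact` helpers, and the sample text are all hypothetical, and plain regular expressions are assumed purely for illustration (they would miss names, addresses and most real-world variation).

```python
import re

# Hypothetical patterns, for illustration only. Real PII detection needs far
# more than regular expressions (names and addresses are not covered here).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "nhs_number": re.compile(r"\b\d{3} \d{3} \d{4}\b"),
}


def detect_pii(text):
    """Return (category, matched_text, start, end) tuples sorted by position."""
    findings = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((category, match.group(), match.start(), match.end()))
    return sorted(findings, key=lambda finding: finding[2])


def redact(text, findings):
    """Build a redacted copy; the original string is never modified."""
    pieces = []
    cursor = 0
    for category, _matched, start, end in findings:
        if start < cursor:
            # Skip a finding that overlaps one already redacted.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{category.upper()}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = "Patient DOB 04/07/1985, NHS no. 943 476 5919, contact jane@example.org."
found = detect_pii(sample)
print(redact(sample, found))
# Patient DOB [DATE], NHS no. [NHS_NUMBER], contact [EMAIL].
```

Because `redact` returns a new string, the source text stays intact, which mirrors the idea of keeping original files unchanged.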
This means you can safely prepare documents for:
AI analysis
Sharing with third parties
Cross-border healthcare or insurance use
Research, audits, or second opinions
Even posting a video about your personal data (that's how we use it too :))
All without exposing more data than necessary.
Selective Redaction: You Stay in Control

Not all “identifiable information” is unwanted in every context, and we personally dislike tools that strip everything out automatically. For example, the date of an appointment is sometimes really useful context.
That’s why HDA’s file redaction feature is fully configurable. Before exporting, users can:
See which categories of personal data are detected
Untick any fields they want to keep
Redact only what’s required for a specific use case
This avoids the common problem of over-redaction, where important clinical or contextual information is lost.
Privacy without blunt instruments.
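One way to picture selective redaction is as a set of detected categories with a "keep" list, like unticking boxes before export. The Python sketch below is an assumption-laden illustration rather than how HDA works internally: `PATTERNS`, `detected_categories` and `redact_except` are hypothetical names, and the regular expressions are deliberately simplistic.

```python
import re

# Hypothetical category patterns, assumed for this sketch only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d -]{8,}\d"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def detected_categories(text):
    """Return the set of categories that occur at least once in the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def redact_except(text, keep):
    """Redact every known category except those the user chose to keep."""
    for name, pattern in PATTERNS.items():
        if name not in keep:
            text = pattern.sub(f"[{name.upper()}]", text)
    return text


sample = "Appointment on 12/03/2024. Call 020 7946 0958 or email sam@example.com."
print(sorted(detected_categories(sample)))
# ['date', 'email', 'phone']
print(redact_except(sample, keep={"date"}))
# Appointment on 12/03/2024. Call [PHONE] or email [EMAIL].
```

Here the appointment date survives because it was "unticked", while the phone number and email address are replaced with placeholders.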
From Enterprise-Only to Accessible by Design
PII redaction has historically been locked behind enterprise pricing and compliance tooling. We deliberately took a different approach.
In HDA:
Personally identifiable information redaction is available to all users
No enterprise contract required
No irreversible file destruction
No data leaving your secure environment
Redaction is currently free for our early testers and will soon be available to every user for a small per-export fee that reflects actual compute usage, not artificial feature gating.
Translate, Summarise, Prepare for Appointments, Insurers and LLMs
Manual redaction fails at scale. Humans miss things. Files are inconsistent. Formats vary. Languages differ.
HDA’s approach allows us to:
Detect identifiers across multiple document types
Work across languages and formats
Apply consistent redaction rules across large datasets
Reduce human error without removing human control
This is especially critical for health data, where privacy, accuracy, and context all matter.
Designed for Real-World Use Cases
Our users are already using AI redaction to:
Share medical records with LLMs without exposing identities
Prepare documents for international providers
Remove personal data before research or second opinions
Sanitise files before sharing with insurers, lawyers, or platforms
Create AI-ready datasets without leaking identifiers
All while keeping the original files securely stored and unchanged.
Redaction Is Not Just About Privacy — It’s About Control
Removing personal information from files isn’t only a compliance exercise. It’s about deciding:
Who gets access to what
In what context
For what purpose
Health Data Avatar treats data redaction as part of a broader principle: minimum necessary disclosure, applied intelligently.
Try PII Redaction in Health Data Avatar for free
If you’re tired of clunky tools, irreversible black boxes, or enterprise-only pricing, HDA’s redaction feature was built for you. Try HDA's PII file redaction for free while we're still testing the feature across languages.
