Automatically Detect and Redact Personally Identifiable Information From Documents Using Health Data Avatar

  • Writer: Maria Sergeeva
  • Feb 10
  • 3 min read

Handling sensitive documents shouldn’t require hours of manual work, or Googling "data anonymisation", "AI-powered file redaction" and "redact personal information in a file for free" only to discover that most tools are sold as enterprise-grade solutions. Preparing personal files for AI in a way that protects your privacy as much as possible is essential. That’s why we’ve launched personal data redaction as part of Health Data Avatar (HDA), and why it has quickly become our early testers’ favourite feature.


With one click, you can now automatically detect personally identifiable information (PII) in your selected file, whatever its format, then choose what to remove and what to keep, so you stay in control.


Why PII Redaction Is Still Broken

Whether you’re dealing with medical records, insurance letters, legal documents, or cross-border paperwork, most files contain sensitive identifiers such as:

  • Full name

  • Date of birth

  • Address

  • Phone number

  • Email

  • NHS / national ID / social security / account numbers

  • Policy, patient, or reference numbers


Traditionally, data redaction software that handles this properly is sold as a costly enterprise feature. Individuals and small organisations are often left with poor alternatives: manual blacking-out, PDF hacks, or tools that permanently destroy information with no audit trail.

That’s deeply inefficient and increasingly unnecessary.


One-Click File Redaction Inside Health Data Avatar


With HDA, removing personal information from files is now simple, reversible, and precise.


Once your documents are uploaded and securely stored in your Health Data Avatar account, you can:


  • Automatically detect PII in documents

  • Apply redaction across individual files or your entire document set

  • Export files in a markup format that preserves structure and context, making them well suited to LLMs

  • Choose exactly which identifiers are sanitised


This means you can safely prepare documents for:

  • AI analysis

  • Sharing with third parties

  • Cross-border healthcare or insurance use

  • Research, audits, or second opinions

  • Even posting a video about your own personal data (that’s how we use it too!)

All without exposing more data than necessary.


Selective Redaction: You Stay in Control


Selective Redaction of PII with HDA

Not all “identifiable information” is unwanted in every context, and tools that strip everything automatically are exactly what we dislike. For example, the date of an appointment is often essential context.


That’s why HDA’s file redaction feature is fully configurable. Before exporting, users can:

  • See which categories of personal data are detected

  • Untick any fields they want to keep

  • Redact only what’s required for a specific use case


This avoids the common problem of over-redaction, where important clinical or contextual information is lost.

Privacy without blunt instruments.
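This post doesn’t describe how HDA detects PII internally, so the snippet below is only a rough illustration of the detect-then-untick idea. It is a minimal Python sketch that assumes simple regular-expression detectors for three hypothetical categories (`email`, `date`, `nhs_number`); the function names `detect` and `redact` are likewise made up for this example. Real detection of names, addresses and multilingual text needs far more than regular expressions.

```python
import re

# Hypothetical detectors for illustration only; these are NOT HDA's rules.
# Regexes cannot reliably find names or addresses, so a real system would
# need additional techniques (e.g. a trained entity recogniser).
PATTERNS: dict[str, re.Pattern] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),       # DD/MM/YYYY only
    "nhs_number": re.compile(r"\b\d{3} \d{3} \d{4}\b"),  # 3-3-4 digit layout
}


def detect(text: str) -> dict[str, list[str]]:
    """Return the matches found in `text`, grouped by category (empty ones dropped)."""
    found = {name: pat.findall(text) for name, pat in PATTERNS.items()}
    return {name: hits for name, hits in found.items() if hits}


def redact(text: str, categories: set[str]) -> str:
    """Return a redacted copy; the caller's original text is not modified.

    Only the categories still "ticked" in `categories` are replaced.
    """
    for name in sorted(categories):
        text = PATTERNS[name].sub(f"[{name.upper()}]", text)
    return text


doc = ("Patient NHS no. 943 476 5919, email jane@example.com. "
       "Appointment on 12/03/2024.")
found = detect(doc)
print(found)
# The user unticks "date" to keep the appointment date as context.
print(redact(doc, set(found) - {"date"}))
# prints: Patient NHS no. [NHS_NUMBER], email [EMAIL]. Appointment on 12/03/2024.
```

Because `redact` returns a new string instead of overwriting its input, the original document stays available, which mirrors the "reversible" behaviour described in this post.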


From Enterprise-Only to Accessible by Design

PII redaction has historically been locked behind enterprise pricing and compliance tooling. We deliberately took a different approach.


In HDA:

  • Personally identifiable information redaction is available to all users

  • No enterprise contract required

  • No irreversible file destruction

  • No data leaving your secure environment


Redaction is currently free for our early testers and will soon be available to every user for a small per-export fee that reflects actual compute usage, not artificial feature gating.


Translate, Summarise, Prepare for Appointments, Insurers and LLMs


Manual redaction fails at scale. Humans miss things. Files are inconsistent. Formats vary. Languages differ.

HDA’s approach allows us to:


  • Detect identifiers across multiple document types

  • Work across languages and formats

  • Apply consistent redaction rules across large datasets

  • Reduce human error without removing human control

This is especially critical for health data, where privacy, accuracy, and context all matter.


Designed for Real-World Use Cases


Our users are already using AI redaction to:

  • Share medical records with LLMs without exposing identities

  • Prepare documents for international providers

  • Remove personal data before research or second opinions

  • Sanitise files before sharing with insurers, lawyers, or platforms

  • Create AI-ready datasets without leaking identifiers

All while keeping the original files securely stored and unchanged.


Redaction Is Not Just About Privacy — It’s About Control

Removing personal information from files isn’t only a compliance exercise. It’s about deciding:

  • Who gets access to what

  • In what context

  • For what purpose


Health Data Avatar treats redaction as part of a broader principle: minimum necessary disclosure, applied intelligently.


Try PII Redaction in Health Data Avatar for Free


If you’re tired of clunky tools, irreversible black boxes, or enterprise-only pricing, HDA’s redaction feature was built for you. Try HDA's PII file redaction for free, while we're still testing the feature across languages.
