OpenAI has released Privacy Filter, an open-weight model for detecting and redacting personally identifiable information (PII) in text. It’s 1.5 billion parameters (50 million active), licensed under Apache 2.0, and it runs on device. Your data never leaves your machine.
🔍 THE BOTTOM LINE: The company that built its empire on processing your data just gave away a tool that stops you from sending it. That’s either genuine progress or the most strategic open-source release of the year.
🔎 What It Does
Privacy Filter is a bidirectional token classifier. It reads your text, identifies every piece of personally identifiable information, and masks it before the data goes anywhere.
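To make that contract concrete, here is a hypothetical before-and-after pair. The bracketed placeholder format and label names are assumptions for illustration, not documented model output:

# Hypothetical example; entity labels and placeholder style are assumptions.
raw = "Hi, I'm Jane Doe. Reach me at jane.doe@example.com or 021 555 0123."
masked = "Hi, I'm [NAME]. Reach me at [EMAIL] or [PHONE]."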
What it detects:
- Names, addresses, phone numbers
- Email addresses and usernames
- Social security and ID numbers
- Credit card and bank details
- Medical records and health data
- IP addresses and device identifiers
- Date of birth, age, and biometric data
The full taxonomy spans the model's V4 and V7 PII category sets, broad enough to cover the kinds of personal data that GDPR and privacy regulators worldwide care about.
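If you want to see the exact label set a given checkpoint ships, the standard Hugging Face config exposes it. This is a minimal sketch assuming conventional BIO-prefixed labels (e.g. B-EMAIL, I-EMAIL); the real label names may differ:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("openai/privacy-filter")
# Strip BIO prefixes (B-/I-) to list the distinct PII categories.
categories = sorted({label.split("-")[-1] for label in config.id2label.values()})
print(categories)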
Technical specs:
- 1.5B parameters (50M active via sparse architecture)
- 128,000-token context window
- Bidirectional token classification
- Runs locally — no API call needed
- Apache 2.0 license — free for commercial use
🛡️ Why This Matters
Every day, millions of people paste things into ChatGPT they shouldn’t. Tax returns. Medical records. Client emails. Legal documents. Most of that data gets processed in the cloud, stored in logs, and potentially used for training.
Privacy Filter sits between you and the cloud. It sanitises the data first. Then you send the sanitised version.
For organisations, this solves a real compliance problem:
- GDPR — You can demonstrate data minimisation before processing
- NZ Privacy Act 2020 — You’re not sending personal information to third parties without cause
- HIPAA — Health data can be stripped before any AI touchpoint
- Industry-specific regulations — Financial, legal, and government data can be pre-screened
🔧 How to Use It
The model is available on Hugging Face as openai/privacy-filter:
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Weights download once, then load from the local cache; inference needs no network calls.
model = AutoModelForTokenClassification.from_pretrained("openai/privacy-filter")
tokenizer = AutoTokenizer.from_pretrained("openai/privacy-filter")
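For end-to-end redaction, the token-classification pipeline is the shortest path. This is a minimal sketch assuming the checkpoint follows standard Hugging Face token-classification conventions; aggregation_strategy="simple" merges subword tokens into whole entity spans:

from transformers import pipeline

pii = pipeline(
    "token-classification",
    model="openai/privacy-filter",
    aggregation_strategy="simple",
)

def mask(text: str) -> str:
    # Replace detected spans right-to-left so earlier character offsets stay valid.
    for span in sorted(pii(text), key=lambda s: s["start"], reverse=True):
        text = text[:span["start"]] + f"[{span['entity_group']}]" + text[span["end"]:]
    return text

print(mask("Email jane.doe@example.com or call 021 555 0123."))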
It integrates into any data pipeline:
- Pre-processing for AI tools — Sanitise prompts before sending to any LLM
- Data pipeline cleaning — Strip PII from training datasets
- Compliance workflows — Automated PII detection for regulatory audits
- ChatGPT safety layer — Use locally before any cloud API call (see the sketch below)
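As a sketch of that last use case: wrap every cloud call with local masking first. This reuses the mask() helper from the snippet above and the standard openai-python chat interface; the model name is illustrative, not a recommendation:

from openai import OpenAI

client = OpenAI()

def safe_chat(prompt: str) -> str:
    clean = mask(prompt)  # redact locally; raw PII never leaves the machine
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works
        messages=[{"role": "user", "content": clean}],
    )
    return response.choices[0].message.content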
🇳🇿 NZ Relevance
New Zealand’s Privacy Act 2020 Principle 1 requires that personal information be collected only for a lawful purpose. Principle 3 requires that you tell people what you’re collecting and why.
If your organisation uses ChatGPT, Claude, or any cloud AI tool, you may be sending personal information offshore without adequate disclosure. Privacy Filter gives you a local sanitisation layer — the AI never sees the raw data, only the masked version.
For NZ businesses handling customer data, this is a practical compliance tool, not just a nice-to-have.
⚠️ The Irony
The company that processes more personal data through AI than almost anyone else just released a tool to help you keep your data private from AI companies. Including them.
OpenAI’s own blog frames this as supporting “safe and responsible AI deployment.” That’s true. It’s also true that releasing a free PII scrubber makes it harder for regulators to argue that AI companies can’t handle privacy — because now there’s an open-source solution anyone can deploy.
Whether this is genuine progress or strategic positioning, the tool works. Use it.