OpenAI has released Privacy Filter, an open-weight model for detecting and redacting personally identifiable information (PII) in text. It’s 1.5 billion parameters (50 million active), licensed under Apache 2.0, and it runs on device. Your data never leaves your machine.
🔍 THE BOTTOM LINE: The company that built its empire on processing your data just gave away a tool that stops you from sending it. That’s either genuine progress or the most strategic open-source release of the year.
🔎 What It Does
Privacy Filter is a bidirectional token classifier. It reads your text, identifies every piece of personally identifiable information, and masks it before the data goes anywhere.
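To make that contract concrete, here is a hypothetical before-and-after pair. The bracketed placeholder format and label names are assumptions for illustration, not documented model output:

# Hypothetical example; entity labels and placeholder style are assumptions.
raw = "Hi, I'm Jane Doe. Reach me at jane.doe@example.com or 021 555 0123."
masked = "Hi, I'm [NAME]. Reach me at [EMAIL] or [PHONE]."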
What it detects:
- Names, addresses, phone numbers
- Email addresses and usernames
- Social security and ID numbers
- Credit card and bank details
- Medical records and health data
- IP addresses and device identifiers
- Date of birth, age, and biometric data
The full taxonomy spans the model's V4 and V7 PII category sets, broad enough to cover the kinds of personal data that GDPR and privacy regulators worldwide care about.
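If you want to see the exact label set a given checkpoint ships, the standard Hugging Face config exposes it. This is a minimal sketch assuming conventional BIO-prefixed labels (e.g. B-EMAIL, I-EMAIL); the real label names may differ:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("openai/privacy-filter")
# Strip BIO prefixes (B-/I-) to list the distinct PII categories.
categories = sorted({label.split("-")[-1] for label in config.id2label.values()})
print(categories)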
Technical specs:
- 1.5B parameters (50M active via sparse architecture)
- 128,000-token context window
- Bidirectional token classification
- Runs locally — no API call needed
- Apache 2.0 license — free for commercial use
🛡️ Why This Matters
Every day, millions of people paste things into ChatGPT they shouldn’t. Tax returns. Medical records. Client emails. Legal documents. Most of that data gets processed in the cloud, stored in logs, and potentially used for training.
Privacy Filter sits between you and the cloud. It sanitises the data first. Then you send the sanitised version.
For organisations, this solves a real compliance problem:
- GDPR — You can demonstrate data minimisation before processing
- NZ Privacy Act 2020 — You’re not sending personal information to third parties without cause
- HIPAA — Health data can be stripped before any AI touchpoint
- Industry-specific regulations — Financial, legal, and government data can be pre-screened
🔧 How to Use It
The model is available on Hugging Face as openai/privacy-filter:
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Weights download once, then load from the local cache; inference needs no network calls.
model = AutoModelForTokenClassification.from_pretrained("openai/privacy-filter")
tokenizer = AutoTokenizer.from_pretrained("openai/privacy-filter")
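For end-to-end redaction, the token-classification pipeline is the shortest path. This is a minimal sketch assuming the checkpoint follows standard Hugging Face token-classification conventions; aggregation_strategy="simple" merges subword tokens into whole entity spans:

from transformers import pipeline

pii = pipeline(
    "token-classification",
    model="openai/privacy-filter",
    aggregation_strategy="simple",
)

def mask(text: str) -> str:
    # Replace detected spans right-to-left so earlier character offsets stay valid.
    for span in sorted(pii(text), key=lambda s: s["start"], reverse=True):
        text = text[:span["start"]] + f"[{span['entity_group']}]" + text[span["end"]:]
    return text

print(mask("Email jane.doe@example.com or call 021 555 0123."))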
It integrates into any data pipeline:
- Pre-processing for AI tools — Sanitise prompts before sending to any LLM
- Data pipeline cleaning — Strip PII from training datasets
- Compliance workflows — Automated PII detection for regulatory audits
- ChatGPT safety layer — Use locally before any cloud API call (see the sketch below)
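As a sketch of that last use case: wrap every cloud call with local masking first. This reuses the mask() helper from the snippet above and the standard openai-python chat interface; the model name is illustrative, not a recommendation:

from openai import OpenAI

client = OpenAI()

def safe_chat(prompt: str) -> str:
    clean = mask(prompt)  # redact locally; raw PII never leaves the machine
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works
        messages=[{"role": "user", "content": clean}],
    )
    return response.choices[0].message.content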
🇳🇿 NZ Relevance
New Zealand’s Privacy Act 2020 Principle 1 requires that personal information be collected only for a lawful purpose. Principle 3 requires that you tell people what you’re collecting and why.
If your organisation uses ChatGPT, Claude, or any cloud AI tool, you may be sending personal information offshore without adequate disclosure. Privacy Filter gives you a local sanitisation layer — the AI never sees the raw data, only the masked version.
For NZ businesses handling customer data, this is a practical compliance tool, not just a nice-to-have.
⚠️ The Irony
The company that processes more personal data through AI than almost anyone else just released a tool to help you keep your data private from AI companies. Including them.
OpenAI’s own blog frames this as supporting “safe and responsible AI deployment.” That’s true. It’s also true that releasing a free PII scrubber makes it harder for regulators to argue that AI companies can’t handle privacy — because now there’s an open-source solution anyone can deploy.
Whether this is genuine progress or strategic positioning, the tool works. Use it.