How Do You Use AI to Find PII Without Showing AI the PII?

When we started building AI Data Maturity, we ran into a contradiction we couldn’t ignore.

The tool’s job is to assess whether your data is ready for AI — including whether it contains personally identifiable information that should never reach an AI tool. PII detection is one of the most valuable things we do. It’s also the thing that created our hardest design problem.

To find PII, you need to look at the data. But to protect PII, you can’t send it to AI. So how do you use AI to assess PII risk without exposing PII to AI?

That’s not a rhetorical question. It’s the question we had to answer before we could launch.

The easy wrong answer

The easy answer is to let AI handle everything. Upload the file, send it to the model, ask it to flag PII. It works. It’s also exactly what we were trying to prevent. Sending a file containing Social Security numbers, email addresses, and patient names to an external AI model — even to ask whether those things are present — is the problem, not the solution.

The contradiction

We needed AI to assess the data. We needed PII to never reach AI. Both requirements were non-negotiable.

In TRIZ — a framework for solving engineering contradictions — this is called a physical contradiction. The same data needs to be both visible and protected. The resolution isn’t to compromise between the two. It’s to separate them in time or space.

That’s exactly what we did.

The resolution

Before any data reaches AI, it passes through a masking layer that runs entirely on our servers. This layer scans every value in every text column and replaces sensitive values with typed placeholder tokens. What AI receives isn’t your data. It’s a structural representation of your data (column names, data types, null rates, value patterns) with sensitive values already masked.

AI never sees the original values. The masking happens first, on our own servers, before anything leaves the processing pipeline.
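
To make "typed placeholder tokens" concrete, here is a minimal sketch of a masking step, assuming simple regex detectors. The patterns, token names, and function names are illustrative assumptions, not the actual AI Data Maturity implementation, and a real detector would need to cover far more PII types than three regexes can.

```python
import re

# Hypothetical detectors: (token label, pattern). Illustrative only.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def mask_value(value: str) -> str:
    """Replace each detected sensitive substring with a typed token."""
    for label, pattern in PATTERNS:
        value = pattern.sub(f"<{label}>", value)
    return value


def mask_column(values):
    """Mask every string in a column; leave non-string values untouched."""
    return [mask_value(v) if isinstance(v, str) else v for v in values]


if __name__ == "__main__":
    print(mask_column(["jane.doe@example.com", "SSN 123-45-6789", None]))
```

The token keeps the type of the value (an email, an SSN) while dropping the value itself, so whatever runs downstream can still reason about what kind of data a column holds.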

The result: AI can assess your data for quality, ambiguity, and PII risk at the column level without ever seeing the values that need protecting. That works because column names and data patterns reveal risk even without the underlying values.
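
As a rough illustration of what a column-level summary might contain, the sketch below profiles a column whose values have already been masked. The field names, the `<TYPE>` token format, and the `profile_column` function are assumptions made for this example, not the product's real payload.

```python
import re
from collections import Counter

# Assumed token format: an uppercase label in angle brackets, e.g. <EMAIL>.
TOKEN = re.compile(r"<([A-Z_]+)>")


def profile_column(name, masked_values):
    """Summarize a masked column without including any cell value."""
    total = len(masked_values)
    non_null = [v for v in masked_values if v is not None and v != ""]
    tokens = Counter()
    for v in non_null:
        # Count each token type at most once per cell.
        tokens.update(set(TOKEN.findall(str(v))))
    return {
        "column": name,
        "null_rate": round(1 - len(non_null) / total, 2) if total else 0.0,
        "placeholder_counts": dict(tokens),
    }


if __name__ == "__main__":
    print(profile_column("contact", ["<EMAIL>", None, "call <PHONE>", "<EMAIL>"]))
```

A summary like this says that a column named "contact" is mostly email addresses, which is enough to flag column-level risk, and it contains no address at all.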

What this means for you

When you upload a file to AI Data Maturity, the first thing that happens isn’t AI analysis. It’s a local scan for sensitive values. The processing screen tells you this explicitly: “Scanning and masking sensitive data before sending to AI.”

That sequence is intentional. It’s not a feature we added. It’s the design principle we started with.

Your data never reaches AI unmasked. That’s not just a promise. It’s an architectural constraint we built in from the start.

The infrastructure behind the promise

This isn’t just a design decision. It’s backed by the infrastructure we chose. AI Data Maturity runs on Google Cloud Platform. Data is encrypted in transit, and we use no cookies and no file storage. Your file is read in memory to extract column statistics and discarded immediately after your assessment completes. Nothing is written to disk. Nothing is retained.
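
For readers curious what "read in memory" can look like in practice, here is a small sketch that parses an uploaded CSV directly from its bytes and keeps only per-column counts. It assumes a UTF-8 CSV upload, and the function name `column_stats_from_upload` is hypothetical. It illustrates the general approach and is not the service's actual code.

```python
import csv
import io


def column_stats_from_upload(raw: bytes) -> dict:
    """Compute row and null counts per column from in-memory bytes."""
    # Parse straight from memory; no temporary file is created.
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    stats = {}
    for row in reader:
        for col, val in row.items():
            if col is None:  # extra fields beyond the header row
                continue
            s = stats.setdefault(col, {"rows": 0, "nulls": 0})
            s["rows"] += 1
            if val is None or val == "":
                s["nulls"] += 1
    # Only aggregate counts are returned; the rows themselves are not kept.
    return stats


if __name__ == "__main__":
    print(column_stats_from_upload(b"name,email\nAnn,a@b.co\nBob,\n"))
```

Once the function returns, the only thing left is the counts. The raw bytes and the parsed rows go out of scope and can be reclaimed by the runtime.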

The masking happens before AI. The data disappears after. That’s the full picture.