
Message sanitization

Message sanitization detects and acts on sensitive information in user messages before they are sent to an AI model. It helps protect your organization from accidentally sharing PII, credentials, and other confidential data in chat.

Organization admins configure the feature per organization: which entity types to detect, confidence thresholds, actions (allow, redact, or block), and an allow list of safe terms.

Note: Message sanitization is not enabled by default. To enable it for your organization, contact the Devs.ai team.

How it works

When you send a message in a chat, the system scans it for sensitive information. Depending on what is detected and how your organization's policy is set, one of three outcomes applies:

  1. Allow — No sensitive data is detected (or everything is below thresholds), so the message passes through unchanged.
  2. Redact — Sensitive data is detected and the policy requires redaction. The message is sent to the model with sensitive portions replaced by [REDACTED].
  3. Block — High-risk data is detected (for example, private keys or API keys). The message is rejected. You see an alert listing which entities were found and can choose Use redacted version to resubmit with sensitive content removed.
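The three outcomes above can be sketched as a small decision function. This is only an illustration of the described behavior, not the product's implementation: the names `Action`, `Detection`, `Rule`, and `decide` are hypothetical, and two points are assumptions, namely that the strictest triggered action wins when several entities are found and that the default rule carries its own threshold.

```python
from dataclasses import dataclass
from enum import IntEnum


class Action(IntEnum):
    # Ordered by severity so max() picks the strictest outcome (assumption).
    ALLOW = 0
    REDACT = 1
    BLOCK = 2


@dataclass(frozen=True)
class Detection:
    entity_type: str   # hypothetical label, e.g. "EMAIL"
    confidence: float  # detector score between 0 and 1


@dataclass(frozen=True)
class Rule:
    threshold: float   # minimum confidence at which the rule applies
    action: Action


def decide(detections, rules, default_rule):
    """Return the strictest action triggered by any detection."""
    outcome = Action.ALLOW
    for detection in detections:
        # Entities without a specific rule fall back to the default rule.
        rule = rules.get(detection.entity_type, default_rule)
        if detection.confidence >= rule.threshold:
            outcome = max(outcome, rule.action)
    return outcome


# Illustrative policy: the entity names and thresholds are made up.
rules = {
    "PRIVATE_KEY": Rule(threshold=0.5, action=Action.BLOCK),
    "EMAIL": Rule(threshold=0.7, action=Action.REDACT),
}
default_rule = Rule(threshold=0.8, action=Action.ALLOW)

low_confidence = decide([Detection("EMAIL", 0.4)], rules, default_rule)
email_only = decide([Detection("EMAIL", 0.9)], rules, default_rule)
with_key = decide(
    [Detection("EMAIL", 0.9), Detection("PRIVATE_KEY", 0.95)],
    rules,
    default_rule,
)
```

In this sketch a detection below its threshold is ignored, which corresponds to the "everything is below thresholds" case in the Allow outcome.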

Sanitization also runs on messages already in a thread before each model call, so earlier content that was not sanitized is redacted automatically.

Admin configuration

Organization admins manage the policy under Settings → Message Sanitization.

Available controls

  • Enable or disable the feature with a master toggle
  • Apply presets for quick configuration:
    • Strict — Blocks government IDs and secrets; redacts everything else. Low detection thresholds.
    • Balanced — Redacts most PII; blocks only high-risk secrets. Medium thresholds.
    • Internal-Friendly — Higher thresholds; blocks only secrets; redacts PII only at high confidence.
  • Set a default action (Allow, Redact, or Block) for entities without specific rules
  • Configure per-entity rules with custom confidence thresholds and actions
  • Maintain an allow list of terms that bypass detection (for example, company names that might be flagged as person names)
  • Browse the entity catalog of 50+ supported entity types, grouped by region
  • Test the policy in the built-in test console before saving
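To make the controls above concrete, here is a rough sketch of how such a policy might be modeled as data. The product exposes these settings through the UI; the `Policy` class, the field names, the entity identifiers, and every threshold number below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical entity identifiers; the real catalog may name them differently.
SECRETS = ("API_KEY", "ACCESS_TOKEN", "PASSWORD", "PRIVATE_KEY")
GOVERNMENT_IDS = ("US_SSN", "US_PASSPORT", "UK_NINO")


@dataclass
class Policy:
    enabled: bool = True                 # master toggle
    default_action: str = "allow"        # "allow", "redact", or "block"
    default_threshold: float = 0.5       # illustrative value
    # entity type -> (confidence threshold, action)
    entity_rules: dict = field(default_factory=dict)
    allow_list: list = field(default_factory=list)


def strict_preset() -> Policy:
    """Approximates the Strict preset: block IDs and secrets, redact the rest."""
    threshold = 0.3  # stands in for "low detection thresholds"
    rules = {entity: (threshold, "block") for entity in SECRETS + GOVERNMENT_IDS}
    return Policy(
        default_action="redact",
        default_threshold=threshold,
        entity_rules=rules,
    )


# Start from a preset, then customize one rule and the allow list.
policy = strict_preset()
policy.entity_rules["PERSON_NAME"] = (0.85, "redact")
policy.allow_list.append("Acme Corp")
```

Starting from a preset and overriding individual entries mirrors the workflow the settings page suggests: pick a baseline, then adjust per-entity rules and allowed terms.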

Testing your policy

You can test the policy without affecting real conversations:

  1. Go to Settings → Message Sanitization.
  2. Configure the policy you want.
  3. In the Test Console, enter sample text (for example, "My SSN is 123-45-6789 and email is john@example.com").
  4. Click Run Test to see detected entities, confidence scores, and the resulting action (allow, redact, or block).

Supported entity types

The system detects 50+ entity types across these categories:

Region | Entity types
Global | Email, Phone, Person Name, Organization, Location, Date/Time, URL, Credit Card, Crypto Wallet, IBAN, IP Address, MAC Address, Medical License
USA | SSN, Bank Number, Driver License, ITIN, Passport, Medicare ID, NPI
UK | NHS Number, NINO
Europe | Spanish NIF/NIE, Italian Fiscal Code/Driver License/VAT/Passport/ID, Polish PESEL, Finnish Identity Code
Asia-Pacific | Singapore NRIC/UEN, Australian ABN/ACN/TFN/Medicare, Indian PAN/Aadhaar/Passport/GSTIN, Korean identifiers, Thai TNIN
Secrets | API Keys, Access Tokens, Passwords, Private Keys

What users see

Redacted messages

When a message is redacted, sensitive segments are replaced with [REDACTED] before the model sees them. The conversation continues; you may see placeholders if you read the message back.
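As a rough illustration of placeholder substitution, the sketch below replaces detected spans with [REDACTED]. It assumes detection yields character spans; the two regular expressions are simplified stand-ins for a real detector, and the names `PATTERNS`, `find_spans`, and `redact` are hypothetical.

```python
import re

PLACEHOLDER = "[REDACTED]"

# Simplified stand-in patterns, not the product's actual detection logic.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_spans(text):
    """Return sorted (start, end, label) tuples for every pattern match."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def redact(text, spans):
    """Replace each span with the placeholder, merging overlapping spans."""
    pieces = []
    cursor = 0
    for start, end, _label in spans:
        if start < cursor:
            # Overlaps the previous span: extend it instead of adding
            # a second placeholder.
            cursor = max(cursor, end)
            continue
        pieces.append(text[cursor:start])
        pieces.append(PLACEHOLDER)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = "My SSN is 123-45-6789 and email is john@example.com"
redacted = redact(sample, find_spans(sample))
print(redacted)  # My SSN is [REDACTED] and email is [REDACTED]
```

Only the flagged segments change; the surrounding text is passed through as written, which is why the conversation can continue normally.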

Blocked messages

When a message is blocked, you see a red alert with:

  • A "Message Blocked" heading
  • Badges for the entity types detected (for example, PRIVATE_KEY)
  • A Use redacted version button that resubmits your message with sensitive content removed

Limitations

  • Detection works best on English text. Other languages may have reduced entity coverage.
  • The allow list uses case-insensitive substring matching — an entry like john would also match inside johnson.
  • Only text in messages is evaluated; images are not scanned.
  • If the detection service is temporarily unavailable, the system falls back to basic pattern-based detection (private keys, API keys, tokens, and passwords).
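The allow list limitation above is easy to see in a small sketch. It assumes the comparison is made against the text of a detected entity (the documentation does not say whether the entity or the whole message is checked), and `is_allowed` is a hypothetical helper, not a product API.

```python
def is_allowed(detected_text, allow_list):
    """Return True if any allow-list entry occurs inside the detected text,
    ignoring case. Empty entries are skipped so they cannot match everything."""
    haystack = detected_text.casefold()
    return any(entry.casefold() in haystack for entry in allow_list if entry)


allow_list = ["john", "Acme Corp"]

exact = is_allowed("John", allow_list)             # intended match
inside = is_allowed("Johnson", allow_list)         # unintended substring match
company = is_allowed("ACME CORP Ltd", allow_list)  # case-insensitive match
other = is_allowed("Jane", allow_list)             # no entry matches
```

Because short entries match inside longer words, longer and more specific allow-list terms are less likely to suppress detections unintentionally.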