Detecting and Masking Personal Information

Handling user data responsibly is a foundation of building trustworthy applications. When working with Large Language Models, which are designed to process and generate natural language, the risk of inadvertently logging, processing, or exposing personally identifiable information (PII) is significant. Failing to protect this information can lead to privacy violations, loss of user trust, and legal consequences under regulations like GDPR and CCPA.

Your applications should have a safety layer to detect and handle PII before it is sent to an LLM or stored in logs. The toolkit provides a set of functions within the safety module designed for this purpose.

Detecting Personally Identifiable Information

The first step in protecting sensitive data is identifying it. The detect_pii function scans text for various types of personal information, such as email addresses, phone numbers, credit card numbers, and more.

Imagine a user provides their contact information in a support chat. Before processing this message with an LLM, you should scan it for PII.

from kerb.safety import detect_pii

user_message = "My email is [email protected] and my phone is 555-123-4567."
matches = detect_pii(user_message)

print(f"Found {len(matches)} PII instances:")
for match in matches:
    print(f"  - Type: {match.pii_type.value}")
    print(f"    Text: {match.text}")
    print(f"    Confidence: {match.confidence:.2f}")

This will produce the following output:

Found 2 PII instances:
  - Type: email
    Text: [email protected]
    Confidence: 0.95
  - Type: phone
    Text: 555-123-4567
    Confidence: 0.90

The function returns a list of match objects, each containing the detected text, its type, and a confidence score. This structured output allows you to build logic based on what was found. For example, you might flag messages with high-confidence PII matches for special handling or automatic redaction.

In some cases, you may only be concerned with specific types of PII. The detect_pii function can be configured to look for a targeted set of information. You can also use specialized functions like detect_email or detect_phone for more direct checks.

from kerb.safety import detect_pii, detect_credit_card, PIIType

text = "My email is [email protected], but please charge my card 4532 1234 5678 9010."

# Detect only emails and phones
specific_matches = detect_pii(text, pii_types=[PIIType.EMAIL, PIIType.PHONE])
print(f"Found {len(specific_matches)} email/phone instances.")

# Use a specialized detector for credit cards
card_matches = detect_credit_card(text)
if card_matches:
    print(f"Credit card detected: {card_matches[0].text}")

This granular control is useful for creating context-aware safety rules, such as being stricter about financial information than about email addresses in certain contexts.

Redaction and Anonymization

Once PII is detected, you have two primary strategies for handling it: redaction and anonymization.

Redaction involves replacing the sensitive information with a fixed placeholder, typically [REDACTED] or ***. This is useful for sanitizing logs where the PII is not needed for analysis.
Anonymization replaces the PII with a descriptive placeholder, like [EMAIL] or [PHONE]. This approach preserves the context that a piece of information was present without exposing the data itself. It is ideal for preparing user input for an LLM, as the model can understand that a user provided an email address without seeing the actual address.

The redact_pii function removes sensitive data completely, making it suitable for secure logging.

from kerb.safety import redact_pii

sensitive_text = "Please contact [email protected] or call (555) 987-6543 for details."
redacted_text = redact_pii(sensitive_text)

print(f"Original: {sensitive_text}")
print(f"Redacted: {redacted_text}")

The output shows the PII replaced with a standard placeholder:

Original: Please contact [email protected] or call (555) 987-6543 for details.
Redacted: Please contact [REDACTED] or call [REDACTED] for details.

For preparing data for LLM processing, anonymize_text is often more useful because it preserves the type of information that was removed.

from kerb.safety import anonymize_text

customer_query = "My account email is [email protected] and my credit card is 4532-1234-5678-9010."
anonymized_query = anonymize_text(customer_query)

print(f"Original: {customer_query}")
print(f"Anonymized: {anonymized_query}")

The anonymized output maintains the structural context of the original message:

Original: My account email is [email protected] and my credit card is 4532-1234-5678-9010.
Anonymized: My account email is [EMAIL] and my credit card is [CREDIT_CARD].

This anonymized text can be safely sent to an LLM. The model can still generate a helpful response, such as "I see you're asking about the account associated with [EMAIL] and the payment method [CREDIT_CARD]. How can I assist you?", without ever processing the actual sensitive data.

A Practical Workflow for LLM Conversations

A safety layer should be applied to both user inputs and model outputs. This prevents sensitive user data from being sent to the model provider and ensures the model does not inadvertently leak any PII in its responses.

Here is a practical workflow for handling PII in a conversational application:

Scan User Input: When a user sends a message, use detect_pii to check for sensitive information.
Anonymize for Processing: If PII is found, use anonymize_text to create a safe version of the input to send to the LLM.
Generate Response: The LLM generates a response based on the anonymized input.
Scan Model Output: Use detect_pii on the LLM's response to ensure it hasn't generated or repeated any PII.
Log Securely: Before storing the conversation turn in your logs, apply redact_pii to both the original user input and the model's response to ensure no sensitive data is stored long-term.

from kerb.safety import detect_pii, anonymize_text, redact_pii

# User input with PII
user_input = "I'm having login issues with my account, [email protected]."

# 1. Scan user input
pii_matches = detect_pii(user_input)
if pii_matches:
    print("PII detected in user input. Anonymizing for LLM.")

    # 2. Anonymize for processing
    anonymized_input = anonymize_text(user_input)
    print(f"Anonymized input for LLM: {anonymized_input}")
    
    # 3. Simulate LLM response (this would be a real LLM call)
    # The LLM might accidentally repeat the placeholder
    llm_response = f"I can help with the account associated with {anonymized_input.split()[-1]}"
    print(f"LLM Response: {llm_response}")

    # 4. Scan model output
    output_pii = detect_pii(llm_response)
    if output_pii:
        print("Warning: PII placeholder found in LLM output. Redacting for safety.")

    # 5. Log securely
    safe_log_input = redact_pii(user_input)
    safe_log_output = redact_pii(llm_response)
    print(f"Secure log entry (input): {safe_log_input}")
    print(f"Secure log entry (output): {safe_log_output}")

By integrating these steps into your application, you create a safety boundary that protects user privacy at every stage of the LLM interaction. This not only fulfills legal and ethical obligations but also builds the user trust that is essential for any successful AI application.

Was this section helpful?

References

Privacy-preserving natural language processing: A survey, Fenglong Xu, Yong Zhang, Minghao Zhang, Mingchen Li, Xinyu Yang, Jin Li, 2021 Artificial Intelligence Review, Vol. 54 (Springer) DOI: 10.1007/s10462-020-09886-w - Surveys techniques for protecting privacy in natural language processing, including methods for anonymization and redaction of textual data.