Presidio Anonymizer Plugin
Overview
This plugin integrates with Microsoft’s Presidio to analyze and anonymize sensitive data in your fields. The plugin uses two Presidio services:
- Analyzer API for detecting PII entities
- Anonymizer API for anonymizing detected entities
The Presidio Anonymizer is a module that anonymizes detected PII text entities with the desired values.
Configuration
presidio_anonymizer:
  anonymize_url: http://localhost:8080/anonymize
  analyzer_url: http://localhost:8080/analyze
  language: en
  hash_type: md5 # Optional, used for hash operator
  encrypt_key: "" # Optional, used for encrypt operator
  anonymizer_rules:
    - type: EMAIL_ADDRESS
      operator: mask
      masking_char: "*"
      chars_to_mask: 4
    - type: PERSON
      operator: replace
      new_value: "[REDACTED]"
    - type: PHONE_NUMBER
      operator: hash
    - type: CREDIT_CARD
      operator: encrypt
Configuration Parameters
anonymize_url
: Required. The URL of your Presidio Anonymizer API endpoint.
analyzer_url
: Required. The URL of your Presidio Analyzer API endpoint.
language
: Optional. Language for the analyzer (default: “en”).
hash_type
: Optional. Hash algorithm for the “hash” operator (e.g., “md5”, “sha256”).
encrypt_key
: Optional. Encryption key for the “encrypt” operator.
anonymizer_rules
: List of anonymization rules that will be applied to detected entities.
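The required and optional parameters above can be captured in a small loader. This sketch assumes the YAML has already been parsed into a Python dict (for example by a YAML library). `PresidioAnonymizerConfig` and `load_config` are hypothetical names, and the `None` defaults for `hash_type` and `encrypt_key` are assumptions, since the plugin's own defaults for those two parameters are not documented here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PresidioAnonymizerConfig:
    anonymize_url: str
    analyzer_url: str
    language: str = "en"               # documented default
    hash_type: Optional[str] = None    # assumption: only used by the "hash" operator
    encrypt_key: Optional[str] = None  # assumption: only used by the "encrypt" operator
    anonymizer_rules: list = field(default_factory=list)


def load_config(raw: dict) -> PresidioAnonymizerConfig:
    """Build a config object from the parsed `presidio_anonymizer` mapping."""
    # Both endpoint URLs are documented as required.
    for key in ("anonymize_url", "analyzer_url"):
        if not raw.get(key):
            raise ValueError(f"missing required parameter: {key}")
    return PresidioAnonymizerConfig(
        anonymize_url=raw["anonymize_url"],
        analyzer_url=raw["analyzer_url"],
        language=raw.get("language", "en"),
        hash_type=raw.get("hash_type"),
        encrypt_key=raw.get("encrypt_key"),
        anonymizer_rules=list(raw.get("anonymizer_rules", [])),
    )
```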
Each rule contains:
type
: The type of PII to detect (e.g., “PERSON”, “EMAIL_ADDRESS”, “PHONE_NUMBER”).
operator
: The anonymization operation. Supported values:
  mask
  : Mask the value with a character.
  replace
  : Replace the value with a new value.
  hash
  : Hash the value using the algorithm set in the global hash_type.
  encrypt
  : Encrypt the value using the global encrypt_key.
masking_char
: Used with the “mask” operator: the character to use for masking.
chars_to_mask
: Used with the “mask” operator: the number of characters to mask.
new_value
: Used with the “replace” operator: the value to replace the detected PII with.
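Presidio's Anonymizer API accepts an `anonymizers` object that maps each entity type to an operator configuration. A plausible translation of the rules above into that object is sketched below. The function names are hypothetical. Passing the global `hash_type` and `encrypt_key` through as Presidio's `hash_type` and `key` parameters is an assumption about how the plugin wires its global settings. `from_end: False` (mask from the start of the value) is also assumed, because it matches the example output.

```python
def rule_to_operator(rule, hash_type=None, encrypt_key=None):
    """Translate one plugin rule into a Presidio operator config (sketch)."""
    op = rule["operator"]
    if op == "mask":
        return {
            "type": "mask",
            "masking_char": rule["masking_char"],
            "chars_to_mask": rule["chars_to_mask"],
            "from_end": False,  # assumption: mask from the start of the value
        }
    if op == "replace":
        return {"type": "replace", "new_value": rule["new_value"]}
    if op == "hash":
        cfg = {"type": "hash"}
        if hash_type:
            # Only send hash_type when configured; otherwise Presidio's default applies.
            cfg["hash_type"] = hash_type
        return cfg
    if op == "encrypt":
        return {"type": "encrypt", "key": encrypt_key}
    raise ValueError(f"unsupported operator: {op}")


def build_anonymizers(rules, hash_type=None, encrypt_key=None):
    """Map entity type -> operator config for the /anonymize request body."""
    return {r["type"]: rule_to_operator(r, hash_type, encrypt_key) for r in rules}
```

Because the mapping is keyed by entity type, a rule applies to every detected entity of that type in the text.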
Example
Input:
{ "email": "john.doe@example.com", "name": "John Doe", "phone": "+1-555-123-4567", "description": "Contact John Doe at john.doe@example.com or +1-555-123-4567"}
Output:
{ "email": "****.doe@example.com", "name": "John Doe", "phone": "+1-555-123-4567", "description": "Contact <PERSON> at ****.doe@example.com or +<IN_PAN>4567"}
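The masked email in the example output can be checked by hand. With `masking_char: "*"` and `chars_to_mask: 4`, the first four characters of the detected email span (`john`) become `****`. The `mask_span` function below is a hypothetical local re-implementation for illustration only; the plugin delegates the actual masking to the Presidio Anonymizer API.

```python
def mask_span(text, start, end, masking_char="*", chars_to_mask=4, from_end=False):
    """Mask up to chars_to_mask characters of the entity text[start:end]."""
    value = text[start:end]
    n = min(chars_to_mask, len(value))  # never mask more than the entity length
    if from_end:
        masked = value[:len(value) - n] + masking_char * n
    else:
        masked = masking_char * n + value[n:]
    return text[:start] + masked + text[end:]


description = "Contact John Doe at john.doe@example.com or +1-555-123-4567"
start = description.index("john.doe@example.com")
end = start + len("john.doe@example.com")
print(mask_span(description, start, end))
# Contact John Doe at ****.doe@example.com or +1-555-123-4567
```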
Notes
- The plugin first uses Presidio Analyzer to detect PII entities in the text
- Then it applies the configured anonymization rules to the detected entities
- If no PII is detected, the original data is returned unchanged
- Each anonymization operator requires specific parameters:
  mask
  : requires masking_char and chars_to_mask
  replace
  : requires new_value
  hash
  : uses the global hash_type configuration
  encrypt
  : uses the global encrypt_key configuration
- The anonymization is applied to all detected entities of the specified type in the text
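The per-operator requirements listed above lend themselves to a check before any text is sent to Presidio. The sketch below is hypothetical (`validate_rules` and `REQUIRED_RULE_FIELDS` are illustrative names, and the plugin may not perform such validation itself). Treating an empty `encrypt_key` as an error is an assumption based on the encrypt operator needing a key to work.

```python
# Rule-level fields each operator needs; hash and encrypt rely on global settings.
REQUIRED_RULE_FIELDS = {
    "mask": ("masking_char", "chars_to_mask"),
    "replace": ("new_value",),
    "hash": (),     # uses the global hash_type instead
    "encrypt": (),  # uses the global encrypt_key instead
}


def validate_rules(rules, encrypt_key=None):
    """Return a list of human-readable problems found in anonymizer_rules."""
    problems = []
    for i, rule in enumerate(rules):
        if "type" not in rule:
            problems.append(f"rule {i}: missing type")
        op = rule.get("operator")
        if op not in REQUIRED_RULE_FIELDS:
            problems.append(f"rule {i}: unsupported operator {op!r}")
            continue
        for name in REQUIRED_RULE_FIELDS[op]:
            if name not in rule:
                problems.append(f"rule {i} ({op}): missing {name}")
        if op == "encrypt" and not encrypt_key:
            # Assumption: an empty key cannot be used for encryption.
            problems.append(f"rule {i} (encrypt): global encrypt_key is empty")
    return problems
```

Run against the sample configuration, this check would flag the CREDIT_CARD rule, because encrypt_key is set to an empty string there.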