---
title: PII Protection Tool
emoji: 🛡️
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: apache-2.0
---

# PII De-identification with Custom Models

This Hugging Face Space detects and de-identifies personally identifiable information (PII) in text. The app combines regex patterns with named-entity recognition (NER) models to identify sensitive information and offers three protection methods:

1. **Replace**: Replaces PII with entity type tags (e.g., `<NAME>`)
2. **Mask**: Masks PII with asterisks
3. **Synthesize**: Replaces PII with realistic synthetic data
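
The three methods above can be illustrated with a minimal sketch. It assumes detection has already produced non-overlapping character spans; `Span`, `deidentify`, and `_fake_value` are hypothetical names chosen for this example and are not taken from `app.py`, and the synthetic values are deliberately simplistic.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected PII span: character offsets plus an entity label."""
    start: int
    end: int
    label: str


def _fake_value(label: str, rng: random.Random) -> str:
    """Return a made-up stand-in value for a label (illustrative only)."""
    if label == "EMAIL":
        return f"user{rng.randint(100, 999)}@example.com"
    if label == "PHONE":
        return "".join(str(rng.randint(0, 9)) for _ in range(10))
    if label == "NAME":
        return rng.choice(["Alex Doe", "Sam Roe", "Pat Lee"])
    return f"<{label}>"  # fall back to a tag for labels we cannot synthesize


def deidentify(text: str, spans: list[Span], method: str, seed: int = 0) -> str:
    """Apply 'replace', 'mask', or 'synthesize' to every span in text."""
    rng = random.Random(seed)
    out = text
    # Work right to left so earlier offsets stay valid after each substitution.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        original = out[span.start:span.end]
        if method == "replace":
            new = f"<{span.label}>"
        elif method == "mask":
            new = "*" * len(original)
        elif method == "synthesize":
            new = _fake_value(span.label, rng)
        else:
            raise ValueError(f"unknown method: {method!r}")
        out = out[:span.start] + new + out[span.end:]
    return out
```

Masking keeps the original length, which preserves layout but leaks how long each value was; replacing with a tag hides the length while keeping the entity type visible.
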

## Features

- Detects general PII such as names, email addresses, phone numbers, and credit card numbers
- Specialized detection for Indian identifiers (PAN, Aadhaar)
- Optional medical entity recognition
- Three de-identification methods (replace, mask, synthesize)
- Detailed findings report
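
As a rough illustration of the regex side of detection, the sketch below matches email addresses, PAN numbers (five uppercase letters, four digits, one uppercase letter), and 12-digit Aadhaar numbers, then returns a simple findings list. The patterns and the `find_pii` name are assumptions made for this example; the Space's actual patterns are not listed here and may be stricter.

```python
import re

# Illustrative patterns; the Space's actual regexes are not shown in this README.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "AADHAAR": re.compile(r"\b[2-9][0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}\b"),
}


def find_pii(text: str) -> list[dict]:
    """Return one finding per regex match, sorted by position in the text."""
    findings = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append(
                {"label": label, "start": m.start(), "end": m.end(), "text": m.group()}
            )
    return sorted(findings, key=lambda f: f["start"])
```

Format-only patterns like these can produce false positives (any 12-digit number resembles an Aadhaar number), which is one reason to pair them with a model or a checksum step.
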

## How to Use

1. Enter or paste text containing PII in the input box
2. Select a model type (general or medical)
3. Choose a de-identification approach
4. Click "Process Text" to analyze and protect the content
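
For readers curious how these four steps might map onto a Gradio interface, here is a hedged sketch. It is not the Space's `app.py`: `process_text`, `build_demo`, the choice labels, and the email-only stand-in detector are all hypothetical, and a real callback would run the selected model instead.

```python
import re

MODEL_TYPES = ["general", "medical"]         # step 2 choices (assumed labels)
METHODS = ["replace", "mask", "synthesize"]  # step 3 choices (assumed labels)
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # stand-in detector


def process_text(text: str, model_type: str, method: str) -> str:
    """Hypothetical callback behind the "Process Text" button."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    # A real callback would run the selected NER model here; this stub only
    # finds email addresses so the wiring can be tried end to end.
    if method == "mask":
        return _EMAIL.sub(lambda m: "*" * len(m.group()), text)
    if method == "synthesize":
        return _EMAIL.sub("user@example.com", text)
    return _EMAIL.sub("<EMAIL>", text)


def build_demo():
    """Wire the four steps into a Gradio Blocks app (imports gradio lazily)."""
    import gradio as gr

    with gr.Blocks(title="PII Protection Tool") as demo:
        text_in = gr.Textbox(label="Input text", lines=8)            # step 1
        model = gr.Radio(MODEL_TYPES, value="general", label="Model type")  # step 2
        method = gr.Radio(METHODS, value="replace", label="Method")  # step 3
        button = gr.Button("Process Text")                           # step 4
        text_out = gr.Textbox(label="Protected text", lines=8)
        button.click(process_text, inputs=[text_in, model, method], outputs=text_out)
    return demo
```

Calling `build_demo().launch()` would start the interface locally, provided `gradio` is installed.
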

## Models

- **Main model**: Kashish-jain/pii-protection-model
- **Medical model**: Medical entity recognition (when selected)
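
The sketch below shows one way model output could be combined with regex findings. It assumes the main model is a token-classification checkpoint that loads through the `transformers` pipeline, which this README does not confirm. `load_ner`, `ner_to_findings`, and `merge_findings` are hypothetical helpers, and the rule that regex hits take precedence over overlapping model hits is a design choice made for this example only.

```python
MAIN_MODEL = "Kashish-jain/pii-protection-model"  # model ID from this README


def load_ner(model_id: str = MAIN_MODEL):
    """Load a token-classification pipeline (assumes the checkpoint supports it)."""
    from transformers import pipeline  # imported lazily; downloads on first use

    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def ner_to_findings(entities: list[dict], min_score: float = 0.5) -> list[dict]:
    """Convert aggregated pipeline output into label/start/end findings."""
    return [
        {"label": e["entity_group"], "start": e["start"], "end": e["end"]}
        for e in entities
        if e["score"] >= min_score
    ]


def merge_findings(regex_hits: list[dict], ner_hits: list[dict]) -> list[dict]:
    """Combine both sources, letting regex hits win over overlapping NER hits."""
    merged = list(regex_hits)
    for hit in ner_hits:
        overlaps = any(
            hit["start"] < r["end"] and r["start"] < hit["end"] for r in regex_hits
        )
        if not overlaps:
            merged.append(hit)
    return sorted(merged, key=lambda f: f["start"])
```

With a loaded pipeline, usage would look like `merge_findings(regex_hits, ner_to_findings(load_ner()(text)))`, where `regex_hits` comes from whatever pattern matcher is in use.
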

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference