---
license: mit
tags:
- privacy
- policy-analysis
- classification
- text-classification
- transformers
- distilbert
library_name: transformers
datasets:
- opp-115
model-index:
- name: Privacy Clause Classifier (DistilBERT - OPP-115)
  results: []
---
# Privacy Clause Classifier (DistilBERT - OPP-115)
This is a DistilBERT model fine-tuned to classify **privacy policy clauses** into one of ten privacy-practice categories from the [OPP-115 dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf). Each clause is assigned exactly one of the following labels:

| ID | Category |
|----|---------------------------------|
| 0 | Data Retention |
| 1 | Data Security |
| 2 | Do Not Track |
| 3 | First Party Collection/Use |
| 4 | International and Specific Audiences |
| 5 | Other |
| 6 | Policy Change |
| 7 | Third Party Sharing/Collection |
| 8 | User Access, Edit and Deletion |
| 9 | User Choice/Control |
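
The table above can be mirrored in code so that an integer prediction is reported as a readable category name. This is a minimal sketch, assuming the model's output indices follow the same ID order as the table; `ID_TO_CATEGORY` and `category_name` are hypothetical names chosen for illustration, not part of the model repository.

```python
# Hypothetical helper that mirrors the label table in this README.
# Assumption: the model's output indices use the same ID order.
ID_TO_CATEGORY = {
    0: "Data Retention",
    1: "Data Security",
    2: "Do Not Track",
    3: "First Party Collection/Use",
    4: "International and Specific Audiences",
    5: "Other",
    6: "Policy Change",
    7: "Third Party Sharing/Collection",
    8: "User Access, Edit and Deletion",
    9: "User Choice/Control",
}


def category_name(class_id: int) -> str:
    """Return the category name for a predicted class ID."""
    try:
        return ID_TO_CATEGORY[class_id]
    except KeyError:
        raise ValueError(f"Unknown class ID: {class_id}") from None


print(category_name(3))  # First Party Collection/Use
```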
---
## Model Details
- **Architecture**: DistilBERT (pretrained)
- **Fine-tuning Dataset**: [OPP-115 Dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf)
- **Input Format**: Text snippets from privacy policies
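- **Output Format**: One logit per category (10 in total); the highest-scoring index is the predicted class, and a softmax over the logits gives class probabilities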
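
To report probabilities alongside the predicted label, the model's raw scores (logits) need to be passed through a softmax. The sketch below shows that step in plain Python on made-up logits; `softmax`, `top_prediction`, and `example_logits` are illustrative names and values, not outputs of this model. With PyTorch tensors, `torch.softmax(outputs.logits, dim=-1)` performs the same computation.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (class_id, probability) for the highest-scoring class."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, probs[class_id]


# Made-up logits for illustration only (one value per category ID 0-9).
example_logits = [0.1, -1.2, -2.0, 3.4, 0.0, -0.5, -1.0, 1.9, -0.3, 0.7]
class_id, prob = top_prediction(example_logits)
print(f"class {class_id} with probability {prob:.3f}")
```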
---
## Intended Uses
- Automatic **privacy policy clause classification**
- **Regulatory technology (RegTech)** tools
- **Privacy policy summarization** and simplification
- **Risk analysis** for data sharing and collection practices
---
## How to Use
```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch
# Load the tokenizer and model (replace the placeholder with the actual repository ID)
tokenizer = DistilBertTokenizerFast.from_pretrained("your-hf-username/your-model-name")
model = DistilBertForSequenceClassification.from_pretrained("your-hf-username/your-model-name")

# Classify a single clause
text = "We may collect your location data to provide customized services."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
# Index of the highest logit; see the category table above for its meaning
predicted_class = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted Category ID: {predicted_class}")