---
license: mit
tags:
- privacy
- policy-analysis
- classification
- text-classification
- transformers
- distilbert
library_name: transformers
datasets:
- opp-115
model-index:
- name: Privacy Clause Classifier (DistilBERT - OPP-115)
  results: []
---
# Privacy Clause Classifier (DistilBERT - OPP-115)

This model is a DistilBERT model fine-tuned to classify **privacy policy clauses** into one of the ten privacy practice categories defined by the [OPP-115 dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf):

| ID | Category                             |
|----|--------------------------------------|
| 0  | Data Retention                       |
| 1  | Data Security                        |
| 2  | Do Not Track                         |
| 3  | First Party Collection/Use           |
| 4  | International and Specific Audiences |
| 5  | Other                                |
| 6  | Policy Change                        |
| 7  | Third Party Sharing/Collection       |
| 8  | User Access, Edit and Deletion       |
| 9  | User Choice/Control                  |
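
For convenience, the same mapping can be written out in Python. This is a sketch based on the table above; depending on how the model was exported, the uploaded config may already carry an equivalent `id2label` mapping.

```python
# Label mapping mirroring the OPP-115 categories listed above.
ID2LABEL = {
    0: "Data Retention",
    1: "Data Security",
    2: "Do Not Track",
    3: "First Party Collection/Use",
    4: "International and Specific Audiences",
    5: "Other",
    6: "Policy Change",
    7: "Third Party Sharing/Collection",
    8: "User Access, Edit and Deletion",
    9: "User Choice/Control",
}
```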

---

## Model Details

- **Architecture**: DistilBERT (pretrained)
- **Fine-tuning Dataset**: [OPP-115 Dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf)
- **Input Format**: Text snippets from privacy policies
- **Output Format**: Predicted class label with probabilities (see the sketch below)
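
The input/output contract can be illustrated with the `transformers` pipeline API. This is a minimal sketch, assuming a recent `transformers` release and the placeholder repository id used in the example below; if the uploaded config does not map class ids to the category names above, the pipeline will report generic labels such as `LABEL_3`.

```python
from transformers import pipeline

# Text-classification pipeline around the fine-tuned checkpoint
# ("your-hf-username/your-model-name" is a placeholder, as in the example below).
classifier = pipeline("text-classification", model="your-hf-username/your-model-name")

clause = "We may share your personal information with our advertising partners."
# top_k=None returns a score for every one of the 10 categories.
print(classifier(clause, top_k=None))
# -> a list of {"label": ..., "score": ...} dicts, one per category, sorted by score
```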

---

## Intended Uses

- Automatic **privacy policy clause classification**
- **Regulatory technology (RegTech)** tools
- **Privacy policy summarization** and simplification
- **Risk analysis** for data sharing and collection practices

---

## How to Use

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# Load the fine-tuned tokenizer and model
tokenizer = DistilBertTokenizerFast.from_pretrained("your-hf-username/your-model-name")
model = DistilBertForSequenceClassification.from_pretrained("your-hf-username/your-model-name")
model.eval()

# Classify a single privacy policy clause
text = "We may collect your location data to provide customized services."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=-1).item()

print(f"Predicted Category: {predicted_class}")
```
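
To also report probabilities (as listed under **Model Details**), the logits from the snippet above can be passed through a softmax. The label lookup below is an assumption: it relies on the uploaded config carrying the category names from the table above; otherwise it falls back to the numeric id (or the hypothetical `ID2LABEL` dictionary shown earlier can be used).

```python
import torch.nn.functional as F

# Continues from the snippet above: turn the logits into a probability
# distribution over the 10 categories.
probs = F.softmax(outputs.logits, dim=-1).squeeze(0)

# Map the winning class id to a human-readable name, if the config provides one.
label_name = model.config.id2label.get(predicted_class, str(predicted_class))

print(f"Predicted Category: {label_name} (p={probs[predicted_class].item():.3f})")
```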