metadata
license: mit
tags:
  - privacy
  - policy-analysis
  - classification
  - text-classification
  - transformers
  - distilbert
library_name: transformers
datasets:
  - opp-115
model-index:
  - name: Privacy Clause Classifier (DistilBERT - OPP-115)
    results: []

Privacy Clause Classifier (DistilBERT - OPP-115)

This is a DistilBERT model fine-tuned on the OPP-115 dataset. It classifies a privacy policy clause into one of the ten privacy-practice categories listed below.

ID Category
0 Data Retention
1 Data Security
2 Do Not Track
3 First Party Collection/Use
4 International and Specific Audiences
5 Other
6 Policy Change
7 Third Party Sharing/Collection
8 User Access, Edit and Deletion
9 User Choice/Control
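
The table above can be expressed as a lookup so that integer predictions print as readable names. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `category_name` are illustrative, and it assumes the model's output indices follow the ID order of the table.

```python
# Category IDs copied from the table above
# (assumed to match the model's output indices).
ID2LABEL = {
    0: "Data Retention",
    1: "Data Security",
    2: "Do Not Track",
    3: "First Party Collection/Use",
    4: "International and Specific Audiences",
    5: "Other",
    6: "Policy Change",
    7: "Third Party Sharing/Collection",
    8: "User Access, Edit and Deletion",
    9: "User Choice/Control",
}
# Reverse lookup, from category name to ID.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def category_name(class_id: int) -> str:
    """Return the category name for a predicted class ID."""
    if class_id not in ID2LABEL:
        raise ValueError(f"Unknown class ID: {class_id}")
    return ID2LABEL[class_id]


print(category_name(3))  # First Party Collection/Use
```

If the uploaded configuration already carries these names, `model.config.id2label` would return them directly; whether it does is not stated here, so the explicit dictionary is the safer assumption.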

Model Details

  • Architecture: DistilBERT (pretrained)
  • Fine-tuning Dataset: OPP-115 Dataset
  • Input Format: Text snippets from privacy policies
  • Output Format: Logits over the ten categories; softmax gives class probabilities and argmax gives the predicted class ID
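
Sequence-classification models in transformers return raw logits, so class probabilities come from a softmax step. The sketch below shows that step in plain Python on made-up logits. `softmax` and `top_prediction` are illustrative helpers, not part of the model, and the numbers are not real model output. With PyTorch tensors, `torch.softmax(outputs.logits, dim=-1)` performs the same conversion.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (class_id, probability) for the highest-scoring class."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, probs[class_id]


# Made-up logits for the ten categories, for illustration only.
example_logits = [0.1, -1.2, -2.0, 3.4, -0.5, 0.0, -1.1, 1.9, -0.7, 0.3]
class_id, prob = top_prediction(example_logits)
print(class_id, round(prob, 3))
```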

Intended Uses

  • Automatic privacy policy clause classification
  • Regulatory technology (RegTech) tools
  • Privacy policy summarization and simplification
  • Risk analysis for data sharing and collection practices

How to Use

import torch
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast

# Load model (replace the placeholder with the model's actual Hub repository ID)
model_id = "your-hf-username/your-model-name"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_id)
model = DistilBertForSequenceClassification.from_pretrained(model_id)

# Predict
text = "We may collect your location data to provide customized services."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

# argmax over the logits gives the integer ID from the category table above
predicted_class = torch.argmax(outputs.logits, dim=-1).item()

print(f"Predicted Category: {predicted_class}")
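
To classify a whole policy clause by clause, the single-text call above can be wrapped in a function and applied to each clause. The following is a sketch under assumptions: `summarize_policy` and the `classify` callable are hypothetical names, and `classify` stands in for any function that maps a clause to a category ID. A keyword stub replaces the model here so the example runs without downloading weights.

```python
from collections import Counter
from typing import Callable, Iterable


def summarize_policy(clauses: Iterable[str], classify: Callable[[str], int]) -> Counter:
    """Count how many non-empty clauses fall into each category ID."""
    return Counter(classify(clause) for clause in clauses if clause.strip())


def stub_classify(clause: str) -> int:
    """Keyword stand-in for the real model, for illustration only."""
    return 7 if "third part" in clause.lower() else 3


clauses = [
    "We may collect your location data to provide customized services.",
    "We share usage data with third parties for advertising.",
    "",  # empty clauses are skipped
]
counts = summarize_policy(clauses, stub_classify)
print(counts.most_common())
```

Passing the classifier in as an argument keeps the counting logic independent of how predictions are produced, so the stub can later be swapped for a function built on the model.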