Model Description
This model is a fine-tuned version of dbmdz/bert-base-italian-xxl-cased for multi-label text classification, trained on the train split of the Eutekne domande_fitlri_materie dataset.
Intended Use
This model is intended for assigning one or more of 216 categories to legal questions. The categories are taken from the Eutekne domande_fitlri_materie dataset.
Training Details
Base Model
This model was fine-tuned from dbmdz/bert-base-italian-xxl-cased.
Training Data
{
"train_samples": 4044,
"val_samples": 867,
"test_samples": 867,
"num_labels": 216
}
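The split sizes above work out to roughly a 70/15/15 partition. A quick check, using only the counts reported in this card (the dictionary keys are shortened for illustration):

```python
# Split sizes as reported in the Training Data section of this card.
splits = {"train": 4044, "val": 867, "test": 867}

total = sum(splits.values())  # 5778 examples overall
fractions = {name: count / total for name, count in splits.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
# train: 70.0%
# val: 15.0%
# test: 15.0%
```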
Training Hyperparameters
{
"batch_size": 32,
"learning_rate": 2e-05,
"num_epochs": 5,
"max_length": 512,
"threshold": 0.5
}
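For a sense of scale, the hyperparameters above imply a fairly short training run. The sketch below estimates the number of optimizer steps; it assumes one step per batch (no gradient accumulation, a single device) and that the final partial batch is kept, none of which is stated in this card.

```python
import math

train_samples = 4044   # from the Training Data section
batch_size = 32        # from the hyperparameters above
num_epochs = 5

# Assumption: one optimizer step per batch, and the last partial batch is kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)  # 127 635
```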
Evaluation Results
| metric | validation_results | test_results |
|:---------------------|---------------------:|---------------:|
| eval_exact_match | 0.310265 | 0.303345 |
| eval_hamming_loss | 0.00932868 | 0.00976654 |
| eval_f1_micro | 0.589038 | 0.566896 |
| eval_f1_macro | 0.142712 | 0.157367 |
| eval_precision_micro | 0.773795 | 0.746259 |
| eval_precision_macro | 0.206659 | 0.234683 |
| eval_recall_micro | 0.475503 | 0.457045 |
| eval_recall_macro | 0.118758 | 0.13034 |
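
To make the metric names in the table concrete, here is a minimal pure-Python sketch of exact match, Hamming loss, and micro/macro F1 for multi-hot label matrices. It follows the usual definitions (a label with no true or predicted positives gets F1 = 0) and is an assumption about how the numbers were produced, not the evaluation code behind this card. The toy data is invented for illustration.

```python
def multilabel_metrics(y_true, y_pred):
    """y_true and y_pred are lists of equal-length 0/1 lists (one row per sample)."""
    n_samples, n_labels = len(y_true), len(y_true[0])

    # Exact match: every label of the sample must be predicted correctly.
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / n_samples

    # Hamming loss: fraction of individual label decisions that are wrong.
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    hamming = wrong / (n_samples * n_labels)

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    # Per-label true positive / false positive / false negative counts.
    counts = []
    for j in range(n_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        counts.append((tp, fp, fn))

    # Micro: pool the counts over all labels. Macro: average the per-label F1.
    f1_micro = f1(*(sum(c[i] for c in counts) for i in range(3)))
    f1_macro = sum(f1(*c) for c in counts) / n_labels
    return {"exact_match": exact, "hamming_loss": hamming,
            "f1_micro": f1_micro, "f1_macro": f1_macro}


# Toy example: the third label is never predicted, which drags macro F1 down.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
metrics = multilabel_metrics(y_true, y_pred)
print(metrics)

# Consistency check on the reported validation numbers: micro F1 = 2PR / (P + R).
p, r = 0.773795, 0.475503
print(round(2 * p * r / (p + r), 4))
```

In the toy data, micro F1 is 0.75 while macro F1 is about 0.56, because one label is missed entirely. The same pattern in the table (micro F1 near 0.57 to 0.59, macro F1 near 0.14 to 0.16) is consistent with many of the 216 labels having low per-label scores, although the card does not report per-label results.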

Ranking metrics at cutoff k:

| k | hit_rate | precision | recall | f1 | ndcg | coverage | mrr |
|:----|-----------:|------------:|---------:|-------:|-------:|-----------:|-------:|
| @1 | 0.7255 | 0.7255 | 0.2076 | 0.302 | 0.7255 | 0.2076 | 0.2076 |
| @3 | 0.8316 | 0.5398 | 0.4278 | 0.4389 | 0.6316 | 0.4278 | 0.3084 |
| @5 | 0.8674 | 0.3852 | 0.4841 | 0.3924 | 0.5862 | 0.4841 | 0.3214 |
| @10 | 0.9193 | 0.2261 | 0.546 | 0.2955 | 0.5756 | 0.546 | 0.3297 |
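
The ranking metrics treat the model as a recommender: labels are sorted by predicted probability and only the top k are scored. The sketch below uses textbook definitions of hit rate, precision, recall, and MRR at k. These definitions are assumptions, not the evaluation code behind the table; the mrr and coverage columns in particular may follow a different convention. The labels and rankings are invented for illustration.

```python
def ranking_metrics_at_k(ranked, relevant, k):
    """ranked: list of label lists ordered by descending score.
    relevant: list of sets of true labels. Returns averages over samples."""
    hit = prec = rec = rr = 0.0
    n = len(ranked)
    for labels, truth in zip(ranked, relevant):
        top_k = labels[:k]
        n_hits = sum(label in truth for label in top_k)
        hit += n_hits > 0                      # at least one true label in the top k
        prec += n_hits / k                     # share of the top k that is correct
        rec += n_hits / len(truth) if truth else 0.0  # share of true labels found
        # Reciprocal rank of the first true label within the top k (0 if none).
        rr += next((1 / (i + 1) for i, l in enumerate(top_k) if l in truth), 0.0)
    return {"hit_rate": hit / n, "precision": prec / n,
            "recall": rec / n, "mrr": rr / n}


# Toy example with three samples and three placeholder labels.
ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
relevant = [{"A"}, {"A", "C"}, {"A"}]

at_1 = ranking_metrics_at_k(ranked, relevant, 1)
at_3 = ranking_metrics_at_k(ranked, relevant, 3)
print(at_1)
print(at_3)
```

Under these definitions, hit rate and precision coincide at k = 1, which matches the first row of the table (both 0.7255).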
How to Use
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from huggingface_hub import hf_hub_download
import pickle
import torch
import numpy as np
repo_id = "giacomorossojakala/dbmdz-bert-base-italian-xxl-cased-eutekne-filtri-materia-lv2"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
# Download and load label encoder
downloaded_path = hf_hub_download(repo_id=repo_id, filename="label_encoder.pkl")
with open(downloaded_path, 'rb') as f:
    mlb = pickle.load(f)
custom_text = "agevolazioni acquisto prima casa"
inputs = tokenizer(custom_text, truncation=True, padding=True, max_length=512, return_tensors="pt")
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
probabilities = torch.sigmoid(logits)
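# Binarize with a decision threshold (0.75 here; this card lists 0.5 under Training Hyperparameters)
predictions = (probabilities > 0.75).int().cpu().numpy()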
predicted_labels = mlb.inverse_transform(predictions)
# Rank all labels by descending probability
ranked_indexes = np.argsort(probabilities.cpu().numpy(), axis=1)[:, ::-1]
ranked_labels = np.array(mlb.classes_)[ranked_indexes]
print(f"Custom Text: {custom_text}")
# Output shown in the original card for this input: Predicted Labels: ('V',)
print(f"Predicted Labels: {predicted_labels[0]}")
print(f"Ranked Labels: [{', '.join(ranked_labels[0, :5])}...]")
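
With a fixed threshold, some inputs may end up with no label at all, since micro recall in the evaluation table is below 0.5. One possible post-processing step, which is not part of this card's pipeline, is to fall back to the top-ranked label when nothing passes the threshold. The helper `select_labels` and its parameters are hypothetical, and the class names are placeholders.

```python
import math


def select_labels(logits, classes, threshold=0.5, min_labels=1):
    """Hypothetical helper: keep labels whose sigmoid score exceeds the threshold,
    falling back to the top `min_labels` labels when too few pass."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Label indices sorted by descending probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = [i for i in ranked if probs[i] > threshold]
    if len(chosen) < min_labels:
        chosen = ranked[:min_labels]
    return [(classes[i], round(probs[i], 3)) for i in chosen]


classes = ["A", "B", "C"]  # placeholder names, not the model's real labels
print(select_labels([2.0, -1.0, 0.5], classes))    # [('A', 0.881), ('C', 0.622)]
print(select_labels([-2.0, -1.0, -3.0], classes))  # [('B', 0.269)]
```

With the objects from the snippet above, a call would look like `select_labels(logits[0].tolist(), list(mlb.classes_), threshold=0.5)`.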