Model Description
This model is a fine-tuned version of dbmdz/bert-base-italian-xxl-cased for multi-label text classification, trained on the train split of the Eutekne domande_fitlri_materie dataset.
Intended Use
This model is intended for assigning one or more of 31 categories to legal questions. The categories are taken from the Eutekne domande_fitlri_materie dataset.
Training Details
Base Model
This model was fine-tuned from dbmdz/bert-base-italian-xxl-cased.
Training Data
{
"train_samples": 4044,
"val_samples": 867,
"test_samples": 867,
"num_labels": 31
}
Training Hyperparameters
{
"batch_size": 32,
"learning_rate": 2e-05,
"num_epochs": 20,
"max_length": 512,
"threshold": 0.5
}
Evaluation Results
| | validation_results | test_results |
|:--------------------------------|---------------------:|---------------:|
| eval_exact_match | 0.489043 | 0.472895 |
| eval_hamming_loss | 0.0273096 | 0.0298024 |
| eval_f1_micro | 0.697444 | 0.674258 |
| eval_f1_macro | 0.537159 | 0.57959 |
| eval_precision_micro | 0.725557 | 0.684558 |
| eval_precision_macro | 0.593484 | 0.626934 |
| eval_recall_micro | 0.671429 | 0.664263 |
| eval_recall_macro | 0.511322 | 0.574265 |
| eval_best_threshold | 0.6 | 0.6 |
| eval_avg_predictions_per_sample | 1.34487 | 1.39677 |
| eval_avg_labels_per_sample | 1.45329 | 1.43945 |
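The card does not say which library produced the metrics above, so the following is only a minimal pure-Python sketch of the usual definitions of exact match, Hamming loss, and micro/macro F1 for multi-label predictions. The function name `multilabel_metrics` and the toy label matrices are hypothetical, and returning an F1 of 0 for a label with no positives is an assumed convention.

```python
def multilabel_metrics(y_true, y_pred):
    """Compute exact match, Hamming loss, and micro/macro F1.

    y_true and y_pred are lists of equal-length 0/1 rows
    (one row per sample, one column per label).
    """
    n_samples = len(y_true)
    n_labels = len(y_true[0])

    # Exact match: share of samples whose whole label row is correct.
    exact = sum(t == p for t, p in zip(y_true, y_pred)) / n_samples

    # Hamming loss: share of individual label decisions that are wrong.
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    hamming = wrong / (n_samples * n_labels)

    # Per-label true positives, false positives, and false negatives.
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for t, p in zip(y_true, y_pred):
        for j, (ti, pi) in enumerate(zip(t, p)):
            if ti and pi:
                tp[j] += 1
            elif pi:
                fp[j] += 1
            elif ti:
                fn[j] += 1

    def f1(tp_, fp_, fn_):
        # Assumed convention: F1 is 0 when a label has no positives at all.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all labels. Macro: average per-label F1.
    f1_micro = f1(sum(tp), sum(fp), sum(fn))
    f1_macro = sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels

    return {
        "exact_match": exact,
        "hamming_loss": hamming,
        "f1_micro": f1_micro,
        "f1_macro": f1_macro,
    }


# Toy data with 3 samples and 3 labels (not taken from the Eutekne dataset).
toy_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
toy_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
toy_metrics = multilabel_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Macro F1 gives every label the same weight, so labels that are rare and poorly predicted pull it below micro F1. That is consistent with the gap between eval_f1_micro and eval_f1_macro in the table.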
Top-k ranking metrics:
| k | hit_rate | precision | recall | f1 | ndcg | coverage | mrr |
|:----|-----------:|------------:|---------:|-------:|-------:|-----------:|-------:|
| @1 | 0.7589 | 0.7589 | 0.2243 | 0.3217 | 0.7589 | 0.2243 | 0.2243 |
| @3 | 0.9043 | 0.3806 | 0.2963 | 0.3011 | 0.5013 | 0.2963 | 0.2571 |
| @5 | 0.9377 | 0.2496 | 0.3149 | 0.2514 | 0.4421 | 0.3149 | 0.2614 |
| @10 | 0.9746 | 0.1349 | 0.3334 | 0.176 | 0.4168 | 0.3334 | 0.2639 |
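The card does not define how the ranking metrics above were computed. As a point of reference only, here is a minimal sketch of the textbook definitions of hit rate, precision, and recall at k. The function name `ranking_metrics_at_k` and the toy scores are hypothetical, and values produced by these definitions need not match the table if the author used different formulas.

```python
def ranking_metrics_at_k(scores, true_labels, k):
    """Textbook hit rate, precision, and recall at k, averaged over samples.

    scores: one list of per-label scores per sample.
    true_labels: one set of correct label indices per sample.
    """
    hits = 0.0
    precisions = 0.0
    recalls = 0.0
    for row, truth in zip(scores, true_labels):
        # Indices of the k highest-scoring labels for this sample.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        n_correct = len(truth.intersection(top_k))

        hits += 1.0 if n_correct else 0.0                     # any correct label in the top k
        precisions += n_correct / k                           # share of the top k that is correct
        recalls += n_correct / len(truth) if truth else 0.0   # share of true labels retrieved
    n = len(scores)
    return {
        "hit_rate": hits / n,
        "precision": precisions / n,
        "recall": recalls / n,
    }


# Toy data with 2 samples and 4 labels (not real model output).
toy_scores = [[0.9, 0.1, 0.6, 0.2], [0.2, 0.8, 0.3, 0.7]]
toy_truth = [{0, 2}, {3}]
at_1 = ranking_metrics_at_k(toy_scores, toy_truth, k=1)
at_2 = ranking_metrics_at_k(toy_scores, toy_truth, k=2)
print(at_1)
print(at_2)
```

Under these definitions hit rate can only rise as k grows. Precision tends to fall because most questions carry only one or two labels, which is the pattern the hit_rate and precision columns show.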
How to Use
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from huggingface_hub import hf_hub_download
import pickle
import torch
import numpy as np

repo_id = "giacomorossojakala/dbmdz-bert-base-italian-xxl-cased-eutekne-filtri-materia-lv1"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# Download and load the label encoder (only unpickle files from a source you trust)
downloaded_path = hf_hub_download(repo_id=repo_id, filename="label_encoder.pkl")
with open(downloaded_path, "rb") as f:
    mlb = pickle.load(f)

custom_text = "agevolazioni acquisto prima casa"
inputs = tokenizer(custom_text, truncation=True, padding=True, max_length=512, return_tensors="pt")

# Run inference without tracking gradients
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Sigmoid yields an independent probability for each label (multi-label setup)
probabilities = torch.sigmoid(logits)

# Keep labels above the decision threshold (0.75 here; the evaluation table reports a best threshold of 0.6)
predictions = (probabilities > 0.75).int().cpu().numpy()
predicted_labels = mlb.inverse_transform(predictions)

# Rank all labels by descending probability
ranked_indexes = np.argsort(probabilities.cpu().numpy(), axis=1)[:, ::-1]
ranked_labels = np.array(mlb.classes_)[ranked_indexes]

print(f"Custom Text: {custom_text}")
print(f"Predicted Labels: {predicted_labels[0]}")  # e.g. ('V',)
print(f"Ranked Labels: [{', '.join(ranked_labels[0, :5])}...]")
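The card lists three different thresholds: 0.5 in the hyperparameters, 0.6 as eval_best_threshold, and 0.75 in the snippet above. It does not explain how the best threshold was selected. One common approach, sketched here purely as an assumption, is to sweep candidate thresholds over validation-set probabilities and keep the one with the highest micro F1. The names `micro_f1` and `best_threshold` and the toy data are hypothetical.

```python
def micro_f1(y_true, probs, threshold):
    """Micro-averaged F1 after binarizing probabilities at the given threshold."""
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, probs):
        for t, p in zip(true_row, prob_row):
            predicted = p > threshold
            if predicted and t:
                tp += 1
            elif predicted:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs, candidates):
    """Return the candidate threshold with the highest micro F1 (first one wins ties)."""
    return max(candidates, key=lambda th: micro_f1(y_true, probs, th))


# Toy validation data with 3 samples and 3 labels (not real model output).
val_true = [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
val_probs = [[0.9, 0.45, 0.1], [0.2, 0.8, 0.65], [0.7, 0.4, 0.58]]
candidates = [0.3, 0.4, 0.5, 0.6, 0.7]

chosen = best_threshold(val_true, val_probs, candidates)
print(chosen)
```

A higher threshold such as 0.75 favors precision over recall. The threshold should be chosen on validation data only, so that the test results stay unbiased.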