Model Card for bert-sentiment-gr

This is a fine-tuned BERT model for sentiment analysis in Greek, trained to classify short texts or reviews as "positive" or "negative".

Model Details

Model Description

This model is a Greek BERT (nlpaueb/bert-base-greek-uncased-v1) fine-tuned on a dataset of reviews for sentiment classification. It predicts whether a given text expresses positive or negative sentiment.

  • Developed by: George Zografos
  • Model type: BERT (Transformer-based)
  • Language(s) (NLP): Greek
  • License: CC BY-SA 4.0
  • Finetuned from model: nlpaueb/bert-base-greek-uncased-v1

Uses

Direct Use

This model can be used for Greek text sentiment classification in reviews, comments, or social media content. It is suitable for tasks such as automated feedback analysis or social sentiment monitoring.
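
For quick predictions, the Hugging Face text-classification pipeline can wrap this model. A minimal sketch, assuming the model id used later in this card; note that the returned label names depend on the saved config and may appear as LABEL_0 / LABEL_1 rather than negative / positive:

from transformers import pipeline

# Minimal pipeline usage; model id assumed from this card
classifier = pipeline("text-classification", model="GZogra/bert-ft-sentiment-skroutz-gr")
print(classifier("Η εξυπηρέτηση ήταν άψογη."))  # "The service was flawless." -> e.g. [{'label': ..., 'score': ...}]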

Downstream Use

The model can be used as a feature extractor or as part of a pipeline for downstream tasks such as multi-class sentiment classification, recommendation systems, or opinion mining.
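
For feature extraction, the fine-tuned encoder can be loaded without its classification head and the [CLS] vector used as a sentence embedding. A minimal sketch, assuming the model id used later in this card (AutoModel will warn that the classifier weights are unused):

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "GZogra/bert-ft-sentiment-skroutz-gr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)  # BERT body only, no classification head

inputs = tokenizer("Πολύ καλή ποιότητα κατασκευής.", return_tensors="pt", truncation=True, max_length=128)  # "Very good build quality."
with torch.no_grad():
    hidden_states = encoder(**inputs).last_hidden_state  # shape (1, seq_len, 768)
cls_embedding = hidden_states[:, 0, :]  # [CLS] vector, usable as a sentence feature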

Out-of-Scope Use

  • Texts outside modern Greek, or texts dominated by domain-specific jargon, may result in lower performance.
  • The model is not suitable for nuanced sentiment tasks that require detecting sarcasm or context beyond the sentence level.

Bias, Risks, and Limitations

The model is trained on a limited dataset (~6.5K samples) of Greek reviews from the Skroutz Reviews Dataset.

About the Dataset: This is a dataset of shop reviews in Greek from the website Skroutz, intended for sentiment analysis. It contains 6,552 reviews divided into two balanced classes: 3,276 positive and 3,276 negative. There are three columns (id, review, label).

Biases present in the dataset (e.g., overrepresentation of certain products, domains, or writing styles) may affect predictions. Users should be aware that the model may not generalize to all Greek text domains or dialects.

Recommendations

Use caution in critical applications such as legal, medical, or high-stakes decision-making. Evaluate model outputs against domain-specific data before deployment.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "GZogra/bert-ft-sentiment-skroutz-gr"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Example inference
labels = {0: "negative", 1: "positive"}
text = "Το προϊόν ήταν εξαιρετικό!"  # "The product was excellent!"

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
inputs = {k: v.to(device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=1).item()

print(labels[pred])  # positive

Training Details

Training Data

  • Source: Skroutz_dataset.xlsx with columns id, text, label (positive/negative)
  • Preprocessing: Removal of empty texts, normalization of labels, mapping to 0/1
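
A minimal sketch of the preprocessing described above, assuming pandas and the column names listed in this card; the exact training script is not published:

import pandas as pd

df = pd.read_excel("Skroutz_dataset.xlsx")             # columns: id, text, label
df = df.dropna(subset=["text"])                        # remove empty texts
df = df[df["text"].str.strip() != ""]
df["label"] = df["label"].str.strip().str.lower()      # normalize labels
df["label"] = df["label"].map({"negative": 0, "positive": 1})  # map to 0/1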

Training Procedure

  • Tokenizer: nlpaueb/bert-base-greek-uncased-v1 tokenizer
  • Max sequence length: 128
  • Framework: Hugging Face Transformers Trainer
  • Device: GPU if available, otherwise CPU

Training Hyperparameters

  • Training regime: fp32 precision, standard fine-tuning
  • Learning rate: 2e-5
  • Batch size: 16 per device (train & eval)
  • Epochs: 3
  • Weight decay: 0.01
  • Evaluation & Save strategy: per epoch
  • Logging steps: 50
  • Metrics: accuracy, F1-score, precision, recall
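
A minimal Trainer setup matching the hyperparameters above; argument names follow recent Transformers releases, and train_dataset / eval_dataset / compute_metrics are placeholders, since the exact training script is not published:

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="bert-ft-sentiment-skroutz-gr",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",            # "evaluation_strategy" in older Transformers versions
    save_strategy="epoch",
    logging_steps=50,
)

trainer = Trainer(
    model=model,                      # AutoModelForSequenceClassification with num_labels=2
    args=training_args,
    train_dataset=train_dataset,      # tokenized train split (placeholder)
    eval_dataset=eval_dataset,        # tokenized eval split (placeholder)
    compute_metrics=compute_metrics,  # see the Evaluation section below
)
trainer.train()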

Evaluation

Testing Data, Factors & Metrics

  • Test set: 10% of dataset, stratified sampling
  • Metrics computed: accuracy, F1-score, precision, recall
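
A sketch of a compute_metrics function producing the metrics above with scikit-learn; the exact evaluation code is not published:

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    return {"accuracy": accuracy_score(labels, preds), "f1": f1, "precision": precision, "recall": recall}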

Technical Specifications

Model Architecture and Objective

  • Architecture: BERT-based Transformer
  • Objective: Binary classification (positive/negative sentiment)