---
license: gpl-3.0
tags:
- greek
- text-classification
- sequence-classification
- roberta
- fine-tuned
- climate-change
- mediawatch
- greek-news
pipeline_tag: text-classification
datasets:
- custom-greek-climate-news-dataset
model_name: mediawatch-el-climate
base_model: cvcio/roberta-el-news
widget:
- text: "Η κλιματική κρίση και η ρύπανση των ωκεανών απασχόλησε τη σύνοδο κορυφής."
  example_title: Example Text in Greek
---

# Greek Climate News Classification

The **`cvcio/mediawatch-el-climate`** model is a fine-tuned RoBERTa-based model for **Sequence Classification** of Greek news articles, specifically for topics related to **climate change** and the **environment**.

It is part of the MediaWatch-EL project and is designed to automatically categorize Greek news content, enabling large-scale analysis of media coverage on this critical subject.

---

## Model Overview

The model performs a **single-label text classification** task, assigning one of eight defined climate-related labels to a given text input.

* **Base Model:** **`cvcio/roberta-el-news`**
  * This choice leverages a pre-trained language model optimized for the Greek language and news-related text.
* **Architecture:** **`RobertaForSequenceClassification`**
* **Language:** **Greek (el)**
* **Task:** Text Classification (Categorization of climate-related news).

### **Classification Labels**

The model classifies text into one of the following 8 labels, all of which represent distinct themes within climate and environmental reporting:

| Label | Greek Label | English Translation | Description |
| :--- | :--- | :--- | :--- |
| **LABEL_0** | **ΒΙΩΣΙΜΟΤΗΤΑ** | Sustainability | Topics related to sustainable practices and development. |
| **LABEL_1** | **ΠΕΡΙΒΑΛΛΟΝ** | Environment | Broad environmental topics, not solely climate. |
| **LABEL_2** | **ΚΛΙΜΑΤΙΚΗ ΑΛΛΑΓΗ** | Climate Change | General mention of climate change. |
| **LABEL_3** | **ΘΕΡΜΟΚΡΑΣΙΑ** | Temperature | Specific mentions of temperature or heat-related events. |
| **LABEL_4** | **ΚΛΙΜΑΤΙΚΗ ΚΡΙΣΗ** | Climate Crisis | Focus on the urgency or severity of the issue. |
| **LABEL_5** | **ΚΛΙΜΑ** | Climate | Meteorological or general climate context. |
| **LABEL_6** | **ΡΥΠΑΝΣΗ** | Pollution | Specific mentions of environmental contamination. |
| **LABEL_7** | **ΕΝΕΡΓΕΙΑ** | Energy | Focus on energy sources, transitions, or policy. |
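
For downstream code, the table can be mirrored as a simple lookup. The following is a minimal sketch, assuming predictions arrive as the generic `LABEL_n` identifiers from the first column; the `LABELS` dict and the `describe_label` helper are illustrative names and are not part of the released model.

```python
# Lookup mirroring the label table above.
# Assumption: predictions come back as generic "LABEL_n" identifiers.
LABELS = {
    "LABEL_0": ("ΒΙΩΣΙΜΟΤΗΤΑ", "Sustainability"),
    "LABEL_1": ("ΠΕΡΙΒΑΛΛΟΝ", "Environment"),
    "LABEL_2": ("ΚΛΙΜΑΤΙΚΗ ΑΛΛΑΓΗ", "Climate Change"),
    "LABEL_3": ("ΘΕΡΜΟΚΡΑΣΙΑ", "Temperature"),
    "LABEL_4": ("ΚΛΙΜΑΤΙΚΗ ΚΡΙΣΗ", "Climate Crisis"),
    "LABEL_5": ("ΚΛΙΜΑ", "Climate"),
    "LABEL_6": ("ΡΥΠΑΝΣΗ", "Pollution"),
    "LABEL_7": ("ΕΝΕΡΓΕΙΑ", "Energy"),
}


def describe_label(label_id: str) -> str:
    """Return 'Greek (English)' for a known label ID, else the input unchanged."""
    greek, english = LABELS.get(label_id, (label_id, None))
    return f"{greek} ({english})" if english else greek


print(describe_label("LABEL_4"))  # ΚΛΙΜΑΤΙΚΗ ΚΡΙΣΗ (Climate Crisis)
```

If the model configuration already maps IDs to the Greek names, the fallback branch simply returns the name it was given.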

---

## Training Data and Annotation

### **Dataset**

* **Size:** Approximately **12,000 unique Greek news articles**.
* **Source:** The articles were collected from a wide range of Greek online media outlets.
* **Content:** The full articles, not just titles, were used for fine-tuning.
* **Data Split:** The dataset was split using an **80% training** and **20% testing (evaluation)** ratio.
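
The 80/20 split can be illustrated with a short sketch. The card does not state whether the split was random, stratified, or seeded, so the shuffling, the `seed` value, and the `split_80_20` helper below are assumptions made for illustration only.

```python
import random


def split_80_20(articles, seed=42):
    """Shuffle a copy of `articles` and split it into 80% train / 20% test."""
    shuffled = list(articles)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for roughly 12,000 articles
train, test = split_80_20(range(12_000))
print(len(train), len(test))  # 9600 2400
```

With imbalanced labels, a stratified split would keep the label proportions equal in both parts; this sketch does not do that.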

### **Annotation Process**

The dataset was annotated by a group of academics.

* **Methodology:** The news articles were labeled with one of the 8 categories mentioned above.
* **Known Limitation (Poor Annotation):** The annotation quality may not be optimal, owing to the inherent difficulty of the task, the complexity of journalistic text, and potential inter-annotator disagreement. The resulting label noise in the training data may limit the model's maximum achievable performance.

---

## Training

The model was fine-tuned using a **custom Python script (`fine_tune_classifier.py`)** and the Hugging Face `transformers` library.

### **Key Hyperparameters**

| Hyperparameter | Value |
| :--- | :--- |
| **Base Model Checkpoint** | `cvcio/roberta-el-news` |
| **Number of Epochs** | 4 |
| **Batch Size (Train/Eval)** | 64 |
| **Weight Decay** | 0.01 |
| **Warmup Steps** | 50 |
| **Max Sequence Length** | 512 |
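
As a rough sketch, the table values could be passed to the `transformers` `TrainingArguments` class as shown below. The contents of `fine_tune_classifier.py` are not shown in this card, so the mapping of the batch size to the per-device fields, the `output_dir` path, and the `build_training_args` helper are assumptions. The learning rate is not listed in the table and is left at the library default; the maximum sequence length is a tokenizer setting and is not part of these arguments.

```python
import math

# Values from the hyperparameter table, keyed by TrainingArguments field names.
# Assumption: the table's batch size is the per-device batch size.
TRAINING_KWARGS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 64,
    "per_device_eval_batch_size": 64,
    "weight_decay": 0.01,
    "warmup_steps": 50,
}


def build_training_args(output_dir="./mediawatch-el-climate"):
    """Build TrainingArguments; imported lazily so the dict is usable on its own."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


# Rough step count, assuming ~9,600 training articles (80% of ~12,000), one device
steps_per_epoch = math.ceil(9_600 / TRAINING_KWARGS["per_device_train_batch_size"])
total_steps = steps_per_epoch * TRAINING_KWARGS["num_train_epochs"]
print(steps_per_epoch, total_steps)  # 150 600
```

Under these assumptions, the 50 warmup steps cover about a third of the first epoch.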

### **Evaluation Metrics**

* **Accuracy:** The overall fraction of correct predictions.
* **F1-Score (Weighted):** The per-label F1 score, averaged with weights proportional to the number of true instances of each label. Because frequent labels count for more, this metric reflects any class imbalance in the dataset.
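
The two metrics can be written out in a few lines of plain Python. This is a minimal sketch of the definitions above, not the evaluation code used for this model; the function names and the toy labels are illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-label F1, averaged with weights equal to each label's true count."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Toy labels, not real model output
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Labels that appear only in the predictions have zero true instances and therefore contribute no weight to the average.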

---

## Intended Use and Limitations

This model is intended for **research purposes** and for automated **media monitoring** in the Greek language. Specific uses include:

1. **Categorization:** Automatically classifying new Greek news articles into one of the 8 climate-related categories.
2. **Trend Analysis:** Monitoring the frequency and shifts in media coverage across the different climate topics over time.
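
Trend analysis of this kind amounts to counting predicted labels per time bucket. The sketch below assumes each prediction is stored as an ISO date string paired with a label; the `monthly_label_counts` helper and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def monthly_label_counts(predictions):
    """Count predicted labels per month from (ISO date string, label) pairs."""
    counts = defaultdict(Counter)
    for date, label in predictions:
        counts[date[:7]][label] += 1  # "YYYY-MM" prefix of an ISO date
    return {month: dict(c) for month, c in sorted(counts.items())}


# Hypothetical predictions, not real model output
predictions = [
    ("2024-06-03", "ΘΕΡΜΟΚΡΑΣΙΑ"),
    ("2024-06-17", "ΘΕΡΜΟΚΡΑΣΙΑ"),
    ("2024-06-21", "ΕΝΕΡΓΕΙΑ"),
    ("2024-07-02", "ΚΛΙΜΑΤΙΚΗ ΚΡΙΣΗ"),
]
trend = monthly_label_counts(predictions)
print(trend)
```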

### **Limitations and Biases**

* **Annotation Quality:** The primary limitation is the acknowledged "poor" quality of the academic annotations, which may lead to misclassifications, especially for ambiguous articles or articles that span several categories.
* **Monolingual (Greek):** The model is strictly intended for Greek language text.
* **Domain Specificity:** It is fine-tuned only on news text. Its performance on other text types (e.g., social media, academic papers) will likely be lower.
* **General RoBERTa Biases:** The model may inherit any biases present in the original **`cvcio/roberta-el-news`** pre-training data.

---

## How to Use

You can easily use this model for sequence classification with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Model ID on Hugging Face Hub
model_name = "cvcio/mediawatch-el-climate"

# Load model and tokenizer (using the base model's tokenizer)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained("cvcio/roberta-el-news")

# Create a classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example Greek text
text = "Λειψυδρία: Σε ανησυχητικό επίπεδο η στάθμη του νερού σε Πηνειό και Μόρνο – Καμπανάκι ΕΥΔΑΠ για τα αποθέματα : Έχουμε λιγότερο από τα μισά του 2019"

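# Run classification; truncate inputs longer than the 512-token limit
result = classifier(text, truncation=True, max_length=512)

# Print the result
print(result)
# Example output format (label and score are illustrative; depending on the
# model config, the label may instead appear as an ID such as 'LABEL_2'):
# [{'label': 'ΚΛΙΜΑΤΙΚΗ ΑΛΛΑΓΗ', 'score': 0.98...}]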
```