---
library_name: tensorflow
tags:
- sentiment-analysis
- aspect-based-sentiment-analysis
- tensorflow
- keras
language:
- tr
metrics:
- accuracy
pipeline_tag: text-classification
datasets:
- Sengil/Turkish-ABSA-Wsynthetic
---

# 🇹🇷 Turkish Aspect-Based Sentiment Analysis (ABSA) – BiLSTM + Word2Vec

This model performs aspect-based sentiment analysis (ABSA) on Turkish sentences. Given a sentence and a specific aspect, it predicts the sentiment polarity (Negative, Neutral, or Positive) associated with that aspect.

## 🧠 Model Details

- **Model Type:** BiLSTM (Bidirectional Long Short-Term Memory) + Word2Vec
- **Developer:** [Sengil](https://huggingface.co/Sengil)
- **Library:** Keras
- **Input Format:** `"Sentence [ASP] Aspect"`
- **Labels:** 0 = Negative, 1 = Neutral, 2 = Positive
- **Training Dataset:** [Sengil/Turkish-ABSA-Wsynthetic](https://huggingface.co/datasets/Sengil/Turkish-ABSA-Wsynthetic)

## 📊 Evaluation Results

The model achieved the following performance on the test set:

| Class | Precision | Recall | F1-Score | Support |
|----------|-----------|--------|----------|---------|
| Negative | 0.89 | 0.91 | 0.90 | 896 |
| Neutral | 0.70 | 0.64 | 0.67 | 140 |
| Positive | 0.92 | 0.92 | 0.92 | 1178 |
| **Overall** | | | **0.90** | 2214 |

- **Overall Accuracy:** 90%
- **Macro-Averaged F1-Score:** 83%
- **Weighted-Averaged F1-Score:** 90%
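
As a quick consistency check, the macro and weighted averages can be recomputed from the per-class rows of the table. This is a minimal sketch in plain Python; the variable names are illustrative, and because it starts from the rounded two-decimal table values, it only reproduces the reported figures to two decimals.

```python
# Per-class F1 and support, copied from the evaluation table
f1 = {"Negative": 0.90, "Neutral": 0.67, "Positive": 0.92}
support = {"Negative": 896, "Neutral": 140, "Positive": 1178}

total = sum(support.values())

# Macro average: every class counts equally
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(f"support total: {total}")
print(f"macro F1:      {macro_f1:.2f}")
print(f"weighted F1:   {weighted_f1:.2f}")
```

The gap between the macro (0.83) and weighted (0.90) averages reflects the small Neutral class, which has the lowest F1 but accounts for only 140 of the 2214 test examples.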

## 🚀 Usage Example

Download the model and tokenizer from the Hugging Face Hub:

```python
from huggingface_hub import hf_hub_download
import pickle
from tensorflow.keras.models import load_model

model_path = hf_hub_download(repo_id="Sengil/Turkish-ABSA-BiLSTM-Word2Vec", filename="absa_bilstm_model.keras")
tokenizer_path = hf_hub_download(repo_id="Sengil/Turkish-ABSA-BiLSTM-Word2Vec", filename="tokenizer.pkl")

# load model
model = load_model(model_path)

# load tokenizer
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)
```

Preprocess the input:

```python
import re
import nltk
nltk.download('punkt')

def preprocess_turkish(text):
    text = text.lower()
    # Replace URLs and @mentions with placeholders
    text = re.sub(r"http\S+|www\S+|https\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    # Replace everything except letters, digits, and whitespace with a space
    text = re.sub(r"[^a-zA-Z0-9çğıöşüÇĞİÖŞÜ\s]", " ", text)
    # Shorten runs of 3+ identical characters to 2
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Collapse whitespace
    text = re.sub(r"\s+", " ", text).strip()
    return text
```
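
To make the effect of each step concrete, the sketch below repeats `preprocess_turkish` from the snippet above (so that it runs on its own) and applies it to two inputs. The second input string is made up for illustration.

```python
import re

def preprocess_turkish(text):
    # Same steps as the function above, repeated so this block is self-contained
    text = text.lower()
    text = re.sub(r"http\S+|www\S+|https\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    text = re.sub(r"[^a-zA-Z0-9çğıöşüÇĞİÖŞÜ\s]", " ", text)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

# Model input in the "Sentence [ASP] Aspect" format
asp_example = preprocess_turkish("Manzara sahane evet ama servis rezalet. [ASP] manzara")
print(asp_example)
# -> manzara sahane evet ama servis rezalet asp manzara

# Hypothetical noisy input with elongation, a mention, and a URL
noisy_example = preprocess_turkish("Çoook güzel!!! @ali http://x.com")
print(noisy_example)
# -> çook güzel user url
```

Because the punctuation filter runs after lowercasing and after the placeholder substitutions, the `[ASP]` separator reaches the tokenizer as the plain token `asp`, and `<url>` and `<user>` arrive as `url` and `user`.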

Predict the sentiment for a sentence and aspect:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def predict_sentiment(sentence, aspect, max_len=84):
    # Build the "Sentence [ASP] Aspect" input expected by the model
    input_text = sentence + " [ASP] " + aspect
    cleaned = preprocess_turkish(input_text)
    tokenized = tokenizer.texts_to_sequences([cleaned])
    padded = pad_sequences(tokenized, maxlen=max_len, padding='post')

    pred = model.predict(padded)  # shape (1, 3): class probabilities
    label = int(np.argmax(pred, axis=-1)[0])
    labels = {0: "Negatif", 1: "Nötr", 2: "Pozitif"}  # Negative, Neutral, Positive
    return labels[label]
```
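
`predict_sentiment` returns only the winning label for a single input. When the class probabilities are also of interest, or several inputs are scored in one `model.predict` call, the softmax rows could be decoded along the lines of the sketch below. `decode_predictions` is a hypothetical helper, not part of the model repository, and the probability values are made up for illustration; with the real model one would pass something like `model.predict(padded).tolist()`.

```python
# Label mapping from the Model Details section
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}

def decode_predictions(probs):
    """Map each row of class probabilities to a (label, confidence) pair."""
    results = []
    for row in probs:
        # Index of the largest probability in this row
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((LABELS[best], float(row[best])))
    return results

# Made-up softmax outputs for two inputs (not real model predictions)
example_probs = [
    [0.05, 0.10, 0.85],
    [0.70, 0.20, 0.10],
]
decoded = decode_predictions(example_probs)
print(decoded)
```

Decoding row by row avoids the pitfall of calling `argmax` over a whole batch at once, which would return a single flattened index rather than one label per input.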

Run a prediction:

```python
# "The view is superb, yes, but the service is terrible."
sentence = "Manzara sahane evet ama servis rezalet."
aspect = "manzara"  # "view"

predict = predict_sentiment(sentence, aspect)
print("predict:", predict)
```

## 🏋️‍♀️ Training Details

* **Embedding:** Word2Vec (dimension: 100)
* **Model Architecture:**
  * Embedding layer (initialized with pre-trained Word2Vec weights)
  * 2 x BiLSTM layers (each with 100 units, dropout: 0.3)
  * Conv1D layer (100 filters, kernel size: 5)
  * Global Max Pooling
  * Dense layer (16 units, ReLU activation)
  * Output layer (3 units, softmax activation)
* **Training Parameters:**
  * Loss Function: `sparse_categorical_crossentropy`
  * Optimizer: Adam
  * Epochs: 35 (with early stopping)
  * Batch Size: 128
  * Learning Rate: 1e-3 (adjusted dynamically with ReduceLROnPlateau)
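
The architecture and training parameters listed above could be expressed in Keras roughly as follows. This is a sketch under stated assumptions, not the author's training script: the vocabulary size, the random stand-in for the Word2Vec matrix, the placement of dropout inside the LSTM layers, the Conv1D activation, and the callback settings are all guesses, since the card does not specify them. The sequence length of 84 is taken from the default `max_len` in the usage example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 5000   # assumption: not stated in this card
EMBED_DIM = 100     # Word2Vec dimension from the card
MAX_LEN = 84        # default max_len in the usage example

# Random stand-in for the pre-trained Word2Vec matrix (assumption)
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

inputs = keras.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(
    VOCAB_SIZE,
    EMBED_DIM,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
)(inputs)
# Two stacked BiLSTM layers; both return sequences so Conv1D sees every time step
x = layers.Bidirectional(layers.LSTM(100, return_sequences=True, dropout=0.3))(x)
x = layers.Bidirectional(layers.LSTM(100, return_sequences=True, dropout=0.3))(x)
x = layers.Conv1D(100, 5, activation="relu")(x)  # activation is an assumption
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(16, activation="relu")(x)
outputs = layers.Dense(3, activation="softmax")(x)

model_sketch = keras.Model(inputs, outputs)
model_sketch.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Callbacks named in the card; monitor, patience, and factor are assumptions
callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]
# model_sketch.fit(X_train, y_train, validation_data=(X_val, y_val),
#                  epochs=35, batch_size=128, callbacks=callbacks)
```

The `fit` call is left commented out because the training arrays (`X_train`, `y_train`, and the validation split) are placeholders here; integer class labels 0 to 2 would match the `sparse_categorical_crossentropy` loss.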

## 📚 Training Data

The model was trained on the [Sengil/Turkish-ABSA-Wsynthetic](https://huggingface.co/datasets/Sengil/Turkish-ABSA-Wsynthetic) dataset, which comprises semi-synthetic Turkish sentences annotated for aspect-based sentiment analysis, particularly in the restaurant domain.

## ⚠️ Limitations

* Performance on the Neutral class is lower than on the other classes (F1 of 0.67 versus 0.90 for Negative and 0.92 for Positive), possibly due to class imbalance in the training data.
* The model may struggle with rare or ambiguous aspects that are not well represented in the training set.
* Complex sentence structures or ironic expressions may reduce the model's accuracy.

## 📄 Citation

```
@misc{turkish_absa_bilstm_word2vec,
  title = {Turkish Aspect-Based Sentiment Analysis using BiLSTM + Word2Vec},
  author = {Sengil},
  year = {2025},
  url = {https://huggingface.co/Sengil/Turkish-ABSA-BiLSTM-Word2Vec}
}
```

## 📬 Contact

For questions or feedback, please reach out via the [Hugging Face profile](https://huggingface.co/Sengil).