---
license: mit
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
tags:
- NLP
- SentimentAnalysis
- LogisticRegression
- ScikitLearn
---
# 🧠 Sentiment Analysis with Logistic Regression
This model performs **multi-class sentiment analysis** on tweets, classifying them into the following categories:
- Positive
- Negative
- Neutral
- Irrelevant
It uses a custom preprocessing pipeline with:
<!-- - Text cleaning (URL, mention, hashtag, punctuation removal)-->
- CountVectorizer
- TF-IDF transformation
- Logistic Regression classifier (`max_iter=1000`)
---
## 🏗 Model Architecture
<!-- - **TextCleaner**: Custom scikit-learn transformer for consistent text preprocessing.-->
- **CountVectorizer**: Converts tweets into token count vectors.
- **TfidfTransformer**: Reweights tokens by importance.
- **LogisticRegression**: Interpretable and robust classification baseline.
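
The three components above can be chained with scikit-learn's `Pipeline`. The following is a minimal sketch, not the author's training script: the step names (`counts`, `tfidf`, `clf`) and the toy tweets are assumptions made for illustration, and only `max_iter=1000` comes from this card.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_pipeline() -> Pipeline:
    """Chain token counting, TF-IDF reweighting, and logistic regression."""
    return Pipeline([
        ("counts", CountVectorizer()),                 # tweets -> token count vectors
        ("tfidf", TfidfTransformer()),                 # reweight counts by TF-IDF
        ("clf", LogisticRegression(max_iter=1000)),    # multi-class classifier
    ])


# Toy training data (hypothetical, only to make the sketch runnable).
texts = [
    "I love this game",
    "Great update, works well",
    "This patch is terrible",
    "Worst release so far",
    "The update ships on Tuesday",
    "Version two is now available",
    "Check out my cooking channel",
    "Unrelated ad for cheap shoes",
]
labels = [
    "Positive", "Positive",
    "Negative", "Negative",
    "Neutral", "Neutral",
    "Irrelevant", "Irrelevant",
]

model = build_pipeline()
model.fit(texts, labels)

# The fitted pipeline accepts raw strings, just like the released model.
print(model.predict(["I love this update"])[0])
```

Because vectorization lives inside the pipeline, the same object can be serialized with `joblib.dump` and later called on raw text without any separate preprocessing step.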
---
## 🧪 Evaluation
Evaluated on a separate validation set of 999 tweets:
| Class | Precision | Recall | F1-score |
|-------------|-----------|--------|----------|
| Irrelevant | 0.88 | 0.85 | 0.87 |
| Negative | 0.87 | 0.94 | 0.91 |
| Neutral | 0.97 | 0.86 | 0.91 |
| Positive | 0.89 | 0.94 | 0.91 |
| **Overall Accuracy** | | | **0.90** |
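
Per-class precision, recall, and F1 like those in the table can be computed with `sklearn.metrics`. This is a sketch on a tiny hand-made example; the labels and predictions below are hypothetical and are not the 999-tweet validation set.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ground truth and predictions, two tweets per class.
y_true = ["Positive", "Positive", "Negative", "Negative",
          "Neutral", "Neutral", "Irrelevant", "Irrelevant"]
y_pred = ["Positive", "Negative", "Negative", "Negative",
          "Neutral", "Positive", "Irrelevant", "Irrelevant"]

# Per-class metrics as a nested dict: report[class_name][metric_name].
report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
accuracy = accuracy_score(y_true, y_pred)

for name in ["Irrelevant", "Negative", "Neutral", "Positive"]:
    row = report[name]
    print(f"{name:<11} P={row['precision']:.2f} "
          f"R={row['recall']:.2f} F1={row['f1-score']:.2f}")
print(f"Overall accuracy: {accuracy:.2f}")
```

With a real validation set, `y_pred` would come from `model.predict(validation_texts)`.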
---
## 📦 Usage
```python
import joblib

# Load the serialized scikit-learn pipeline (vectorizer + classifier).
model = joblib.load("sentiment_model_lr.pkl")

# The pipeline expects a list of raw strings.
user_input = "This update is surprisingly good!"
prediction = model.predict([user_input])
print(prediction[0])  # → Positive, Negative, etc.
```
---
> ⚠️ Requires scikit-learn 1.6.1+ to avoid version mismatch warnings.
---
## 📚 Dataset
Tweets were preprocessed using a `clean_text` routine and labeled into
the four sentiment categories. If you'd like to experiment or re-train, contact
the author or fork this repo.
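
The card names the `clean_text` routine but does not show it. As a rough sketch of what such a routine might look like, the function below removes URLs, mentions, hashtags, and punctuation; these steps and the regular expressions are assumptions, not the author's implementation.

```python
import re
import string

# Patterns are illustrative assumptions, not taken from the original code.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Strip URLs, @mentions, #hashtags, and ASCII punctuation from a tweet."""
    text = URL_RE.sub(" ", text)        # drop links first, before punctuation goes
    text = MENTION_RE.sub(" ", text)    # drop @user handles
    text = HASHTAG_RE.sub(" ", text)    # drop #tags entirely
    text = text.translate(PUNCT_TABLE)  # remove remaining punctuation
    return " ".join(text.split())       # collapse repeated whitespace


print(clean_text("Loving the new update!! @dev_team #release https://example.com/x"))
```

Case is left untouched here because `CountVectorizer` lowercases text by default.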
---
## 🧑‍💻 Author
- Built by @arshvir
- Model version: 1.0
- License: MIT
---