---
library_name: transformers
tags:
- topic
- multi-sentiment
license: mit
datasets:
- valurank/Topic_Classification
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- distilbert/distilbert-base-uncased
---
# Model Card for Topic Classification Model
A fine-tuned DistilBERT model for multi-class topic classification. This model predicts the most relevant topic label from a predefined set based on input text. It was trained using 🤗 Transformers and PyTorch on a custom dataset derived from academic and news-style corpora.
## Model Details
### Model Description
This model was developed by Daniel (@AfroLogicInsect) to classify text into one of several predefined topics. It builds on the `distilbert-base-uncased` architecture and was fine-tuned for multi-class classification using a softmax output layer.
- **Developed by:** Daniel 🇳🇬 (@AfroLogicInsect)
- **Model type:** DistilBERT-based multi-class sequence classifier
- **Language(s):** English
- **License:** MIT
- **Finetuned from:** distilbert-base-uncased
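
To illustrate the softmax output layer mentioned above, here is a minimal standard-library sketch of how raw class logits become topic probabilities. The label names and logit values are hypothetical, chosen only for illustration; the real label set comes from the model's configuration.

```python
import math

def softmax(logits):
    """Convert raw class logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical labels and logits, for illustration only
labels = ["AI", "finance", "sports", "climate"]
logits = [2.0, 0.5, -1.0, 0.1]

probs = softmax(logits)
best = max(range(len(labels)), key=probs.__getitem__)  # index of the highest probability
print(labels[best], round(probs[best], 3))
```

The predicted topic is simply the label with the highest probability; the remaining probabilities indicate how close the alternatives were.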
### Model Sources
- **Repository:** [AfroLogicInsect/topic-model-analysis-model](https://huggingface.co/AfroLogicInsect/topic-model-analysis-model)
- **Paper:** arXiv:1910.01108 (DistilBERT)
- **Demo:** [Coming soon]
## Uses
### Direct Use
- Classify academic or news-style text into topics such as AI, finance, sports, climate, etc.
- Embed in dashboards or content moderation tools for automatic tagging
### Downstream Use
- Can be extended to hierarchical topic classification
- Useful for building recommendation engines or content filters
### Out-of-Scope Use
- Not suitable for sentiment or emotion classification
- May not generalize well to informal or slang-heavy text
## Bias, Risks, and Limitations
- Trained on curated corpora, so predictions may reflect biases in the source material
- Topics are predefined and static, so text about emerging topics may be misclassified
- Confidence scores are probabilities, not guarantees of correctness
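
Because the scores are probabilistic, one option is to accept a prediction only when its score clears a threshold. The sketch below assumes predictions in the list-of-dicts shape that a text-classification pipeline returns for one input; the helper name `confident_label`, the 0.5 default threshold, and the example scores are all hypothetical.

```python
def confident_label(predictions, threshold=0.5, fallback="uncertain"):
    """Return the top label only if its score clears the threshold.

    `predictions` is a list of {"label": str, "score": float} dicts.
    """
    if not predictions:
        return fallback
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else fallback

# Hypothetical scores, for illustration only
example = [
    {"label": "AI", "score": 0.46},
    {"label": "finance", "score": 0.41},
    {"label": "sports", "score": 0.13},
]
print(confident_label(example))                 # top score is below 0.5, so the fallback is used
print(confident_label(example, threshold=0.4))  # top score clears 0.4, so the label is returned
```

A suitable threshold depends on the application and is best tuned on labelled examples from the target domain.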
### Recommendations
- Pass `top_k=5` when calling the pipeline to retrieve the five highest-scoring topics; the older `return_all_scores=True` flag is deprecated in recent 🤗 Transformers releases
- Consider fine-tuning on domain-specific data for improved accuracy
## How to Get Started
```python
from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hub
classifier = pipeline(
    "text-classification",
    model="AfroLogicInsect/topic-model-analysis-model",
    tokenizer="AfroLogicInsect/topic-model-analysis-model",
)

text = "New AI breakthrough in natural language processing"

# Passing a list returns one list of predictions per input;
# top_k=5 keeps the five highest-scoring labels, sorted by score
results = classifier([text], top_k=5)

for i, res in enumerate(results[0], start=1):
    print(f"Top {i}: {res['label']} ({res['score']:.3f})")
```
## Training Details
### Dataset
- Custom multi-class topic dataset based on arXiv abstracts and news articles
- Labels include domains like AI, finance, sports, climate, etc.
### Hyperparameters
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Evaluation every 200 steps
- Metric: F1 score
### Trainer Setup
Training used the Hugging Face `Trainer` API, with `TrainingArguments` configured for best-model selection and early stopping.
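
The setup described above might look roughly like the following sketch. Only the epochs, batch size, learning rate, 200-step evaluation interval, and F1 metric come from this card; the output path, save schedule, evaluation batch size, and early-stopping patience are assumptions, and `build_trainer` is a hypothetical helper rather than the original training script. It also assumes the datasets are already tokenized and padded.

```python
# Values marked "from the card" are listed under Hyperparameters; the rest are assumptions.
TRAINING_KWARGS = {
    "output_dir": "./topic-model",      # hypothetical path
    "num_train_epochs": 3,              # from the card
    "per_device_train_batch_size": 16,  # from the card
    "per_device_eval_batch_size": 16,   # assumption
    "learning_rate": 2e-5,              # from the card
    "eval_strategy": "steps",           # named `evaluation_strategy` in older releases
    "eval_steps": 200,                  # from the card
    "save_strategy": "steps",           # assumption: checkpoints aligned with evaluation
    "save_steps": 200,
    "load_best_model_at_end": True,     # best-model selection
    "metric_for_best_model": "f1",      # expects compute_metrics to return an "f1" key
}

def build_trainer(model, train_dataset, eval_dataset, compute_metrics):
    """Assemble a Trainer with early stopping (hypothetical helper)."""
    # Imported here so the settings above can be inspected without loading the library
    from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

    return Trainer(
        model=model,
        args=TrainingArguments(**TRAINING_KWARGS),
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        # Patience of 3 evaluations is an assumption, not a value from the card
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    )
```

In the `Trainer` API, early stopping is supplied as a callback rather than a `TrainingArguments` field, and it relies on `load_best_model_at_end` and `metric_for_best_model` being set.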
## Evaluation
Evaluated across the topic categories, the model reached approximately the following scores:
- **Accuracy:** ~90.8%
- **F1 Score:** ~0.91
- **Precision:** ~0.89
- **Recall:** ~0.93
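
For reference, the four metrics above can be computed from true and predicted labels as in this standard-library sketch. The card does not state which averaging scheme was used, so macro averaging here is an assumption, and `macro_metrics` is a hypothetical helper name.

```python
from collections import Counter

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = set(y_true) | set(y_pred)
    true_count = Counter(y_true)   # how often each label is the ground truth
    pred_count = Counter(y_pred)   # how often each label is predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label in labels:
        p = correct[label] / pred_count[label] if pred_count[label] else 0.0
        r = correct[label] / true_count[label] if true_count[label] else 0.0
        precision += p
        recall += r
        f1 += 2 * p * r / (p + r) if p + r else 0.0

    n = len(labels)
    return {
        "accuracy": sum(correct.values()) / len(y_true),
        "precision": precision / n,
        "recall": recall / n,
        "f1": f1 / n,
    }

# Toy labels, for illustration only
metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
print(metrics)
```

Inside a `Trainer`, a `compute_metrics` callback would first take the argmax over the logits to obtain `y_pred` before computing scores like these.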
## Environmental Impact
- **Hardware:** Google Colab (NVIDIA T4 GPU)
- **Training Time:** ~2.5 hours
- **Carbon Emitted:** ~0.3 kg CO₂eq (estimated via [ML Impact Calculator](https://mlco2.github.io/impact#compute))
## Citation
```bibtex
@misc{afrologicinsect2025topicmodel,
title = {AfroLogicInsect Topic Classification Model},
author = {Akan Daniel},
year = {2025},
howpublished = {\url{https://huggingface.co/AfroLogicInsect/topic-model-analysis-model}},
}
```
## Contact
- Name: Daniel (@AfroLogicInsect)
- Location: Lagos, Nigeria
- Contact: GitHub / Hugging Face / email (danielamahtoday@gmail.com)