Model Card for Topic Classification Model
A fine-tuned DistilBERT model for multi-class topic classification. This model predicts the most relevant topic label from a predefined set based on input text. It was trained using 🤗 Transformers and PyTorch on a custom dataset derived from academic and news-style corpora.
Model Details
Model Description
This model was developed by Daniel (@AfroLogicInsect) to classify text into one of several predefined topics. It builds on the distilbert-base-uncased architecture and was fine-tuned for multi-class classification using a softmax output layer.
- Developed by: Daniel 🇳🇬 (@AfroLogicInsect)
- Model type: DistilBERT-based multi-class sequence classifier
- Language(s): English
- License: MIT
- Finetuned from: distilbert-base-uncased
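To illustrate what the softmax output layer does, the sketch below turns a vector of raw logits into probabilities and picks the most likely topic. The label names and logit values are hypothetical, chosen only for illustration; the real model's label set and outputs will differ.

```python
import math


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    # Subtract the largest logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical labels and logits, for illustration only.
labels = ["AI", "finance", "sports", "climate"]
logits = [2.0, 0.5, -1.0, 0.1]

probs = softmax(logits)
# The predicted topic is the label with the highest probability.
predicted = labels[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Because softmax forces the probabilities to sum to 1 across the predefined labels, a text that fits none of the topics still receives a "most likely" label.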
Model Sources
- Repository: AfroLogicInsect/topic-model-analysis-model
- Paper: arXiv:1910.01108 (DistilBERT)
- Demo: [Coming soon]
Uses
Direct Use
- Classify academic or news-style text into topics such as AI, finance, sports, climate, etc.
- Embed in dashboards or content moderation tools for automatic tagging
Downstream Use
- Can be extended to hierarchical topic classification
- Useful for building recommendation engines or content filters
Out-of-Scope Use
- Not suitable for sentiment or emotion classification
- May not generalize well to informal or slang-heavy text
Bias, Risks, and Limitations
- Trained on curated corpora — may reflect biases in source material
- Topics are predefined and static — emerging topics may be misclassified
- Confidence scores are probabilistic, not definitive
Recommendations
- Use top_k=5 with return_all_scores=True to retrieve multiple topic predictions
- Consider fine-tuning on domain-specific data for improved accuracy
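As a minimal sketch of the first recommendation, the helper below keeps the top-k predictions from a list of label/score dictionaries and optionally drops low-confidence ones. The function name `select_topics`, the `min_score` threshold, and the sample scores are assumptions made for illustration; they are not part of the model or the Transformers API.

```python
def select_topics(scores, top_k=5, min_score=0.0):
    """Return up to top_k label/score dicts, highest score first.

    Entries scoring below min_score are dropped (hypothetical threshold).
    """
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [item for item in ranked[:top_k] if item["score"] >= min_score]


# Hypothetical pipeline-style output for a single input text.
scores = [
    {"label": "finance", "score": 0.08},
    {"label": "AI", "score": 0.81},
    {"label": "climate", "score": 0.04},
    {"label": "sports", "score": 0.07},
]

confident = select_topics(scores, top_k=3, min_score=0.05)
for item in confident:
    print(f"{item['label']}: {item['score']:.2f}")
```

A threshold like this is one way to act on the point that confidence scores are probabilistic: weak predictions can be routed to manual review rather than tagged automatically.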
How to Get Started
from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="AfroLogicInsect/topic-model-analysis-model",
    tokenizer="AfroLogicInsect/topic-model-analysis-model",
    return_all_scores=True,  # return a score for every label, not only the best one
)

text = "New AI breakthrough in natural language processing"
results = classifier(text)

# results[0] holds one {"label", "score"} dict per topic; keep the five highest.
top_5 = sorted(results[0], key=lambda x: x['score'], reverse=True)[:5]
for i, res in enumerate(top_5):
    print(f"Top {i+1}: {res['label']} ({res['score']:.3f})")
Training Details
Dataset
- Custom multi-class topic dataset based on arXiv abstracts and news articles
- Labels include domains like AI, finance, sports, climate, etc.
Hyperparameters
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Evaluation every 200 steps
- Metric: F1 score
Trainer Setup
Used the Hugging Face Trainer API with TrainingArguments configured for early stopping and best model selection.
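The original training script is not included in this card, so the following is only a sketch of how the listed hyperparameters might map onto TrainingArguments. The evaluation batch size, the checkpoint schedule, the output directory, the early-stopping patience, and the "f1" metric key are all assumptions. The evaluation-schedule argument has been renamed between Transformers releases, so the sketch checks which name is available.

```python
import inspect


def build_training_kwargs():
    """Keyword arguments mirroring the hyperparameters listed in this card."""
    return {
        "num_train_epochs": 3,
        "per_device_train_batch_size": 16,
        "per_device_eval_batch_size": 16,  # assumption: same as the training batch size
        "learning_rate": 2e-5,
        "eval_steps": 200,
        "save_strategy": "steps",
        "save_steps": 200,  # assumption: checkpoint on the evaluation schedule
        "load_best_model_at_end": True,
        "metric_for_best_model": "f1",  # assumption: compute_metrics returns an "f1" key
        "greater_is_better": True,
    }


def make_training_setup(output_dir="./results", patience=2):
    """Build TrainingArguments plus an early-stopping callback.

    Requires the `transformers` package; output_dir and patience are
    hypothetical values, not taken from the original training run.
    """
    from transformers import EarlyStoppingCallback, TrainingArguments

    kwargs = build_training_kwargs()
    # Pick whichever name this Transformers version uses for the eval schedule.
    params = inspect.signature(TrainingArguments.__init__).parameters
    strategy_key = "eval_strategy" if "eval_strategy" in params else "evaluation_strategy"
    kwargs[strategy_key] = "steps"

    args = TrainingArguments(output_dir=output_dir, **kwargs)
    callbacks = [EarlyStoppingCallback(early_stopping_patience=patience)]
    return args, callbacks
```

The returned `args` and `callbacks` would then be passed to `Trainer(args=..., callbacks=...)` along with the model, datasets, and a metrics function.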
Evaluation
The model performed well across multiple topic categories. Approximate evaluation metrics:
- Accuracy: ~90.8%
- F1 Score: ~0.91
- Precision: ~0.89
- Recall: ~0.93
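For reference, the sketch below shows one way to compute accuracy, precision, recall, and F1 for a multi-class classifier. It uses macro averaging, which is an assumption: the card does not state which averaging scheme produced the reported figures. The labels and predictions are toy data.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels and predictions, for illustration only.
y_true = ["ai", "ai", "finance", "sports"]
y_pred = ["ai", "finance", "finance", "sports"]
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

With micro or weighted averaging the numbers can differ noticeably on imbalanced label sets, so comparisons against the figures above should use the same scheme.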
Environmental Impact
- Hardware: Google Colab (NVIDIA T4 GPU)
- Training Time: ~2.5 hours
- Carbon Emitted: ~0.3 kg CO₂eq (estimated via ML Impact Calculator)
Citation
@misc{afrologicinsect2025topicmodel,
  title = {AfroLogicInsect Topic Classification Model},
  author = {Akan Daniel},
  year = {2025},
  howpublished = {\url{https://huggingface.co/AfroLogicInsect/topic-model-analysis-model}},
}
Contact
- Name: Daniel (@AfroLogicInsect)
- Location: Lagos, Nigeria
- Contact: GitHub / Hugging Face / email (danielamahtoday@gmail.com)