|
---

license: mit
|
tags: |
|
- text-classification |
|
- sentiment-analysis |
|
- finance |
|
- tinybert |
|
datasets: |
|
- financial_phrasebank |
|
- custom-financial-news |
|
metrics: |
|
- accuracy |
|
- f1 |
|
widget: |
|
- text: "$AAPL - Apple hits record high after earnings beat" |
|
- text: "$TSLA - Tesla misses Q2 delivery estimates" |
|
- text: "$MSFT - Microsoft announces new Azure features" |
|
--- |
|
|
|
# TinyBERT Financial News Sentiment Analysis |
|
|
|
[![Model on Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/mikeysharma/finance-sentiment-analysis)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
|
|
|
A lightweight TinyBERT model fine-tuned for financial news sentiment analysis, achieving 89.2% accuracy with a ~55 MB model size (5.3 MB quantized) and ~28 ms CPU inference latency.
|
|
|
## Model Details |
|
|
|
- **Model Type:** Text Classification (Sentiment Analysis) |
|
- **Architecture:** TinyBERT (4-layer, 312-hidden) |
|
- **Pretrained Base:** `huawei-noah/TinyBERT_General_4L_312D` |
|
- **Fine-tuned Dataset:** Financial news headlines with sentiment labels |
|
- **Input:** Financial news text (max 128 tokens) |
|
- **Output:** Sentiment classification (Negative/Neutral/Positive) |
|
|
|
## Performance |
|
|
|
| Metric         | Value    |
|----------------|----------|
| Accuracy       | 89.2%    |
| F1-score       | 0.87     |
| Model size     | 54.84 MB |
| CPU latency    | 28 ms    |
| Quantized size | 5.3 MB   |
|
|
|
## Usage |
|
|
|
### Direct Inference with Pipeline |
|
|
|
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mikeysharma/finance-sentiment-analysis",
)

result = classifier("$TSLA - Morgan Stanley upgrades Tesla to Overweight")
print(result)
```
|
|
|
### Using Model & Tokenizer Directly |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis")
model = AutoModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis")

inputs = tokenizer(
    "$BYND - JPMorgan cuts Beyond Meat price target",
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
```
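To turn the probability tensor into a human-readable label, look up the model's `id2label` mapping. The sketch below uses example logits and an assumed index order (0 = negative, 1 = neutral, 2 = positive); check `model.config.id2label` for the actual mapping.

```python
import torch

# Assumed label order -- verify against model.config.id2label
id2label = {0: "negative", 1: "neutral", 2: "positive"}

# Example logits as they might come back from the model (batch of 1)
logits = torch.tensor([[-1.2, 0.3, 2.1]])
probs = torch.nn.functional.softmax(logits, dim=-1)

# Pick the highest-probability class and map it to its label
pred_id = int(probs.argmax(dim=-1))
print(id2label[pred_id], float(probs[0, pred_id]))
```

With real model output, replace the example `logits` with `outputs.logits` from the snippet above.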
|
|
|
### ONNX Runtime (Recommended for Production)
|
|
|
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis")
# Pass export=True here if the repository does not already include an ONNX export
model = ORTModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis")

inputs = tokenizer(
    "Cemex shares fall after Credit Suisse downgrade",
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

outputs = model(**inputs)
```
|
|
|
## Training Data |
|
|
|
The model was fine-tuned on a dataset of financial news headlines with three sentiment classes: |
|
|
|
1. **Negative**: Bearish sentiment, downgrades, losses |
|
2. **Neutral**: Factual reporting, no strong sentiment |
|
3. **Positive**: Bullish sentiment, upgrades, gains |
|
|
|
Example samples: |
|
``` |
|
$AAPL - Apple hits record high after earnings beat (Positive) |
|
$TSLA - Tesla misses Q2 delivery estimates (Negative) |
|
$MSFT - Microsoft announces new Azure features (Neutral) |
|
``` |
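If you want to reuse annotated samples in this `headline (Label)` format, they can be split into text/label pairs with a small parser (a sketch; the regex assumes the label always appears in parentheses at the end of the line):

```python
import re

# Parse the "<headline> (<Label>)" format used in the samples above
sample = "$AAPL - Apple hits record high after earnings beat (Positive)"
match = re.match(r"^(?P<text>.*)\s\((?P<label>Negative|Neutral|Positive)\)$", sample)

text, label = match.group("text"), match.group("label")
print(text)   # $AAPL - Apple hits record high after earnings beat
print(label)  # Positive
```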
|
|
|
## Preprocessing |
|
|
|
Text is preprocessed with: |
|
- Lowercasing |
|
- Ticker symbol normalization ($AAPL → AAPL) |
|
- URL removal |
|
- Special character cleaning |
|
- Truncation to 128 tokens |
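The cleaning steps above can be sketched roughly as follows. This is a minimal illustration; the exact patterns used during training are not published, so treat these regexes as assumptions:

```python
import re

def preprocess(text: str) -> str:
    text = text.lower()
    # Ticker symbol normalization: $AAPL -> aapl (input is already lowercased)
    text = re.sub(r"\$([a-z]+)", r"\1", text)
    # URL removal
    text = re.sub(r"https?://\S+", "", text)
    # Special character cleaning: keep letters, digits, and basic punctuation
    text = re.sub(r"[^a-z0-9\s.,!?%-]", "", text)
    # Collapse whitespace; truncation to 128 tokens happens later,
    # inside the tokenizer, not here
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("$AAPL - Apple hits record high https://t.co/xyz"))
# aapl - apple hits record high
```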
|
|
|
## Deployment |
|
|
|
For production deployment, we recommend: |
|
|
|
1. **ONNX Runtime** for CPU-optimized inference |
|
2. **FastAPI** for REST API serving |
|
3. **Docker** containerization |
|
|
|
Example Dockerfile: |
|
```dockerfile |
|
FROM python:3.8-slim |
|
|
|
WORKDIR /app |
|
COPY . . |
|
|
|
RUN pip install transformers optimum[onnxruntime] fastapi uvicorn |
|
|
|
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"] |
|
``` |
|
|
|
## Limitations |
|
|
|
- Primarily trained on English financial news |
|
- Performance may degrade on non-financial text |
|
- Short-form text (headlines) works best |
|
- May not capture nuanced sarcasm/irony |
|
|
|
## Ethical Considerations |
|
|
|
While useful for market analysis, this model should not be used as sole input for investment decisions. Always combine with human judgment and other data sources. |
|
|
|
## Citation |
|
|
|
If you use this model in your research, please cite: |
|
|
|
```bibtex |
|
@misc{tinybert-fin-sentiment, |
|
author = {Mikey Sharma}, |
|
title = {Lightweight Financial News Sentiment Analysis with TinyBERT}, |
|
year = {2023}, |
|
publisher = {Hugging Face}, |
|
howpublished = {\url{https://huggingface.co/mikeysharma/finance-sentiment-analysis}} |
|
} |
|
``` |
|
|
|
|
|