---
tags:
- text-classification
- sentiment-analysis
- finance
- tinybert
datasets:
- financial_phrasebank
- custom-financial-news
metrics:
- accuracy
- f1
widget:
- text: "$AAPL - Apple hits record high after earnings beat"
- text: "$TSLA - Tesla misses Q2 delivery estimates"
- text: "$MSFT - Microsoft announces new Azure features"
license: mit
---
# TinyBERT Financial News Sentiment Analysis
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Hub-yellow)](https://huggingface.co/mikeysharma/finance-sentiment-analysis)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
A lightweight TinyBERT model fine-tuned for financial news sentiment analysis, achieving 89.2% accuracy with a model size under 60 MB and CPU inference latency under 50 ms.
## Model Details
- **Model Type:** Text Classification (Sentiment Analysis)
- **Architecture:** TinyBERT (4-layer, 312-hidden)
- **Pretrained Base:** `huawei-noah/TinyBERT_General_4L_312D`
- **Fine-tuned Dataset:** Financial news headlines with sentiment labels
- **Input:** Financial news text (max 128 tokens)
- **Output:** Sentiment classification (Negative/Neutral/Positive)
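For reference, a minimal sketch of loading the base checkpoint for 3-class fine-tuning is shown below; the label order is an assumption based on the class list above, and training hyperparameters are not stated on this card.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed label mapping, based on the Negative/Neutral/Positive classes listed above.
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
label2id = {label: i for i, label in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("huawei-noah/TinyBERT_General_4L_312D")
model = AutoModelForSequenceClassification.from_pretrained(
    "huawei-noah/TinyBERT_General_4L_312D",
    num_labels=3,
    id2label=id2label,
    label2id=label2id,
)
```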
## Performance
| Metric         | Value    |
|----------------|----------|
| Accuracy       | 89.2%    |
| F1-Score       | 0.87     |
| Model Size     | 54.84 MB |
| CPU Latency    | 28 ms    |
| Quantized Size | 5.3 MB   |
## Usage
### Direct Inference with Pipeline
```python
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="mikeysharma/finance-sentiment-analysis"
)
result = classifier("$TSLA - Morgan Stanley upgrades Tesla to Overweight")
print(result)
```
### Using Model & Tokenizer Directly
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis")
model = AutoModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis")

inputs = tokenizer(
    "$BYND - JPMorgan cuts Beyond Meat price target",
    return_tensors="pt",
    truncation=True,
    max_length=128
)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
```
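The softmax output above is one probability per class. To map the highest-probability index to a label string, the `id2label` mapping in the model config can be used (assuming the hosted config carries human-readable label names):
```python
# Pick the most likely class and look up its name in the config.
predicted_id = predictions.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```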
### ONNX Runtime (Recommended for Production)
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis")
model = ORTModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis")
inputs = tokenizer(
    "Cemex shares fall after Credit Suisse downgrade",
    return_tensors="pt",
    truncation=True,
    max_length=128
)
outputs = model(**inputs)
```
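The quantized size in the Performance table points to post-training weight quantization. A minimal sketch using ONNX Runtime's dynamic quantization is shown below; the file paths are placeholders, and this is not necessarily the exact pipeline used to produce the published 5.3 MB figure:
```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamically quantize the weights of an exported ONNX model to int8.
# "model.onnx" and "model.quant.onnx" are placeholder paths.
quantize_dynamic(
    "model.onnx",
    "model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```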
## Training Data
The model was fine-tuned on a dataset of financial news headlines with three sentiment classes:
1. **Negative**: Bearish sentiment, downgrades, losses
2. **Neutral**: Factual reporting, no strong sentiment
3. **Positive**: Bullish sentiment, upgrades, gains
Example samples:
```
$AAPL - Apple hits record high after earnings beat (Positive)
$TSLA - Tesla misses Q2 delivery estimates (Negative)
$MSFT - Microsoft announces new Azure features (Neutral)
```
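The `financial_phrasebank` dataset listed in the card metadata can be loaded with the `datasets` library; the config name below is an assumption, since the card does not state which annotation-agreement subset was used:
```python
from datasets import load_dataset

# "sentences_allagree" is one of several agreement-level configs; the choice here is an assumption.
# Depending on your datasets version, trust_remote_code=True may be required.
dataset = load_dataset("financial_phrasebank", "sentences_allagree")
print(dataset["train"][0])
```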
## Preprocessing
Text is preprocessed with:
- Lowercasing
- Ticker symbol normalization ($AAPL → AAPL)
- URL removal
- Special character cleaning
- Truncation to 128 tokens
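A minimal sketch of this preprocessing is shown below. The exact rules (e.g. the regular expressions) are not published, so treat this as an illustration rather than the canonical pipeline:
```python
import re

def preprocess(text: str, tokenizer, max_length: int = 128):
    """Illustrative preprocessing; the exact rules used for training are not published."""
    text = text.lower()                            # lowercasing
    text = re.sub(r"\$([a-z]+)", r"\1", text)      # ticker normalization ($aapl -> aapl)
    text = re.sub(r"https?://\S+", "", text)       # URL removal
    text = re.sub(r"[^a-z0-9\s.,%-]", " ", text)   # special character cleaning
    return tokenizer(text, truncation=True, max_length=max_length, return_tensors="pt")
```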
## Deployment
For production deployment, we recommend:
1. **ONNX Runtime** for CPU-optimized inference
2. **FastAPI** for REST API serving
3. **Docker** containerization
Example Dockerfile:
```dockerfile
FROM python:3.8-slim
WORKDIR /app
COPY . .
RUN pip install transformers optimum[onnxruntime] fastapi uvicorn
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
```
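The `CMD` above expects an `api.py` module exposing `app`. A minimal sketch of such a FastAPI service, wiring the ONNX Runtime model into a transformers pipeline, could look like the following; the endpoint path and request schema are assumptions:
```python
# api.py - illustrative FastAPI service; endpoint path and schema are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

MODEL_ID = "mikeysharma/finance-sentiment-analysis"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = ORTModelForSequenceClassification.from_pretrained(MODEL_ID)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

app = FastAPI()

class Headline(BaseModel):
    text: str

@app.post("/predict")
def predict(headline: Headline):
    # The pipeline returns a list like [{"label": ..., "score": ...}].
    return classifier(headline.text)
```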
## Limitations
- Primarily trained on English financial news
- Performance may degrade on non-financial text
- Short-form text (headlines) works best
- May not capture nuanced sarcasm/irony
## Ethical Considerations
While useful for market analysis, this model should not be used as sole input for investment decisions. Always combine with human judgment and other data sources.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{tinybert-fin-sentiment,
  author       = {Mikey Sharma},
  title        = {Lightweight Financial News Sentiment Analysis with TinyBERT},
  year         = {2023},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mikeysharma/finance-sentiment-analysis}}
}
```