---
license: mit
tags:
- text-classification
- sentiment-analysis
- finance
- tinybert
datasets:
- financial_phrasebank
- custom-financial-news
metrics:
- accuracy
- f1
widget:
- text: "$AAPL - Apple hits record high after earnings beat"
- text: "$TSLA - Tesla misses Q2 delivery estimates"
- text: "$MSFT - Microsoft announces new Azure features"
---

# TinyBERT Financial News Sentiment Analysis

[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Hub-yellow)](https://huggingface.co/mikeysharma/finance-sentiment-analysis)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

A lightweight TinyBERT model fine-tuned for financial news sentiment analysis, achieving 89.2% accuracy with a model size under 60 MB and CPU inference latency under 50 ms.

## Model Details

- **Model Type:** Text Classification (Sentiment Analysis)
- **Architecture:** TinyBERT (4-layer, 312-hidden)
- **Pretrained Base:** `huawei-noah/TinyBERT_General_4L_312D`
- **Fine-tuned Dataset:** Financial news headlines with sentiment labels
- **Input:** Financial news text (max 128 tokens)
- **Output:** Sentiment classification (Negative/Neutral/Positive)

## Performance

| Metric         | Value    |
|----------------|----------|
| Accuracy       | 89.2%    |
| F1-Score       | 0.87     |
| Model Size     | 54.84 MB |
| CPU Latency    | 28 ms    |
| Quantized Size | 5.3 MB   |

## Usage

### Direct Inference with Pipeline

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification", 
    model="mikeysharma/finance-sentiment-analysis"
)

result = classifier("$TSLA - Morgan Stanley upgrades Tesla to Overweight")
print(result)
```

### Using Model & Tokenizer Directly

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis")
model = AutoModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis")

inputs = tokenizer(
    "$BYND - JPMorgan cuts Beyond Meat price target",
    return_tensors="pt",
    truncation=True,
    max_length=128
)

with torch.no_grad():
    outputs = model(**inputs)
    # convert logits to class probabilities
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    print(predictions)
    # map the top class index to its label name
    print(model.config.id2label[predictions.argmax(dim=-1).item()])
```

### ONNX Runtime (Recommended for Production)

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis")
# add export=True if the Hub repository does not already contain an ONNX export
model = ORTModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis")

inputs = tokenizer(
    "Cemex shares fall after Credit Suisse downgrade",
    return_tensors="pt",
    truncation=True,
    max_length=128
)

outputs = model(**inputs)
```

## Training Data

The model was fine-tuned on a dataset of financial news headlines with three sentiment classes:

1. **Negative**: Bearish sentiment, downgrades, losses
2. **Neutral**: Factual reporting, no strong sentiment
3. **Positive**: Bullish sentiment, upgrades, gains

Example samples:
```
$AAPL - Apple hits record high after earnings beat (Positive)
$TSLA - Tesla misses Q2 delivery estimates (Negative)
$MSFT - Microsoft announces new Azure features (Neutral)
```

## Preprocessing

Text is preprocessed with:
- Lowercasing
- Ticker symbol normalization ($AAPL → AAPL)
- URL removal
- Special character cleaning
- Truncation to 128 tokens
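
These steps (apart from truncation, which the tokenizer's `max_length=128` handles) can be sketched as plain-Python regex transforms. The exact training-time implementation is not published, so treat this as an approximation:

```python
import re

def preprocess(text: str) -> str:
    """Approximate the preprocessing steps listed above (assumed implementation)."""
    text = text.lower()                          # lowercasing
    text = re.sub(r"\$([a-z]+)", r"\1", text)    # ticker normalization: $aapl -> aapl
    text = re.sub(r"https?://\S+", "", text)     # URL removal
    text = re.sub(r"[^a-z0-9\s]", " ", text)     # special character cleaning
    return re.sub(r"\s+", " ", text).strip()     # collapse leftover whitespace

print(preprocess("$AAPL - Apple hits record high! https://example.com"))
# -> aapl apple hits record high
```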

## Deployment

For production deployment, we recommend:

1. **ONNX Runtime** for CPU-optimized inference
2. **FastAPI** for REST API serving
3. **Docker** containerization

Example Dockerfile:
```dockerfile
FROM python:3.10-slim

WORKDIR /app
COPY . .

RUN pip install --no-cache-dir transformers "optimum[onnxruntime]" fastapi uvicorn

CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
```

## Limitations

- Trained primarily on English financial news
- Performance may degrade on non-financial text
- Works best on short-form text such as headlines
- May not capture nuanced sarcasm or irony

## Ethical Considerations

While useful for market analysis, this model should not be used as sole input for investment decisions. Always combine with human judgment and other data sources.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{tinybert-fin-sentiment,
  author = {Mikey Sharma},
  title = {Lightweight Financial News Sentiment Analysis with TinyBERT},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mikeysharma/finance-sentiment-analysis}}
}
```
