|
---

license: mit
|
tags: |
|
- text-classification |
|
- sentiment-analysis |
|
- finance |
|
- tinybert |
|
datasets: |
|
- financial_phrasebank |
|
- custom-financial-news |
|
metrics: |
|
- accuracy |
|
- f1 |
|
widget: |
|
- text: "$AAPL - Apple hits record high after earnings beat" |
|
- text: "$TSLA - Tesla misses Q2 delivery estimates" |
|
- text: "$MSFT - Microsoft announces new Azure features" |
|
--- |
|
|
|
# TinyBERT Financial News Sentiment Analysis |
|
|
|
[![Model on Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/mikeysharma/finance-sentiment-analysis)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
|
|
|
A lightweight TinyBERT model fine-tuned for financial news sentiment analysis, achieving 89.2% accuracy with a ~55 MB model size (5.3 MB quantized) and ~28 ms CPU inference latency.
|
|
|
## Model Details |
|
|
|
- **Model Type:** Text Classification (Sentiment Analysis) |
|
- **Architecture:** TinyBERT (4-layer, 312-hidden) |
|
- **Pretrained Base:** `huawei-noah/TinyBERT_General_4L_312D` |
|
- **Fine-tuned Dataset:** Financial news headlines with sentiment labels |
|
- **Input:** Financial news text (max 128 tokens) |
|
- **Output:** Sentiment classification (Negative/Neutral/Positive) |
|
|
|
## Performance |
|
|
|
| Metric         | Value    |
|----------------|----------|
| Accuracy       | 89.2%    |
| F1-score       | 0.87     |
| Model size     | 54.84 MB |
| CPU latency    | 28 ms    |
| Quantized size | 5.3 MB   |
|
|
|
## Usage |
|
|
|
### Direct Inference with Pipeline |
|
|
|
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mikeysharma/finance-sentiment-analysis",
)

result = classifier("$TSLA - Morgan Stanley upgrades Tesla to Overweight")
print(result)
```
|
|
|
### Using Model & Tokenizer Directly |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis")
model = AutoModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis")

inputs = tokenizer(
    "$BYND - JPMorgan cuts Beyond Meat price target",
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
```
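To turn the probability tensor into a human-readable label, look up the model's `id2label` mapping. The sketch below uses example logits and an assumed index order (0 = negative, 1 = neutral, 2 = positive); check `model.config.id2label` for the actual mapping.

```python
import torch

# Assumed label order -- verify against model.config.id2label
id2label = {0: "negative", 1: "neutral", 2: "positive"}

# Example logits as they might come back from the model (batch of 1)
logits = torch.tensor([[-1.2, 0.3, 2.1]])
probs = torch.nn.functional.softmax(logits, dim=-1)

# Pick the highest-probability class and map it to its label
pred_id = int(probs.argmax(dim=-1))
print(id2label[pred_id], float(probs[0, pred_id]))
```

With real model output, replace the example `logits` with `outputs.logits` from the snippet above.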
|
|
|
### ONNX Runtime (Recommended for Production)
|
|
|
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis")
# Pass export=True here if the repository does not already include an ONNX export
model = ORTModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis")

inputs = tokenizer(
    "Cemex shares fall after Credit Suisse downgrade",
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

outputs = model(**inputs)
```
|
|
|
## Training Data |
|
|
|
The model was fine-tuned on a dataset of financial news headlines with three sentiment classes: |
|
|
|
1. **Negative**: Bearish sentiment, downgrades, losses |
|
2. **Neutral**: Factual reporting, no strong sentiment |
|
3. **Positive**: Bullish sentiment, upgrades, gains |
|
|
|
Example samples: |
|
``` |
|
$AAPL - Apple hits record high after earnings beat (Positive) |
|
$TSLA - Tesla misses Q2 delivery estimates (Negative) |
|
$MSFT - Microsoft announces new Azure features (Neutral) |
|
``` |
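If you want to reuse annotated samples in this `headline (Label)` format, they can be split into text/label pairs with a small parser (a sketch; the regex assumes the label always appears in parentheses at the end of the line):

```python
import re

# Parse the "<headline> (<Label>)" format used in the samples above
sample = "$AAPL - Apple hits record high after earnings beat (Positive)"
match = re.match(r"^(?P<text>.*)\s\((?P<label>Negative|Neutral|Positive)\)$", sample)

text, label = match.group("text"), match.group("label")
print(text)   # $AAPL - Apple hits record high after earnings beat
print(label)  # Positive
```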
|
|
|
## Preprocessing |
|
|
|
Text is preprocessed with: |
|
- Lowercasing |
|
- Ticker symbol normalization ($AAPL → AAPL) |
|
- URL removal |
|
- Special character cleaning |
|
- Truncation to 128 tokens |
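The cleaning steps above can be sketched roughly as follows. This is a minimal illustration; the exact patterns used during training are not published, so treat these regexes as assumptions:

```python
import re

def preprocess(text: str) -> str:
    text = text.lower()
    # Ticker symbol normalization: $AAPL -> aapl (input is already lowercased)
    text = re.sub(r"\$([a-z]+)", r"\1", text)
    # URL removal
    text = re.sub(r"https?://\S+", "", text)
    # Special character cleaning: keep letters, digits, and basic punctuation
    text = re.sub(r"[^a-z0-9\s.,!?%-]", "", text)
    # Collapse whitespace; truncation to 128 tokens happens later,
    # inside the tokenizer, not here
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("$AAPL - Apple hits record high https://t.co/xyz"))
# aapl - apple hits record high
```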
|
|
|
## Deployment |
|
|
|
For production deployment, we recommend: |
|
|
|
1. **ONNX Runtime** for CPU-optimized inference |
|
2. **FastAPI** for REST API serving |
|
3. **Docker** containerization |
|
|
|
Example Dockerfile: |
|
```dockerfile |
|
FROM python:3.8-slim |
|
|
|
WORKDIR /app |
|
COPY . . |
|
|
|
RUN pip install transformers optimum[onnxruntime] fastapi uvicorn |
|
|
|
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"] |
|
``` |
|
|
|
## Limitations |
|
|
|
- Primarily trained on English financial news |
|
- Performance may degrade on non-financial text |
|
- Short-form text (headlines) works best |
|
- May not capture nuanced sarcasm/irony |
|
|
|
## Ethical Considerations |
|
|
|
While useful for market analysis, this model should not be used as sole input for investment decisions. Always combine with human judgment and other data sources. |
|
|
|
## Citation |
|
|
|
If you use this model in your research, please cite: |
|
|
|
```bibtex |
|
@misc{tinybert-fin-sentiment, |
|
author = {Mikey Sharma}, |
|
title = {Lightweight Financial News Sentiment Analysis with TinyBERT}, |
|
year = {2023}, |
|
publisher = {Hugging Face}, |
|
howpublished = {\url{https://huggingface.co/mikeysharma/finance-sentiment-analysis}} |
|
} |
|
``` |
|
|
|
|
|