--- tags: - text-classification - sentiment-analysis - finance - tinybert datasets: - financial_phrasebank - custom-financial-news metrics: - accuracy - f1 widget: - text: "$AAPL - Apple hits record high after earnings beat" - text: "$TSLA - Tesla misses Q2 delivery estimates" - text: "$MSFT - Microsoft announces new Azure features" --- # TinyBERT Financial News Sentiment Analysis [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Hub-yellow)](https://huggingface.co/your-username/your-model-name) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT) A lightweight TinyBERT model fine-tuned for financial news sentiment analysis, achieving 89% accuracy with < 60MB model size and <50ms CPU inference latency. ## Model Details - **Model Type:** Text Classification (Sentiment Analysis) - **Architecture:** TinyBERT (4-layer, 312-hidden) - **Pretrained Base:** `huawei-noah/TinyBERT_General_4L_312D` - **Fine-tuned Dataset:** Financial news headlines with sentiment labels - **Input:** Financial news text (max 128 tokens) - **Output:** Sentiment classification (Negative/Neutral/Positive) ## Performance | Metric | Value | |--------------|--------| | Accuracy | 89.2% | | F1-Score | 0.87 | | Model Size | 54.84MB| | CPU Latency | 28ms | | Quantized Size | 5.3MB | ## Usage ### Direct Inference with Pipeline ```python from transformers import pipeline classifier = pipeline( "text-classification", model="mikeysharma/finance-sentiment-analysis" ) result = classifier("$TSLA - Morgan Stanley upgrades Tesla to Overweight") print(result) ``` ### Using Model & Tokenizer Directly ```python from transformers import AutoTokenizer, AutoModelForSequenceClassification import torch tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis) model = AutoModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis") inputs = tokenizer( "$BYND - JPMorgan cuts Beyond Meat price target", return_tensors="pt", truncation=True, max_length=128 ) with torch.no_grad(): outputs = model(**inputs) predictions = torch.nn.functional.softmax(outputs.logits, dim=-1) print(predictions) ``` ### ONNX Runtime (Optimal for Production) ```python from optimum.onnxruntime import ORTModelForSequenceClassification from transformers import AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("mikeysharma/finance-sentiment-analysis") model = ORTModelForSequenceClassification.from_pretrained("mikeysharma/finance-sentiment-analysis") inputs = tokenizer( "Cemex shares fall after Credit Suisse downgrade", return_tensors="pt", truncation=True, max_length=128 ) outputs = model(**inputs) ``` ## Training Data The model was fine-tuned on a dataset of financial news headlines with three sentiment classes: 1. **Negative**: Bearish sentiment, downgrades, losses 2. **Neutral**: Factual reporting, no strong sentiment 3. **Positive**: Bullish sentiment, upgrades, gains Example samples: ``` $AAPL - Apple hits record high after earnings beat (Positive) $TSLA - Tesla misses Q2 delivery estimates (Negative) $MSFT - Microsoft announces new Azure features (Neutral) ``` ## Preprocessing Text is preprocessed with: - Lowercasing - Ticker symbol normalization ($AAPL → AAPL) - URL removal - Special character cleaning - Truncation to 128 tokens ## Deployment For production deployment, we recommend: 1. **ONNX Runtime** for CPU-optimized inference 2. **FastAPI** for REST API serving 3. **Docker** containerization Example Dockerfile: ```dockerfile FROM python:3.8-slim WORKDIR /app COPY . . RUN pip install transformers optimum[onnxruntime] fastapi uvicorn CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"] ``` ## Limitations - Primarily trained on English financial news - Performance may degrade on non-financial text - Short-form text (headlines) works best - May not capture nuanced sarcasm/irony ## Ethical Considerations While useful for market analysis, this model should not be used as sole input for investment decisions. Always combine with human judgment and other data sources. ## Citation If you use this model in your research, please cite: ```bibtex @misc{tinybert-fin-sentiment, author = {Mikey Sharma}, title = {Lightweight Financial News Sentiment Analysis with TinyBERT}, year = {2023}, publisher = {Hugging Face}, howpublished = {\url{https://huggingface.co/mikeysharma/finance-sentiment-analysis}} } ``` --- license: mit ---