AdityaAI9/distilbert_finance_sentiment_analysis

AdityaAI9 committed on Jun 1, 2025

Commit 054c79a (verified) · 1 parent: c5e6fd0

Update README.md

Files changed (1)

README.md +182 -3

@@ -1,3 +1,182 @@
----
-license: mit
----

+# Financial Sentiment Classifier 📈
+A fine-tuned DistilBERT model for financial text sentiment analysis, capable of classifying financial news and statements into three categories: **positive**, **negative**, and **neutral**.
+## Model Description
+This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) specifically trained on financial text data for sentiment classification. It achieves **97.5% accuracy** on the validation set and is optimized for analyzing financial news, earnings reports, market commentary, and other finance-related text.
+### Key Features
+- **High Performance**: 97.5% accuracy on financial sentiment classification
+- **Fast Inference**: Built on DistilBERT for efficient processing
+- **Domain-Specific**: Trained specifically on financial text data
+- **Balanced Classes**: Handles positive, negative, and neutral sentiments effectively
+## Model Details
+- **Base Model**: distilbert-base-uncased
+- **Task**: Text Classification (Sentiment Analysis)
+- **Language**: English
+- **Domain**: Financial Text
+- **Classes**: 3 (positive, negative, neutral)
+- **Training Data**: ~36K financial text samples (original + synthetic data)
+### Performance Metrics
+| Metric | Score |
+|--------|-------|
+| Accuracy | 97.52% |
+| F1-Score | 97.51% |
+| Precision | 97.52% |
+| Recall | 97.52% |
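The near-identical accuracy, precision, and recall in the table are what support-weighted (or micro) averaging produces, because weighted recall always equals accuracy. The card does not state the averaging mode, so that reading is an assumption. A minimal pure-Python sketch on invented labels (not the model's validation data) shows the relationship:

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_i = tp / predicted if predicted else 0.0
        r_i = tp / support[label]
        f_i = 2 * p_i * r_i / (p_i + r_i) if (p_i + r_i) else 0.0
        weight = support[label] / n  # each class counts in proportion to its support
        precision += weight * p_i
        recall += weight * r_i
        f1 += weight * f_i
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "neutral", "positive", "positive", "neutral"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

On any input, `metrics["recall"]` matches `metrics["accuracy"]` up to floating-point error, while precision and F1 can differ.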
+## Quick Start
+### Installation
+```bash
+pip install transformers torch
+```
+### Usage
+```python
+from transformers import pipeline
+# Load the classifier
+classifier = pipeline(
+    "text-classification",
+    model="AdityaAI9/distilbert_finance_sentiment_analysis"
+)
+# Analyze financial text
+result = classifier("The company reported strong quarterly earnings with 15% revenue growth.")
+print(result)
+# Example output (label names come from the model config; the exact score will vary):
+# [{'label': 'positive', 'score': 0.9845}]
+# Multiple examples
+texts = [
+    "Stock prices fell sharply due to disappointing earnings.",
+    "The company maintained steady performance this quarter.",
+    "Revenue exceeded expectations with record-breaking profits."
+]
+results = classifier(texts)
+for text, result in zip(texts, results):
+    print(f"Text: {text}")
+    print(f"Sentiment: {result['label']} (confidence: {result['score']:.3f})")
+    print("-" * 50)
+```
+### Advanced Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+# Load model and tokenizer
+tokenizer = AutoTokenizer.from_pretrained("AdityaAI9/distilbert_finance_sentiment_analysis")
+model = AutoModelForSequenceClassification.from_pretrained("AdityaAI9/distilbert_finance_sentiment_analysis")
+# Manual prediction
+text = "The merger is expected to create significant shareholder value."
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    predicted_class = torch.argmax(predictions, dim=-1)
+# Map to labels (assumed index order; confirm against model.config.id2label)
+label_mapping = {0: "negative", 1: "neutral", 2: "positive"}
+sentiment = label_mapping[predicted_class.item()]
+confidence = predictions.max().item()
+print(f"Sentiment: {sentiment} (confidence: {confidence:.3f})")
+```
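The softmax and argmax steps in the block above can be traced by hand on a single logit vector. The sketch below uses invented logits and the same index-to-label order that the snippet hard-codes; it needs neither the model nor `torch`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


label_mapping = {0: "negative", 1: "neutral", 2: "positive"}
logits = [-1.2, 0.3, 2.5]  # hypothetical raw scores for one sentence

probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # argmax
sentiment = label_mapping[predicted_class]
confidence = probs[predicted_class]
print(f"Sentiment: {sentiment} (confidence: {confidence:.3f})")
```

The reported confidence is simply the largest softmax probability, so it says how the three classes compare with each other, not how reliable the prediction is in absolute terms.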
+## Training Data
+The model was trained on a combination of:
+1. **Original Financial News Dataset**: ~5K labeled financial news sentences
+2. **Synthetic Financial Data**: ~31K synthetic financial statements generated using state-of-the-art language models
+The synthetic data helps address class imbalance and broadens coverage of financial vocabulary. The generation code is available [on GitHub](https://github.com/aditya699/Common-Challenges-in-LLMS-/blob/main/synthetic_data_generator/syndata.py).
+### Data Distribution
+- **Neutral**: 14,638 samples (40.2%)
+- **Negative**: 11,272 samples (30.9%)
+- **Positive**: 10,539 samples (28.9%)
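Class shares and inverse-frequency weights can be derived directly from the counts listed above. The card does not say that the loss was weighted during training, so the `weights` below are only a sketch of one common way to compensate for the remaining imbalance.

```python
# Sample counts as listed in the data distribution above.
counts = {"neutral": 14638, "negative": 11272, "positive": 10539}

total = sum(counts.values())
num_classes = len(counts)

# Share of the dataset held by each class.
shares = {label: n / total for label, n in counts.items()}

# "Balanced" inverse-frequency weights: total / (num_classes * class_count).
weights = {label: total / (num_classes * n) for label, n in counts.items()}

for label in counts:
    print(f"{label:<9} share={shares[label]:.1%}  weight={weights[label]:.3f}")
```

Rarer classes receive weights above 1 and the most frequent class a weight below 1.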
+## Training Details
+### Training Hyperparameters
+- **Epochs**: 5
+- **Batch Size**: 16 (training), 32 (validation)
+- **Learning Rate**: Default AdamW
+- **Max Sequence Length**: 256 tokens
+- **Optimizer**: AdamW
+- **Warmup**: Linear warmup
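The card names linear warmup but gives neither the number of warmup steps nor the peak learning rate. The sketch below shows a warmup-then-linear-decay multiplier; `warmup_steps`, `total_steps`, and `base_lr` are hypothetical values, and the decay phase is an assumption borrowed from the common linear schedule rather than something the card confirms.

```python
def lr_multiplier(step, warmup_steps, total_steps):
    """Scale factor in [0, 1]: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


base_lr = 5e-5        # hypothetical peak learning rate
warmup_steps = 500    # hypothetical
total_steps = 10_000  # hypothetical

for step in (0, 250, 500, 5_250, 10_000):
    lr = base_lr * lr_multiplier(step, warmup_steps, total_steps)
    print(f"step {step:>6}: lr = {lr:.2e}")
```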
+### Training Infrastructure
+- **GPU**: CUDA-enabled training
+- **Framework**: Hugging Face Transformers
+- **Evaluation Strategy**: Every 500 steps
+## Use Cases
+This model is particularly useful for:
+- **Financial News Analysis**: Classify sentiment of news articles affecting stock prices
+- **Earnings Report Processing**: Analyze quarterly and annual reports
+- **Market Research**: Sentiment analysis of financial commentary and analyst reports
+- **Trading Signals**: Generate sentiment-based trading indicators
+- **Risk Assessment**: Evaluate sentiment trends for investment decisions
+- **Social Media Monitoring**: Analyze financial discussions on social platforms
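For the trading-signal and trend-monitoring use cases, per-sentence predictions usually have to be collapsed into one number. A net sentiment score is one simple option; it is an illustrative aggregation, not something the card prescribes. The `predictions` list mimics the pipeline's output format with invented scores, and `min_score` is a hypothetical confidence cutoff.

```python
def net_sentiment(predictions, min_score=0.0):
    """Return (positives - negatives) / kept predictions, a value in [-1, 1].

    Neutral predictions count toward the denominator only.
    Predictions scoring below min_score are ignored.
    """
    kept = [p for p in predictions if p["score"] >= min_score]
    if not kept:
        return 0.0
    positives = sum(p["label"] == "positive" for p in kept)
    negatives = sum(p["label"] == "negative" for p in kept)
    return (positives - negatives) / len(kept)


# Invented predictions in the pipeline's output shape.
predictions = [
    {"label": "negative", "score": 0.97},
    {"label": "neutral", "score": 0.88},
    {"label": "positive", "score": 0.99},
    {"label": "positive", "score": 0.55},
]

print(net_sentiment(predictions))                 # all four predictions
print(net_sentiment(predictions, min_score=0.8))  # drops the 0.55 prediction
```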
+## Limitations and Considerations
+- **Domain Specificity**: Optimized for financial text; may not perform well on general sentiment tasks
+- **Language**: Currently supports English only
+- **Context Window**: Limited to 256 tokens; longer texts will be truncated
+- **Temporal Bias**: Trained on contemporary financial language; may need updates for evolving terminology
+- **Market Context**: Does not consider broader market conditions or temporal context
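Because inputs beyond 256 tokens are truncated, longer documents need to be split before classification. The sketch below builds overlapping windows over a token list; whitespace tokens stand in for the model's subword tokens, and the `stride` overlap is a hypothetical choice.

```python
def sliding_windows(tokens, max_length=256, stride=64):
    """Split tokens into windows of at most max_length that overlap by stride."""
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    step = max_length - stride
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break  # this window already reaches the end of the text
        start += step
    return windows


tokens = ("revenue grew " * 300).split()  # 600 stand-in tokens
windows = sliding_windows(tokens)
print(len(windows), [len(w) for w in windows])
```

Each window can then be classified separately and the per-window probabilities averaged. With the real tokenizer, two positions per window are taken by the `[CLS]` and `[SEP]` tokens, and the tokenizer's own `stride` and `return_overflowing_tokens=True` options can produce the windows directly.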
+## Ethical Considerations
+- This model should not be the sole basis for financial decisions
+- Always combine with fundamental analysis and professional financial advice
+- Be aware of potential biases in training data
+- Consider market volatility and external factors not captured in text
+## Citation
+If you use this model in your research or applications, please cite:
+```bibtex
+@misc{distilbert-finance-sentiment-analysis,
+  title={Financial Sentiment Classifier: A Fine-tuned DistilBERT Model},
+  author={AdityaAI9},
+  year={2024},
+  howpublished={\url{https://huggingface.co/AdityaAI9/distilbert_finance_sentiment_analysis}},
+}
+```
+## License
+This model is released under the MIT License. See LICENSE for more details.
+## Acknowledgments
+- Built using [Hugging Face Transformers](https://huggingface.co/transformers/)
+- Base model: [DistilBERT](https://huggingface.co/distilbert-base-uncased)
+- Synthetic data generation techniques for improved model performance
+## Contact
+For questions, suggestions, or collaboration opportunities, please open an issue in the [GitHub repository](https://github.com/aditya699/Common-Challenges-in-LLMS-) or reach out through Hugging Face.
+---
+**Note**: This model is for research and educational purposes. Always consult with financial professionals before making investment decisions.