---
license: apache-2.0
datasets:
- FinGPT/fingpt-sentiment-train
language:
- en
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- ProsusAI/finbert
pipeline_tag: text-classification
tags:
- finance
- financial
- news
- sentiment-analysis
- finbert
- transformer
- text-classification
- financial-news
- financial-news-sentiment
library_name: transformers
---

# 📊 FinBERT Fine-Tuned on Financial News/Texts

A fine-tuned version of [`ProsusAI/finbert`](https://huggingface.co/ProsusAI/finbert) trained for **financial sentiment analysis** on financial news texts and headlines.

On the fingpt-sentiment-train evaluation split, this model **outperforms the original FinBERT by over 38% in accuracy** (relative improvement). Relative accuracy gains on the other benchmarks range from about 2% to 17%.

---

## 🔧 Model Objective

The goal of this model is to classify financial texts and headlines as **positive**, **neutral**, or **negative** in sentiment.

---

## 🗂️ Training Dataset

**Primary Dataset**: [`fingpt-sentiment-train`](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train) (~60,000 examples)

- Labeled financial text samples (positive / neutral / negative)
- Includes earnings statements, market commentary, and financial news headlines
- Only texts labeled **neutral**, **positive**, or **negative** were included
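
The three-class restriction described in the last bullet can be sketched as a simple label filter. This is only an illustration, not the preprocessing script actually used: it assumes each record stores its text under `input` and its label under `output`, and the function name `keep_three_class` and the sample records are hypothetical.

```python
# Assumption: each record keeps its text under "input" and its label under "output".
ALLOWED_LABELS = {"positive", "neutral", "negative"}


def keep_three_class(records):
    """Return only the records labeled exactly positive, neutral, or negative."""
    kept = []
    for record in records:
        label = record["output"].strip().lower()
        if label in ALLOWED_LABELS:
            kept.append({"text": record["input"], "label": label})
    return kept


# Hypothetical records for illustration only; they are not taken from the dataset.
sample = [
    {"input": "Operating profit rose compared with last year.", "output": "positive"},
    {"input": "The company will publish its report on Friday.", "output": "neutral"},
    {"input": "Shares edged slightly lower in early trading.", "output": "mildly negative"},
]

filtered = keep_three_class(sample)
print([row["label"] for row in filtered])
```

Any record whose label falls outside the three allowed values (the hypothetical `mildly negative` above) is dropped rather than remapped, which matches the "only included" wording of the bullet.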
---

## 🧪 Benchmark Evaluation

The model was evaluated against **three benchmark datasets**:

- **[Financial PhraseBank (All Agree and All Combined)](https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10)**
- **[FiQA + PhraseBank Kaggle Merge](https://www.kaggle.com/datasets/sbhatti/financial-sentiment-analysis/data)**
- **[fingpt-sentiment-train (test split)](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train)**

Metrics used:

- **Accuracy**
- **F1 Score**
- **Precision**
- **Recall**

We benchmarked this model against the original [`ProsusAI/finbert`](https://huggingface.co/ProsusAI/finbert) on each of these datasets:

| Dataset | Samples | Model | Accuracy | F1 (Macro) | F1 (Weighted) | Precision (Macro) | Precision (Weighted) | Recall (Macro) | Recall (Weighted) |
|---|---|---|---|---|---|---|---|---|---|
| **fingpt-sentiment-train Eval** | 12511 | FinBERT | 0.7131 | 0.70 | 0.71 | 0.71 | 0.72 | 0.70 | 0.71 |
| | | **FinBERT-Finetuned (Ours)** | **0.9894 (+38.8%)** | **0.99 (+41.4%)** | **0.99 (+39.4%)** | **0.99 (+39.4%)** | **0.99 (+37.5%)** | **0.99 (+41.4%)** | **0.99 (+39.4%)** |
| **Financial PhraseBank (Agree)** | 2264 | FinBERT | 0.9717 | 0.96 | 0.97 | 0.95 | 0.97 | 0.98 | 0.97 |
| | | **FinBERT-Finetuned (Ours)** | **0.9912 (+2.0%)** | **0.99 (+3.1%)** | **0.99 (+2.1%)** | **0.99 (+4.2%)** | **0.99 (+2.1%)** | **0.99 (+1.0%)** | **0.99 (+2.1%)** |
| **Financial PhraseBank (Combined)** | 14780 | FinBERT | 0.9238 | 0.91 | 0.92 | 0.89 | 0.93 | 0.94 | 0.92 |
| | | **FinBERT-Finetuned (Ours)** | **0.9792 (+6.0%)** | **0.98 (+7.7%)** | **0.98 (+6.5%)** | **0.98 (+10.1%)** | **0.98 (+5.4%)** | **0.98 (+4.3%)** | **0.98 (+6.5%)** |
| **FiQA + PhraseBank (Kaggle)** | 5842 | FinBERT | 0.7581 | 0.74 | 0.77 | 0.73 | 0.79 | 0.77 | 0.76 |
| | | **FinBERT-Finetuned (Ours)** | **0.8879 (+17.1%)** | **0.87 (+17.6%)** | **0.89 (+15.6%)** | **0.85 (+16.4%)** | **0.92 (+16.5%)** | **0.92 (+19.5%)** | **0.89 (+17.1%)** |

> **Note:** The table reports classification metrics for base FinBERT and for the fine-tuned model on each dataset. Values in parentheses are the relative improvement over base FinBERT.

---

## 🧠 Text-Level Comparison: FinBERT vs FinBERT-Finetuned (Ours)

### 🔴 FinBERT Failure Cases Discussed in Its [Paper](https://arxiv.org/abs/1908.10063) (Correctly Predicted by Ours)

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in the first quarter of 2005. | Positive | ❌ Negative (0.7223) | ✅ Positive (0.9997) |
| This implementation is very important to the operator, since it is about to launch its Fixed to Mobile convergence service | Neutral | ❌ Positive (0.7204) | ✅ Neutral (0.9998) |
| The situation of coated magazine printing paper will continue to be weak. | Negative | ✅ Negative (0.8811) | ✅ Negative (0.9996) |

### 🟡 FinBERT Incorrect, Ours Corrected It

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| The debt-to-equity ratio was 1.15, flat quarter-over-quarter. | Neutral | ❌ Negative (0.6239) | ✅ Neutral (0.9998) |
| Earnings smashed expectations $AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! | Positive | ❌ Neutral (0.4237) | ✅ Positive (0.9998) |
| $TSLA growth is slowing — but hey, at least Elon tweeted something funny today. #Tesla #markets | Negative | ❌ Neutral (0.5884) | ✅ Negative (0.7084) |

### ⚪ Out-of-Context Texts (FinBERT Misclassified, Ours Handled Properly)

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Unexpected Snowstorm Hits Sahara Desert, Blanketing Sand Dunes | Neutral | ❌ Negative (0.8675) | ✅ Neutral (0.9993) |
| Virtual Reality Therapy Shows Promise for Treating PTSD | Neutral | ❌ Positive (0.8522) | ✅ Neutral (0.9997) |

> **Note**: These examples illustrate improvements in real-world understanding, context handling, and sentiment differentiation with our fine-tuned model. Values in parentheses (e.g., `0.9485`) indicate the model's confidence score for its predicted sentiment.

---

## ⚠️ Limitations & Failure Cases

While the model outperformed the base FinBERT across benchmarks, **some failure cases** were observed in statements involving **fine-grained numerical reasoning**, particularly when the semantics of the numerical comparison are complex or subtle.

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Net profit to euro 203 million from euro 172 million in the previous year. | Positive | ✅ Positive (0.9485) | ✅ Positive (0.9995) |
| Net profit to euro 103 million from euro 172 million in the previous year. | Negative | ❌ Positive (0.9486) | ❌ Positive (0.9994) |
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in Q1 2005. | Positive | ❌ Negative (0.7223) | ✅ Positive (0.9997) |
| Pre-tax loss totaled euro 5.3 million, compared to a loss of euro 2.2 million in Q1 2005. | Negative | ✅ Negative (0.7205) | ❌ Positive (0.9997) |
| Net profit totaled euro 5.3 million, compared to euro 2.2 million in the previous quarter of 2005. | Positive | ❌ Negative (0.6347) | ❌ Negative (0.9996) |
| Net profit totaled euro 0.3 million, compared to euro 2.2 million in the previous quarter of 2005. | Negative | ✅ Negative (0.6320) | ✅ Negative (0.9996) |

> **Note**: Values in parentheses (e.g., `0.9485`) indicate the model's confidence score for its predicted sentiment.

In each pair, the fine-tuned model returns the same label with nearly identical confidence regardless of which figure is larger. This suggests that **explicit numerical comparison reasoning** remains challenging without targeted pretraining or numerical reasoning augmentation.

---

## Hyperparameters

The following hyperparameters were used during fine-tuning:

- **Learning Rate:** 2e-5
- **Batch Size:** 32
- **Number of Epochs:** 3
- **Max Sequence Length:** 128 tokens
- **Optimizer:** AdamW
- **Weight Decay:** 0.01
- **Evaluation Strategy:** Evaluation performed after each epoch

> **Note**: These settings were chosen to balance training efficiency and accuracy for financial news sentiment classification.
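
As a rough sketch, the settings listed above could be collected into a plain configuration dictionary. The keys below mirror the corresponding `transformers.TrainingArguments` parameter names, but this is an assumption about how the values were applied, not the training script itself. In particular, treating the batch size as a per-device value is an assumption, and the helper `optimizer_steps` and the example count of 1,000 are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
# Assumption: the batch size of 32 is interpreted as a per-device batch size.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 32,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

# The 128-token limit is applied when tokenizing, not as a training argument.
MAX_SEQ_LENGTH = 128


def optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer updates on one device, without gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


# Hypothetical example count, used only to show the arithmetic.
steps = optimizer_steps(
    1000,
    HYPERPARAMS["per_device_train_batch_size"],
    HYPERPARAMS["num_train_epochs"],
)
print(steps)
```

AdamW is the default optimizer of the `transformers` `Trainer`, so it needs no explicit entry here. The per-epoch evaluation setting is left out of the dictionary because the name of that argument differs between `transformers` versions.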
---

## 💡 Summary

✅ **Better generalization** than FinBERT on both benchmark and noisy real-world samples

✅ **Strong accuracy and F1 scores**

⚠️ Room to improve on **numerical comparison reasoning**; possible directions include integration with numerical-aware transformers or contrastive fine-tuning

---

## Usage

### Pipeline Approach

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Override the config's id2label and label2id so the pipeline returns readable labels
label_map = {0: "neutral", 1: "negative", 2: "positive"}
model.config.id2label = label_map
model.config.label2id = {v: k for k, v in label_map.items()}

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
print(pipe(text))
# Output: [{'label': 'positive', 'score': 0.9997484087944031}]
```

### Simple Approach

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)

# Pick the class with the highest logit and map it to a readable label
predicted_class = torch.argmax(outputs.logits, dim=1).item()
label_map = {0: "neutral", 1: "negative", 2: "positive"}

print(f"Text : {text}")
print(f"Sentiment: {label_map[predicted_class]}")
```

---

## Acknowledgements

We gratefully acknowledge the creators and maintainers of the resources used in this project:

- **[ProsusAI/FinBERT](https://huggingface.co/ProsusAI/finbert)** – A pre-trained BERT model specifically designed for financial sentiment analysis, which served as the foundation for our fine-tuning efforts.
- **[FinGPT Sentiment Train Dataset](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train)** – The dataset used for fine-tuning, containing a large collection of finance-related news headlines and sentiment annotations.
- **[Financial PhraseBank Dataset](https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10)** – A widely used benchmark dataset for financial sentiment classification, including the *All Agree* and *All Combined* subsets.
- **[FiQA + PhraseBank Kaggle Merged Dataset](https://www.kaggle.com/datasets/sbhatti/financial-sentiment-analysis/data)** – A merged dataset combining FiQA and Financial PhraseBank entries, used for broader benchmarking of sentiment performance.

We thank these contributors for making their models and datasets publicly available, enabling high-quality research and development in financial NLP.

---