File size: 13,219 Bytes
f705b02
ee6fd18
f705b02
 
 
 
 
 
 
 
 
 
 
 
 
 
ee6fd18
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9dfeb0d
ee6fd18
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9dfeb0d
ee6fd18
 
 
 
 
 
9dfeb0d
 
 
ee6fd18
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
110be9f
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ee6fd18
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
c163df6
110be9f
c163df6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
---
license: apache-2.0
datasets:
- FinGPT/fingpt-sentiment-train
language:
- en
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- ProsusAI/finbert
pipeline_tag: text-classification
tags:
- finance
- financial
- news
- sentiment-analysis
- finbert
- transfomer
- text-classification
- financial-news
- financial-news-sentiment
library_name: transformers
---


# 📊 FinBERT Fine-Tuned on Financial News/Texts

A fine-tuned version of [`ProsusAI/finbert`](https://huggingface.co/ProsusAI/finbert) trained for **financial sentiment analysis**  on financial news texts and headlines. 
This fine-tuned model achieves a significant improvement over the original finbert, **outperforming it by over 38% in accuracy** on financial sentiment classification tasks.

---

## 🔧 Model Objective

The goal of this model is to detect **positive**, **neutral**, or **negative sentiment** on financial texts and headlines.

---

## 🗂️ Training Dataset

**Primary Dataset**: [`fingpt-sentiment-train`](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train) (~60,000 examples)

- Labeled financial text samples (positive / neutral / negative)
- Includes earnings statements, market commentary, and financial news headlines
- Only included **neutral**, **positive** and **negative** texts.

---

## 🧪 Benchmark Evaluation

The model was evaluated against **three benchmark datasets**:
- **[Financial PhraseBank (All Agree and All Combined)](https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10)**
- **[FiQA + PhraseBank Kaggle Merge](https://www.kaggle.com/datasets/sbhatti/financial-sentiment-analysis/data)**
- **[fingpt-sentiment-train (test split)](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train)**

Metrics used:
- **Accuracy**
- **F1 Score**
- **Precision**
- **Recall**


We benchmarked this model against the original [`ProsusAI/finbert`](https://huggingface.co/ProsusAI/finbert) on multiple financial datasets:

| Dataset                            | Samples | Model                   | Accuracy      | F1 (Macro)    | F1 (Weighted) | Precision (Macro) | Precision (Weighted) | Recall (Macro) | Recall (Weighted) |
|------------------------------------|---------|--------------------------|---------------|---------------|----------------|--------------------|------------------------|----------------|--------------------|
| **fingpt-sentiment-train Eval**    | 12511   | FinBERT                  | 0.7131        | 0.70          | 0.71           | 0.71               | 0.72                   | 0.70           | 0.71               |
|                                    |         | **FinBERT-Finetuned (Ours)** | **0.9894 (+38.8%)** | **0.99 (+41.4%)** | **0.99 (+39.4%)** | **0.99 (+39.4%)**      | **0.99 (+37.5%)**          | **0.99 (+41.4%)**  | **0.99 (+39.4%)**      |
| **Financial Phrasebank (Agree)**   | 2264    | FinBERT                  | 0.9717        | 0.96          | 0.97           | 0.95               | 0.97                   | 0.98           | 0.97               |
|                                    |         | **FinBERT-Finetuned (Ours)** | **0.9912 (+2.0%)**  | **0.99 (+3.1%)**  | **0.99 (+2.1%)**  | **0.99 (+4.2%)**       | **0.99 (+2.1%)**           | **0.99 (+1.0%)**   | **0.99 (+2.1%)**       |
| **Financial Phrasebank (Combined)**| 14780   | FinBERT                  | 0.9238        | 0.91          | 0.92           | 0.89               | 0.93                   | 0.94           | 0.92               |
|                                    |         | **FinBERT-Finetuned (Ours)** | **0.9792 (+6.0%)**  | **0.98 (+7.7%)**  | **0.98 (+6.5%)**  | **0.98 (+10.1%)**      | **0.98 (+5.4%)**           | **0.98 (+4.3%)**   | **0.98 (+6.5%)**       |
| **FiQA + PhraseBank (Kaggle)**     | 5842    | FinBERT                  | 0.7581        | 0.74          | 0.77           | 0.73               | 0.79                   | 0.77           | 0.76               |
|                                    |         | **FinBERT-Finetuned (Ours)** | **0.8879 (+17.1%)** | **0.87 (+17.6%)** | **0.89 (+15.6%)** | **0.85 (+16.4%)**      | **0.92 (+16.5%)**          | **0.92 (+19.5%)**  | **0.89 (+17.1%)**      |


> **Note:** All metrics represent classification performance improvements after fine-tuning FinBERT on respective financial sentiment datasets. Metrics in parentheses represent relative improvement over base FinBERT performance.

---
## 🧠 Text-Level Comparison: FinBERT vs FinBERT-Finetuned (Ours)

### 🔴 FinBERT Failed Texts (as per discussed in its [`Paper`](https://arxiv.org/abs/1908.10063)) (Correctly Predicted by Ours)
| Text                                                                                                                        | Expected  | FinBERT                     | Ours                          |
|-----------------------------------------------------------------------------------------------------------------------------|-----------|------------------------------|-------------------------------|
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in the first quarter of 2005.                | Positive  | ❌ Negative (0.7223)         | ✅ Positive (0.9997)           |
| This implementation is very important to the operator, since it is about to launch its Fixed to Mobile convergence service | Neutral   | ❌ Positive (0.7204)         | ✅ Neutral (0.9998)            |
| The situation of coated magazine printing paper will continue to be weak.                                                  | Negative  | ✅ Negative (0.8811)         | ✅ Negative (0.9996)           |

### 🟡 FinBERT Incorrect, Ours Corrected It
| Text                                                                                                           | Expected  | FinBERT                     | Ours                          |
|----------------------------------------------------------------------------------------------------------------|-----------|------------------------------|-------------------------------|
| The debt-to-equity ratio was 1.15, flat quarter-over-quarter.                                                 | Neutral   | ❌ Negative (0.6239)         | ✅ Neutral (0.9998)            |
| Earnings smashed expectations $AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming!                  | Positive  | ❌ Neutral (0.4237)          | ✅ Positive (0.9998)           |
| $TSLA growth is slowing — but hey, at least Elon tweeted something funny today. #Tesla #markets              | Negative  | ❌ Neutral (0.5884)          | ✅ Negative (0.7084)           |

### ⚪ Out-of-Context Texts (FinBERT Misclassified, Ours Handled Properly)
| Text                                                                                       | Expected  | FinBERT                     | Ours                          |
|--------------------------------------------------------------------------------------------|-----------|------------------------------|-------------------------------|
| Unexpected Snowstorm Hits Sahara Desert, Blanketing Sand Dunes                            | Neutral   | ❌ Negative (0.8675)         | ✅ Neutral (0.9993)            |
| Virtual Reality Therapy Shows Promise for Treating PTSD                                    | Neutral   | ❌ Positive (0.8522)         | ✅ Neutral (0.9997)            |

> **Note**: These examples demonstrate improvements in real-world understanding, context handling, and sentiment differentiation with our FinBERT-finetuned model.  Values in parentheses (e.g., `0.9485`) indicate the model’s confidence score for its predicted sentiment.

---

## ⚠️ Limitations & Failure Cases

While the model outperformed the base FinBERT across benchmarks, **some failure cases** were observed in statements involving **fine-grained numerical reasoning**, particularly when numerical comparison semantics are complex or subtle.

| Text                                                                                                    | Expected  | FinBERT                     | Ours                          |
|---------------------------------------------------------------------------------------------------------|-----------|------------------------------|-------------------------------|
| Net profit to euro 203 million from euro 172 million in the previous year.                              | Positive  | ✅ Positive (0.9485)         | ✅ Positive (0.9995)           |
| Net profit to euro 103 million from euro 172 million in the previous year.                              | Negative  | ❌ Positive (0.9486)         | ❌ Positive (0.9994)           |
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in Q1 2005.               | Positive  | ❌ Negative (0.7223)         | ❌ Positive (0.9997)           |
| Pre-tax loss totaled euro 5.3 million, compared to a loss of euro 2.2 million in Q1 2005.               | Negative  | ✅ Negative (0.7205)         | ❌ Positive (0.9997)           |
| Net profit totaled euro 5.3 million, compared to euro 2.2 million in the previous quarter of 2005.      | Positive  | ❌ Negative (0.6347)         | ✅ Negative (0.9996)           |
| Net profit totaled euro 0.3 million, compared to euro 2.2 million in the previous quarter of 2005.      | Negative  | ✅ Negative (0.6320)         | ✅ Negative (0.9996)           |

> **Note**: Values in parentheses (e.g., `0.9485`) indicate the model’s confidence score for its predicted sentiment.

This suggests that **explicit numerical comparison reasoning** still remains challenging without targeted pretraining or numerical reasoning augmentation.

---

## Hyperparameters

During fine-tuning, the following hyperparameters were used to optimize model performance:

- **Learning Rate:** 2e-5  
- **Batch Size:** 32  
- **Number of Epochs:** 3  
- **Max Sequence Length:** 128 tokens  
- **Optimizer:** AdamW  
- **Weight Decay:** 0.01  
- **Evaluation Strategy:** Evaluation performed after each epoch

> **Note**: These settings were chosen to balance training efficiency and accuracy for financial news sentiment classification.

---

## 💡 Summary**Better generalization** than FinBERT on both benchmark and noisy real-world samples  
✅ **Strong accuracy and F1 scores**  
⚠️ Room to improve on **numerical reasoning comparisons** — potential for integration with numerical-aware transformers or contrastive fine-tuning

---
## Usage

### Pipeline Approach
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Override the config's id2label and label2id
label_map = {0: "neutral", 1: "negative", 2: "positive"}
model.config.id2label = label_map
model.config.label2id = {v: k for k, v in label_map.items()}

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
print(pipe(text)) #Output: [{'label': 'positive', 'score': 0.9997484087944031}]

```

### Simple Approach
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()

label_map = {0: "neutral", 1: "negative", 2: "positive"}
print(f"Text : {text}")
print(f"Sentiment: {label_map[predicted_class]}")

```

---
## Acknowledgements

We gratefully acknowledge the creators and maintainers of the resources used in this project:

- **[ProsusAI/FinBERT](https://huggingface.co/ProsusAI/finbert)** – A pre-trained BERT model specifically designed for financial sentiment analysis, which served as the foundation for our fine-tuning efforts.
  
- **[FinGPT Sentiment Train Dataset](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train)** – The dataset used for fine-tuning, containing a large collection of finance-related news headlines and sentiment annotations.

- **[Financial PhraseBank Dataset](https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10)** – A widely used benchmark dataset for financial sentiment classification, including the *All Agree* and *All Combined* subsets.
  
- **[FiQA + PhraseBank Kaggle Merged Dataset](https://www.kaggle.com/datasets/sbhatti/financial-sentiment-analysis/data)** – A merged dataset combining FiQA and Financial PhraseBank entries, used for broader benchmarking of sentiment performance.


We thank these contributors for making their models and datasets publicly available, enabling high-quality research and development in financial NLP.


---