FinBERT Sentiment Regression
A fine-tuned ProsusAI/finbert (109M params) for continuous sentiment scoring — outputs a single float in [-1, +1] instead of a discrete class label. Trained on the FiQA 2018 financial opinion mining dataset with expert-annotated continuous sentiment scores.
Part of the macro-sentiment-finbert ecosystem. While the sibling 3-class models classify text into positive/negative/neutral buckets, this model captures the intensity and gradation of sentiment — distinguishing between mildly positive (+0.2) and strongly positive (+0.8), or between mild concern (-0.3) and acute crisis (-0.9).
Why Continuous Sentiment?
Discrete 3-class sentiment (positive/neutral/negative) loses information. Consider these headlines:
| Text | 3-class label | Continuous score |
|---|---|---|
| "Revenue slightly exceeded expectations" | positive | +0.21 |
| "Revenue smashed all records by 40%" | positive | +0.82 |
| "Minor supply chain delays reported" | negative | -0.18 |
| "Company declares bankruptcy amid fraud" | negative | -0.95 |
A 3-class model assigns the same label to both rows in each pair. A regression model captures the magnitude — enabling time-series tracking, ranking, index construction, and threshold-based filtering at any arbitrary cutoff.
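The filtering point is easy to see with the scores from the table above. This sketch uses those illustrative values (not live model output) to flag only severe negatives — a distinction a 3-class model cannot make:

```python
# Headlines paired with continuous scores (values from the table above)
scored = [
    ("Revenue slightly exceeded expectations", +0.21),
    ("Revenue smashed all records by 40%", +0.82),
    ("Minor supply chain delays reported", -0.18),
    ("Company declares bankruptcy amid fraud", -0.95),
]

# Flag only severe negatives at an arbitrary cutoff of -0.5
crisis = [text for text, s in scored if s < -0.5]
print(crisis)
# → ['Company declares bankruptcy amid fraud']
```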
Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("peyterho/finbert-sentiment-regression")
model = AutoModelForSequenceClassification.from_pretrained("peyterho/finbert-sentiment-regression")
model.eval()

def score(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        return model(**inputs).logits.item()

# Strong positive
score("Tesla shares surged 15% after crushing earnings expectations")
# → +0.65

# Mild positive
score("FDA Approves generic version of AstraZeneca heartburn drug")
# → +0.22

# Near-neutral
score("The committee will meet next Tuesday to discuss the quarterly report")
# → +0.03

# Mild negative
score("Tesla recalls 2,700 Model X SUVs")
# → -0.31

# Strong negative
score("Markets crashed amid recession fears and massive layoffs")
# → -0.78
```
Batch Scoring
```python
import torch

texts = [
    "Q3 revenue surged 24% year-over-year",
    "The board appointed a new interim CEO",
    "Credit markets froze as contagion fears spread",
]

inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=128)
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1).tolist()

for text, s in zip(texts, scores):
    print(f"{s:+.3f} {text}")
# +0.58 Q3 revenue surged 24% year-over-year
# +0.05 The board appointed a new interim CEO
# -0.72 Credit markets froze as contagion fears spread
```
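For large corpora, tokenizing everything into a single tensor can exhaust memory. A minimal mini-batching sketch, assuming `tokenizer`, `model`, and `torch` are already in scope from the Quick Start (the batch size of 32 and the helper names are arbitrary choices):

```python
def batched(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def score_many(texts, batch_size=32):
    """Score a large list of texts in mini-batches to bound memory use."""
    results = []
    for batch in batched(texts, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           padding=True, max_length=128)
        with torch.no_grad():
            results.extend(model(**inputs).logits.squeeze(-1).tolist())
    return results
```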
Building a Sentiment Index
```python
import pandas as pd

# Score a time-series of headlines
headlines = pd.DataFrame({
    "date": ["2024-01-15", "2024-01-16", "2024-01-17", "2024-01-18"],
    "text": [
        "Strong jobs report pushes markets to record highs",
        "Tech earnings mixed as AI spending soars",
        "Fed signals patience on rate cuts, markets dip",
        "Retail sales disappoint, recession fears resurface",
    ],
})

headlines["sentiment"] = headlines["text"].apply(score)
headlines["rolling_avg"] = headlines["sentiment"].rolling(3, min_periods=1).mean()
print(headlines[["date", "sentiment", "rolling_avg"]])
```
Evaluation Results
Evaluated on the FiQA 2018 test split (150 samples):
| Metric | Value |
|---|---|
| MSE | 0.069 |
| RMSE | 0.263 |
| Pearson r | 0.70 |
A Pearson correlation of 0.70 indicates strong linear agreement with expert annotations. The RMSE of 0.263 on a [-1, +1] scale means predictions are typically within ~0.26 of the ground-truth score.
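As a sanity check, the RMSE in the table is simply the square root of the MSE:

```python
import math

mse = 0.069
rmse = math.sqrt(mse)
print(f"{rmse:.3f}")
# → 0.263
```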
Interpreting the Output Scale
| Score Range | Interpretation | Example |
|---|---|---|
| +0.6 to +1.0 | Strongly positive | Earnings beat, major contract win, strong growth |
| +0.2 to +0.6 | Moderately positive | Modest revenue growth, favorable regulatory decision |
| -0.2 to +0.2 | Neutral / mixed | Factual reporting, mixed signals, routine announcements |
| -0.6 to -0.2 | Moderately negative | Missed estimates, minor product recall, cautious guidance |
| -1.0 to -0.6 | Strongly negative | Market crash, bankruptcy, fraud, major crisis |
Note: These thresholds are approximate. The model's output is unbounded (it can theoretically exceed ±1.0 on extreme inputs). Clip to [-1, +1] if strict bounds are required.
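If downstream code assumes strict bounds, a plain-Python clip on the returned float does the job (a sketch; `clip` is a helper name introduced here, equivalent to `torch.clamp` on a tensor):

```python
def clip(s, lo=-1.0, hi=1.0):
    """Clamp a raw model score to the documented [-1, +1] range."""
    return max(lo, min(hi, s))

print(clip(1.37))   # out-of-range score clamped to 1.0
print(clip(-0.42))  # in-range score passes through unchanged
```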
Training Data
Trained on the FiQA 2018 financial opinion mining dataset — expert-annotated continuous sentiment scores for financial microblogs and news headlines.
| Split | Samples | Source |
|---|---|---|
| Train | 938 (train + validation merged) | Financial microblogs + news headlines |
| Test | 150 | Held-out evaluation |
Dataset Characteristics
The FiQA dataset contains two text formats:
- Microblogs/posts — short, informal financial social media posts with $cashtags, slang, and abbreviations (e.g., "Still short $LNG from $11.70 area...next stop could be down through $9.00")
- Headlines — concise news headlines in formal register (e.g., "How Kraft-Heinz Merger Came Together in Speedy 10 Weeks")
Each sample includes:
- `sentence` — the full text
- `sentiment_score` — expert-annotated continuous score (approximately [-1, +1])
- `target` — the entity being discussed (stock ticker or company name)
- `aspects` — fine-grained aspect categories (e.g., Stock/Price Action/Bearish, Corporate/M&A)
The sentiment scores are entity-relative — the same event can have different sentiment for different companies (e.g., an acquisition might be positive for the acquirer but negative for a competitor).
Score Distribution
The training data spans the full [-1, +1] range with moderate concentration near zero:
- ~35% of samples in [-0.15, +0.15] (near-neutral)
- ~40% of samples in [+0.15, +1.0] (positive)
- ~25% of samples in [-1.0, -0.15] (negative)
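The same three bands can be reproduced on any list of scores. A sketch using the ±0.15 boundaries above (`bucket` is a hypothetical helper; the example scores are illustrative):

```python
from collections import Counter

def bucket(s):
    """Assign a score to one of the distribution bands used above."""
    if s < -0.15:
        return "negative"
    if s > 0.15:
        return "positive"
    return "near-neutral"

scores = [-0.95, -0.18, 0.03, 0.21, 0.82]
counts = Counter(bucket(s) for s in scores)
print(counts)
```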
Training Details
| Hyperparameter | Value |
|---|---|
| Base model | ProsusAI/finbert (109M, BERT-base, 12 layers, 768 hidden, 12 heads) |
| Task | Regression (single output neuron, problem_type="regression") |
| Loss function | MSE |
| Learning rate | 2e-5 |
| Batch size | 64 effective |
| Epochs | 6 (best checkpoint at epoch 3) |
| Scheduler | Linear decay with warmup |
| Max length | 128 tokens |
| Precision | FP16 |
| Seed | 42 |
| Best model selection | Lowest validation MSE |
Architecture
The model uses BertForSequenceClassification with num_labels=1 and problem_type="regression". The classification head is a single linear layer mapping the 768-dim [CLS] representation to a scalar output. No activation function is applied — the output is unbounded, but in practice stays near [-1, +1] for financial text.
When to Use This vs. the 3-Class Models
| Use Case | Recommended Model |
|---|---|
| Categorical labelling — "is this positive, negative, or neutral?" | finbert-macro-sentiment (3-class) |
| Sentiment intensity — "how positive/negative is this?" | This model (regression) |
| Time-series / sentiment indices — track sentiment over time | This model (regression) |
| Ranking — sort texts by sentiment strength | This model (regression) |
| Threshold-based filtering — custom cutoffs (e.g., flag text with score < -0.5) | This model (regression) |
| Multi-signal analysis — sentiment + policy stance + crisis | Full pipeline |
| Climate/ESG text | climatebert-macro-sentiment (3-class) |
| Highest classification accuracy | financial-roberta-large-macro-sentiment (355M, 3-class) |
Deriving Classes from Continuous Scores
You can always convert this model's continuous output to discrete labels at any threshold:
```python
def to_label(score, threshold=0.15):
    if score > threshold:
        return "positive"
    elif score < -threshold:
        return "negative"
    else:
        return "neutral"

sentiment = score("Tesla shares surged after earnings beat")
label = to_label(sentiment)  # "positive"
```
This gives you tunable sensitivity — a lower threshold catches weaker signals, a higher threshold only flags strong sentiment.
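The effect of the threshold is easy to see on a fixed set of scores (`to_label` is redefined here so the example is self-contained; the scores are illustrative):

```python
def to_label(score, threshold=0.15):
    if score > threshold:
        return "positive"
    elif score < -threshold:
        return "negative"
    return "neutral"

scores = [0.10, 0.25, -0.40, 0.02]
print([to_label(s, threshold=0.05) for s in scores])  # sensitive
# → ['positive', 'positive', 'negative', 'neutral']
print([to_label(s, threshold=0.30) for s in scores])  # conservative
# → ['neutral', 'neutral', 'negative', 'neutral']
```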
Related Models
This model is part of the macro-sentiment-finbert ecosystem:
| Model | Type | Params | Task |
|---|---|---|---|
| peyterho/macro-sentiment-finbert | Ensemble pipeline | — | Multi-signal macro sentiment (full system) |
| peyterho/finbert-macro-sentiment | 3-class | 109M | Financial news sentiment (default head) |
| peyterho/financial-roberta-large-macro-sentiment | 3-class | 355M | Policy/macro sentiment (best accuracy) |
| peyterho/climatebert-macro-sentiment | 3-class | 82M | Climate/ESG sentiment |
| peyterho/finbert-sentiment-regression ★ | Regression | 109M | Continuous sentiment [-1, +1] |
Limitations
- Small training set — only 938 training samples from FiQA 2018. The model benefits from FinBERT's financial pre-training, but may not generalize well to domains far from financial microblogs and news headlines (e.g., lengthy 10-K filings, academic papers).
- Entity-relative annotations — FiQA scores are annotated relative to a target entity. The model doesn't receive explicit entity information at inference time, so it infers the "main subject" from context. For texts discussing multiple entities with opposing sentiment, the output is an aggregate.
- 128-token max length — truncates longer inputs. Chunk longer documents at sentence or paragraph level.
- Unbounded output — the model can theoretically output scores outside [-1, +1] on extreme or out-of-distribution inputs. Apply `torch.clamp(score, -1.0, 1.0)` if strict bounds are needed.
- No aspect-level scoring — FiQA includes aspect annotations (e.g., Corporate/Dividend Policy, Stock/Technical Analysis), but this model only predicts overall sentiment, not per-aspect scores.
- English only — pre-trained and fine-tuned exclusively on English text.
- Pearson r = 0.70 — while reasonable for a small dataset, ~30% of variance in human annotations is unexplained. Use averaged scores over multiple texts for more reliable aggregate signals.
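For the 128-token limit and the averaging suggestion above, one workable pattern is to chunk a long document and average the chunk scores. A rough sketch — the naive sentence split and the helper names are simplifications introduced here; a real splitter (e.g., nltk's) would be more robust:

```python
def split_sentences(text):
    """Very naive sentence splitter -- replace with a real one in production."""
    return [s.strip() for s in text.split(". ") if s.strip()]

def document_score(text, score_fn):
    """Average per-chunk scores into one aggregate document score."""
    parts = split_sentences(text)
    if not parts:
        return 0.0
    return sum(score_fn(p) for p in parts) / len(parts)

# Usage with the `score` function from the Quick Start:
# document_score(long_report, score)
```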
Citation
```bibtex
@article{araci2019finbert,
  title={FinBERT: Financial Sentiment Analysis with Pre-Trained Language Models},
  author={Araci, Dogu},
  journal={arXiv preprint arXiv:1908.10063},
  year={2019}
}

@inproceedings{maia2018fiqa,
  title={WWW'18 Open Challenge: Financial Opinion Mining and Question Answering},
  author={Maia, Macedo and Handschuh, Siegfried and Freitas, André and Davis, Brian and McDermott, Ross and Zarrouk, Manel and Balahur, Alexandra},
  booktitle={Companion Proceedings of the The Web Conference 2018},
  pages={1941--1942},
  year={2018}
}
```
Framework Versions
- Transformers 5.6.2
- PyTorch 2.11.0+cu130
- Datasets 4.8.4
- Tokenizers 0.22.2
License
Apache 2.0