Financial Sentiment Analysis RoBERTa

A fine-tuned RoBERTa model for financial sentiment analysis that classifies text into negative, neutral, or positive sentiment.

Model Description

This model is a fine-tuned version of roberta-base specifically trained for financial text sentiment classification. It achieves 98.72% accuracy and 98.44% F1 (macro) on the Financial PhraseBank (100% agreement) benchmark.

Labels

| Label    | ID | Description                  |
|----------|----|------------------------------|
| negative | 0  | Negative financial sentiment |
| neutral  | 1  | Neutral financial sentiment  |
| positive | 2  | Positive financial sentiment |
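Any hand-rolled decoding of the logits must follow the id-to-label ordering above. A minimal mapping (assuming the model's `config.id2label` matches this table; verify after loading if in doubt):

```python
# Label mapping as documented in the table above.
# If in doubt, check against model.config.id2label after loading.
id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {label: idx for idx, label in id2label.items()}

print(id2label[2])
print(label2id["neutral"])
```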

Usage

```python
from transformers import RobertaForSequenceClassification, RobertaTokenizer
import torch

# Load model and tokenizer
model_name = "alasteirho/FIN-RoBERTa-Custom"
tokenizer = RobertaTokenizer.from_pretrained(model_name)
model = RobertaForSequenceClassification.from_pretrained(model_name)
model.eval()

# Predict sentiment: returns the predicted label and its probability
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        pred = torch.argmax(probs, dim=-1).item()

    labels = ["negative", "neutral", "positive"]
    return labels[pred], probs[0][pred].item()

# Example
text = "The company reported strong quarterly earnings, beating analyst expectations."
sentiment, confidence = predict_sentiment(text)
print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")
```

Using Pipeline (Simple)

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="alasteirho/FIN-RoBERTa-Custom")
result = classifier("Revenue declined 15% year-over-year due to weak demand.")
print(result)
```

Training Data

The model was trained on a combined dataset from multiple financial sentiment sources:

| Dataset                              | Samples | Description               |
|--------------------------------------|---------|---------------------------|
| Financial PhraseBank (50% agreement) | ~4,840  | Financial news sentences  |
| Twitter Financial News               | ~9,500  | Financial tweets          |
| FiQA 2018                            | ~1,100  | Financial QA sentiment    |
| SemEval 2017 Task 5                  | ~1,100  | Financial news headlines  |

Total training samples: ~14,000 (after deduplication)

Evaluation Results

Financial PhraseBank (100% Agreement) - Held-out Test Set

| Metric        | Score  |
|---------------|--------|
| Accuracy      | 0.9872 |
| F1 (Macro)    | 0.9844 |
| F1 (Weighted) | 0.9872 |

Per-Class Performance

| Class    | Precision | Recall | F1-Score | Support |
|----------|-----------|--------|----------|---------|
| Negative | 0.97      | 0.99   | 0.98     | 303     |
| Neutral  | 0.99      | 0.99   | 0.99     | 1,391   |
| Positive | 0.98      | 0.98   | 0.98     | 570     |

Confusion Matrix

```
              Predicted
           neg  neu  pos
Actual neg  300    1    2
       neu    5 1376   10
       pos    3    8  559
```
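As a sanity check, the headline numbers can be recovered from the matrix above (rows = actual, columns = predicted):

```python
# Confusion matrix from the table above: rows = actual, cols = predicted
# (order: negative, neutral, positive).
cm = [
    [300,    1,    2],   # actual negative
    [  5, 1376,   10],   # actual neutral
    [  3,    8,  559],   # actual positive
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(3))
accuracy = correct / total
print(f"accuracy = {accuracy:.4f}")  # 2235 / 2264 ≈ 0.9872

# Per-class recall: diagonal over row sum (matches the table within rounding)
recalls = [cm[i][i] / sum(cm[i]) for i in range(3)]
print([f"{r:.2f}" for r in recalls])
```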

Training Procedure

Hyperparameters

| Parameter             | Value              |
|-----------------------|--------------------|
| Base Model            | roberta-base       |
| Epochs                | 2 (early stopping) |
| Batch Size            | 16                 |
| Gradient Accumulation | 2                  |
| Effective Batch Size  | 32                 |
| Learning Rate         | 2e-5               |
| Weight Decay          | 0.01               |
| Warmup Ratio          | 0.1                |
| LR Scheduler          | Linear             |
| Max Sequence Length   | 128                |
| FP16                  | True               |
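These hyperparameters map onto a `TrainingArguments` configuration roughly as follows. This is a sketch, not the actual training script (which is not published); argument names follow the `transformers` `Trainer` API.

```python
from transformers import TrainingArguments

# Sketch of the documented hyperparameters; dataset loading, Trainer setup,
# and the early-stopping callback are omitted.
training_args = TrainingArguments(
    output_dir="fin-roberta",        # illustrative path
    num_train_epochs=2,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 16 * 2 = 32
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    fp16=True,
)
```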

Training Metrics (Best Checkpoint - Epoch 2)

  • Validation Loss: 0.3658
  • Validation Accuracy: 85.93%
  • Validation F1 (Macro): 84.55%

Intended Use

This model is designed for:

  • Financial news sentiment analysis
  • Stock/market sentiment classification
  • Financial social media sentiment analysis
  • Earnings call/report sentiment extraction

Limitations

  • Trained primarily on English financial text
  • May not generalise well to non-financial domains
  • Inputs longer than 128 tokens are truncated, so performance may degrade on long documents
  • Cryptocurrency and emerging market terminology may be underrepresented

License

MIT License

