---
language: en
license: apache-2.0
tags:
- finance
- sentiment-analysis
- roberta
- classification
model-index:
- name: RoBERTa-Large for Financial Sentiment Analysis
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      name: Finance News Sentiments (Kaggle)
      type: text
    metrics:
    - type: accuracy
      value: 0.7627
    - type: multiclass_roc_auc
      value: 0.9124
base_model:
- FacebookAI/roberta-large
---

# RoBERTa-Large Fine-Tuned for Financial Sentiment Analysis

This repository contains a RoBERTa-based model for financial sentiment classification. The model predicts whether a financial news headline or sentence is **positive**, **neutral**, or **negative**.

## Model Overview

- **Base model:** RoBERTa-Large
- **Task:** Financial sentiment classification (3 classes)
- **Training data:** Financial news headlines and sentences
- **Dataset source:** [Kaggle - Finance News Sentiments](https://www.kaggle.com/datasets/antobenedetti/finance-news-sentiments/data?select=dataset.csv)
- **Output labels:**
  - 0: Negative
  - 1: Neutral
  - 2: Positive

## Evaluation Results

- **Test Accuracy:** 0.7627
- **Multiclass ROC AUC (macro-average):** 0.9124

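To make the two metrics concrete, here is a minimal, dependency-free sketch of how accuracy and a macro-averaged multiclass ROC AUC can be computed from class probabilities. It assumes a one-vs-rest formulation, which this card does not confirm, and the helper names (`binary_auc`, `macro_ovr_auc`, `accuracy`) and the toy data are hypothetical. In practice, `sklearn.metrics.roc_auc_score(y_true, probs, multi_class="ovr", average="macro")` computes the same macro one-vs-rest quantity.

```python
def binary_auc(labels, scores):
    """AUC for one class via pairwise ranking; tied scores count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs, n_classes=3):
    """Unweighted mean of the one-vs-rest AUCs (assumed averaging scheme)."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in probs]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


def accuracy(y_true, probs):
    """Fraction of examples whose highest-probability class matches the label."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


# Toy data for illustration only: labels follow 0 = negative, 1 = neutral, 2 = positive.
y_true = [0, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
]

print(f"accuracy: {accuracy(y_true, probs):.4f}")
print(f"macro ROC AUC: {macro_ovr_auc(y_true, probs):.4f}")
```

Unlike accuracy, the ROC AUC is computed from the predicted probabilities rather than the argmax labels, which is why the two numbers can differ substantially.
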
## Model Folder Structure

```
roberta_finance_sentiment/
    config.json
    merges.txt
    model.safetensors
    special_tokens_map.json
    tokenizer_config.json
    tokenizer.json
    vocab.json
```

**Note:** Only the model files are stored in `roberta_finance_sentiment/`. Scripts and datasets are kept separate and are not included in this folder or in the model upload.

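Before loading the model, it can help to confirm that a local copy of the folder is complete. The following is a small sketch using only the standard library; the helper name `missing_model_files` is hypothetical, and the file list is simply the listing shown above.

```python
from pathlib import Path

# Files listed in the folder structure above.
EXPECTED_FILES = [
    "config.json",
    "merges.txt",
    "model.safetensors",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.json",
]


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_model_files("roberta_finance_sentiment")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected model files are present.")
```
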
## How to Use the Fine-Tuned Model

### 1. Load and Use the Model in Python

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Path to the model folder
model_dir = "roberta_finance_sentiment"

# Load the tokenizer and model, then switch to evaluation mode
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

# Example: classify a single headline
text = "Apple stock surges after strong earnings report."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
    pred = torch.argmax(logits, dim=1).item()

# Map the predicted class index to its label
label_map = {0: 'negative', 1: 'neutral', 2: 'positive'}
print(f"Predicted sentiment: {label_map[pred]}")
```

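The argmax alone hides how confident the model is. As a minimal sketch, the logits can be turned into class probabilities with a softmax; the helper names `softmax` and `top_prediction` are hypothetical, and the example logits are made up for illustration rather than produced by the model.

```python
import math

# Same index-to-label mapping as in the model overview.
LABEL_MAP = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


# Illustrative logits only; real values come from model(**inputs).logits.
label, prob = top_prediction([-1.2, 0.3, 2.1])
print(f"{label} (p = {prob:.3f})")
```

With the loading code above, the logits for the single example can be passed in as `top_prediction(logits[0].tolist())`.
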
## Notes

- The model was trained and evaluated on data from the Kaggle dataset linked above.
- The `roberta_finance_sentiment/` folder contains only the files needed for inference.
- Scripts and datasets are not included in the model folder or in the model upload.
- For faster inference, run the model on a GPU if one is available.

## Limitations

- The model is trained on headline- and sentence-level text; longer passages may be classified less reliably.
- Sarcasm, irony, or complex phrasing may reduce prediction accuracy.

---

**Date:** June 2025