# ABSA-FinBERT: Aspect Classification for Financial Text
This model classifies financial headlines and tweets into four aspect categories: Corporate, Economy, Market, and Stock.
## Model Description
ABSA-FinBERT is a fine-tuned version of ProsusAI/finbert for Level-1 aspect classification on the FiQA dataset. The model was trained with class-weighted cross-entropy loss to address extreme class imbalance in the training data.
This work is motivated by Yang et al. (2018), "Financial Aspect-Based Sentiment Analysis using Deep Representations," which demonstrated that financial text often contains multi-dimensional information requiring aspect-level analysis.
## Intended Use
- Classifying financial news headlines by topic/aspect
- Preprocessing step for aspect-based sentiment analysis pipelines
- Financial text categorization
## Training Data
Trained on the FiQA dataset (WWW'18 Open Challenge), with Level-1 aspect labels extracted from hierarchical annotations.
| Aspect | Training Examples | Percentage |
|---|---|---|
| Stock | 562 | 58.5% |
| Corporate | 367 | 38.2% |
| Market | 26 | 2.7% |
| Economy | 4 | 0.4% |
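FiQA's aspect annotations are hierarchical path strings (the exact label strings below are illustrative assumptions, not verbatim dataset labels), so the Level-1 extraction mentioned above amounts to taking the first path component:

```python
# Hedged sketch of Level-1 label extraction from FiQA's hierarchical
# aspect annotations. Example labels are illustrative, not verbatim.
def level1_aspect(label: str) -> str:
    """Return the Level-1 (top) component of a hierarchical aspect label."""
    return label.split("/")[0]

examples = [
    "Stock/Price Action",
    "Corporate/Company Communication",
    "Market/Volatility",
    "Economy/Central Banks",
]
print([level1_aspect(x) for x in examples])
# → ['Stock', 'Corporate', 'Market', 'Economy']
```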
### Class Weights Applied
Due to extreme imbalance, inverse frequency weights were used: Corporate (0.65), Economy (59.94), Market (9.22), Stock (0.43).
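These values are consistent with the standard inverse-frequency formula `total / (num_classes * count)`; a minimal sketch (assuming that formula, which reproduces the numbers above from the training-data table):

```python
# Inverse-frequency class weights: total / (num_classes * count).
# Counts come from the training-data table above.
counts = {"Corporate": 367, "Economy": 4, "Market": 26, "Stock": 562}
total = sum(counts.values())  # 959

weights = {a: total / (len(counts) * n) for a, n in counts.items()}
for aspect, w in weights.items():
    print(f"{aspect}: {w:.2f}")
# Corporate: 0.65, Economy: 59.94, Market: 9.22, Stock: 0.43
```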
## Performance
| Metric | Score |
|---|---|
| Accuracy | 88.59% |
| Macro-F1 | 0.5429 |
| Weighted-F1 | 0.8688 |
### Per-Class Results
| Aspect | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Corporate | 0.91 | 0.94 | 0.92 | 64 |
| Economy | 0.00 | 0.00 | 0.00 | 3 |
| Market | 0.50 | 0.25 | 0.33 | 8 |
| Stock | 0.89 | 0.95 | 0.92 | 74 |
**Note:** The model performs well on the majority classes but fails entirely on Economy, which has only 4 training examples. Class weighting cannot overcome severe data scarcity.
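The gap between macro-F1 and weighted-F1 follows directly from the per-class table: macro-F1 averages the four class scores equally, so the zero on Economy drags it down, while weighted-F1 weights each class by support. A quick check against the rounded table values (small differences from the reported scores are rounding artifacts):

```python
# Per-class F1 and support taken from the table above (rounded values).
f1 = {"Corporate": 0.92, "Economy": 0.00, "Market": 0.33, "Stock": 0.92}
support = {"Corporate": 64, "Economy": 3, "Market": 8, "Stock": 74}

macro = sum(f1.values()) / len(f1)
weighted = sum(f1[a] * support[a] for a in f1) / sum(support.values())

print(f"macro-F1 ≈ {macro:.4f}")        # ≈ 0.5425 (reported: 0.5429)
print(f"weighted-F1 ≈ {weighted:.4f}")  # ≈ 0.8698 (reported: 0.8688)
```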
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("nick-cirillo/finbert-fiqa-aspect")
model = AutoModelForSequenceClassification.from_pretrained("nick-cirillo/finbert-fiqa-aspect")
model.eval()

# Label mapping
id2label = {0: "Corporate", 1: "Economy", 2: "Market", 3: "Stock"}

# Example inference
text = "How Kraft-Heinz Merger Came Together in Speedy 10 Weeks"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()
print(f"Aspect: {id2label[prediction]}")  # Output: Corporate
```
## Training Procedure
- Base model: ProsusAI/finbert
- Learning rate: 3e-5
- Batch size: 16 (effective 32 with gradient accumulation)
- Epochs: 10 (early stopping patience: 3)
- Loss: Weighted cross-entropy
- Optimizer: AdamW with warmup (10%)
- Mixed precision: FP16
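For illustration, the weighted cross-entropy in the recipe above amounts to scaling each example's negative log-likelihood by its class weight (the behavior of `torch.nn.CrossEntropyLoss(weight=...)`); a dependency-free sketch of the per-example loss, using the weights from the "Class Weights Applied" section:

```python
import math

# Class weights from the "Class Weights Applied" section,
# indexed as {0: Corporate, 1: Economy, 2: Market, 3: Stock}.
WEIGHTS = [0.65, 59.94, 9.22, 0.43]

def weighted_cross_entropy(logits, target):
    """Per-example weighted CE: -w[target] * log softmax(logits)[target]."""
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    log_p = (logits[target] - m) - math.log(sum(exps))
    return -WEIGHTS[target] * log_p

# With uninformative (uniform) logits, a misread Economy example costs
# ~139x more than a misread Stock example (59.94 / 0.43), which is what
# pushes the model to attend to the rare classes during training.
uniform = [0.0, 0.0, 0.0, 0.0]
print(weighted_cross_entropy(uniform, 1) / weighted_cross_entropy(uniform, 3))
```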
## Limitations
- Economy class is effectively unlearnable with only 4 training examples
- Market class has limited representation (26 examples)
- Model is optimized for short financial headlines/tweets, not long-form text
## Citation
If you use this model, please cite:
```bibtex
@misc{absa-finbert-2025,
  title={ABSA-FinBERT: Aspect Classification for Financial Text},
  author={Cirillo, Nick and Memon, Suha and Truong, Kalen and Zhang, Bruce},
  year={2025},
  howpublished={\url{https://huggingface.co/nick-cirillo/finbert-fiqa-aspect}}
}
```
## References
- Yang, S., Rosenfeld, J., & Makutonin, J. (2018). Financial Aspect-Based Sentiment Analysis using Deep Representations. arXiv:1808.07931.
- Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
- Maia, M., et al. (2018). WWW'18 Open Challenge: Financial Opinion Mining and Question Answering. In Companion Proceedings of the Web Conference 2018.