ABSA-FinBERT: Aspect Classification for Financial Text

This model classifies financial headlines and tweets into four aspect categories: Corporate, Economy, Market, and Stock.

Model Description

ABSA-FinBERT is a fine-tuned version of ProsusAI/finbert for Level-1 aspect classification on the FiQA dataset. The model was trained with class-weighted cross-entropy loss to address extreme class imbalance in the training data.

This work is motivated by Yang et al. (2018), "Financial Aspect-Based Sentiment Analysis using Deep Representations," which demonstrated that financial text often contains multi-dimensional information requiring aspect-level analysis.

Intended Use

  • Classifying financial news headlines by topic/aspect
  • Preprocessing step for aspect-based sentiment analysis pipelines
  • Financial text categorization

Training Data

Trained on the FiQA dataset (WWW'18 Open Challenge), with Level-1 aspect labels extracted from hierarchical annotations.

Aspect      Training Examples   Percentage
Stock       562                 58.5%
Corporate   367                 38.2%
Market      26                  2.7%
Economy     4                   0.4%
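
FiQA annotates aspects as hierarchical paths, and the Level-1 label is the first segment of each path. A minimal sketch of that extraction, assuming aspects arrive as slash-delimited strings such as "Corporate/Sales" (the exact field layout of the FiQA release may differ):

def level1_aspect(aspect_path: str) -> str:
    """Reduce a hierarchical FiQA aspect path to its Level-1 category."""
    return aspect_path.split("/")[0]

assert level1_aspect("Corporate/Sales") == "Corporate"
assert level1_aspect("Stock/Price Action") == "Stock"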

Class Weights Applied

Due to the extreme class imbalance, inverse-frequency weights (w_c = N / (K · n_c), where N = 959 training examples, K = 4 classes, and n_c is the per-class count) were used: Corporate (0.65), Economy (59.94), Market (9.22), Stock (0.43).
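
A minimal sketch of how these weights can be derived and passed to the loss (the variable names are illustrative, not from the original training code):

import torch
import torch.nn as nn

# Per-class training counts in label order (Corporate, Economy, Market, Stock)
counts = torch.tensor([367.0, 4.0, 26.0, 562.0])

# Inverse-frequency weighting: w_c = N / (K * n_c)
weights = counts.sum() / (len(counts) * counts)
print(weights)  # tensor([ 0.6533, 59.9375,  9.2212,  0.4266])

# Weighted cross-entropy, as used during fine-tuning
loss_fn = nn.CrossEntropyLoss(weight=weights)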

Performance

Metric        Score
Accuracy      88.59%
Macro-F1      0.5429
Weighted-F1   0.8688

Per-Class Results

Aspect      Precision   Recall   F1-Score   Support
Corporate   0.91        0.94     0.92       64
Economy     0.00        0.00     0.00       3
Market      0.50        0.25     0.33       8
Stock       0.89        0.95     0.92       74
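
The headline metrics follow directly from this table: macro-F1 averages the per-class F1 scores without weighting, while weighted-F1 weights each class by its support. A quick check (small discrepancies against the reported 0.5429 and 0.8688 come from the rounded per-class values):

# Per-class F1 and support from the table above
f1 = {"Corporate": 0.92, "Economy": 0.00, "Market": 0.33, "Stock": 0.92}
support = {"Corporate": 64, "Economy": 3, "Market": 8, "Stock": 74}

macro_f1 = sum(f1.values()) / len(f1)  # 0.5425
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())  # 0.8698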

Note: The model performs well on majority classes but fails on Economy due to having only 4 training examples. Class weighting cannot overcome severe data scarcity.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("nick-cirillo/finbert-fiqa-aspect")
model = AutoModelForSequenceClassification.from_pretrained("nick-cirillo/finbert-fiqa-aspect")
model.eval()  # disable dropout for deterministic inference

# Label mapping
id2label = {0: "Corporate", 1: "Economy", 2: "Market", 3: "Stock"}

# Example inference
text = "How Kraft-Heinz Merger Came Together in Speedy 10 Weeks"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

print(f"Aspect: {id2label[prediction]}")  # Output: Corporate

Training Procedure

  • Base model: ProsusAI/finbert
  • Learning rate: 3e-5
  • Batch size: 16 (effective 32 with gradient accumulation)
  • Epochs: 10 (early stopping patience: 3)
  • Loss: Weighted cross-entropy (see the Trainer sketch after this list)
  • Optimizer: AdamW with warmup (10%)
  • Mixed precision: FP16
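
A minimal sketch of how these settings map onto the transformers Trainer API; the subclass name, the train_ds/val_ds dataset variables, and the output path are illustrative, not the authors' actual training script:

import torch
import torch.nn as nn
from transformers import (
    AutoModelForSequenceClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

# FinBERT ships a 3-way sentiment head, so the classifier is re-initialized for 4 aspects
model = AutoModelForSequenceClassification.from_pretrained(
    "ProsusAI/finbert", num_labels=4, ignore_mismatched_sizes=True
)

class WeightedLossTrainer(Trainer):
    """Trainer variant that applies the inverse-frequency class weights in the loss."""
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        weights = torch.tensor([0.65, 59.94, 9.22, 0.43], device=outputs.logits.device)
        loss = nn.CrossEntropyLoss(weight=weights)(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(
    output_dir="absa-finbert",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 32
    num_train_epochs=10,
    warmup_ratio=0.1,                # 10% warmup; AdamW is the Trainer default
    fp16=True,                       # mixed precision
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,     # required for early stopping
    metric_for_best_model="eval_loss",
)

trainer = WeightedLossTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # hypothetical tokenized datasets
    eval_dataset=val_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)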

Limitations

  • Economy class is effectively unlearnable with only 4 training examples
  • Market class has limited representation (26 examples)
  • Model is optimized for short financial headlines/tweets, not long-form text

Citation

If you use this model, please cite:

@misc{absa-finbert-2025,
  title={ABSA-FinBERT: Aspect Classification for Financial Text},
  author={Cirillo, Nick and Memon, Suha and Truong, Kalen and Zhang, Bruce},
  year={2025},
  howpublished={\url{https://huggingface.co/nick-cirillo/finbert-fiqa-aspect}}
}

References

  • Yang, S., Rosenfeld, J., & Makutonin, J. (2018). Financial Aspect-Based Sentiment Analysis using Deep Representations. arXiv:1808.07931.
  • Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
  • Maia, M., Handschuh, S., Freitas, A., Davis, B., McDermott, R., Zarrouk, M., & Balahur, A. (2018). WWW'18 Open Challenge: Financial Opinion Mining and Question Answering. In Companion Proceedings of The Web Conference 2018 (WWW '18).