---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - finance
  - aspect-classification
  - absa
  - finbert
  - text-classification
datasets:
  - pauri32/fiqa-2018
base_model: ProsusAI/finbert
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
---

# ABSA-FinBERT: Aspect Classification for Financial Text

This model classifies financial headlines and tweets into four aspect categories: Corporate, Economy, Market, and Stock.

## Model Description

ABSA-FinBERT is a fine-tuned version of ProsusAI/finbert for Level-1 aspect classification on the FiQA dataset. The model was trained with class-weighted cross-entropy loss to address extreme class imbalance in the training data.

This work is motivated by Yang et al. (2018), "Financial Aspect-Based Sentiment Analysis using Deep Representations," which demonstrated that financial text often contains multi-dimensional information requiring aspect-level analysis.

## Intended Use

- Classifying financial news headlines by topic/aspect
- Preprocessing step for aspect-based sentiment analysis pipelines
- Financial text categorization

## Training Data

Trained on the FiQA dataset (WWW'18 Open Challenge), with Level-1 aspect labels extracted from hierarchical annotations.
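Extracting the Level-1 label amounts to keeping the top level of each hierarchical annotation. A minimal sketch, assuming slash-delimited aspect strings (the example annotations below are hypothetical):

```python
def level1_aspect(aspect_path: str) -> str:
    """Keep only the top level of a hierarchical aspect annotation."""
    return aspect_path.split("/")[0].strip()

# Hypothetical annotations illustrating the four Level-1 classes
for a in ["Stock/Price Action", "Corporate/Sales", "Market/Volatility", "Economy"]:
    print(level1_aspect(a))
```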

| Aspect | Training Examples | Percentage |
|-----------|------------------:|-----------:|
| Stock | 562 | 58.5% |
| Corporate | 367 | 38.2% |
| Market | 26 | 2.7% |
| Economy | 4 | 0.4% |

### Class Weights Applied

Due to extreme imbalance, inverse frequency weights were used: Corporate (0.65), Economy (59.94), Market (9.22), Stock (0.43).
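These values match the standard "balanced" inverse-frequency formula w_c = N / (K · n_c), with N the total sample count and K the number of classes. A sketch of deriving them from the table above and plugging them into the training criterion:

```python
import torch

# Training counts per class, ordered Corporate, Economy, Market, Stock
counts = torch.tensor([367.0, 4.0, 26.0, 562.0])
n_total, n_classes = counts.sum(), counts.numel()

# Inverse-frequency ("balanced") weights: w_c = N / (K * n_c)
weights = n_total / (n_classes * counts)
print(weights)  # ~ [0.65, 59.94, 9.22, 0.43]

# The weights plug directly into the weighted cross-entropy loss
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```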

## Performance

| Metric | Score |
|-------------|-------:|
| Accuracy | 88.59% |
| Macro-F1 | 0.5429 |
| Weighted-F1 | 0.8688 |

### Per-Class Results

| Aspect | Precision | Recall | F1-Score | Support |
|-----------|----------:|-------:|---------:|--------:|
| Corporate | 0.91 | 0.94 | 0.92 | 64 |
| Economy | 0.00 | 0.00 | 0.00 | 3 |
| Market | 0.50 | 0.25 | 0.33 | 8 |
| Stock | 0.89 | 0.95 | 0.92 | 74 |

**Note:** The model performs well on the majority classes but fails entirely on Economy, which has only 4 training examples; class weighting cannot overcome such severe data scarcity.
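The gap between macro- and weighted-F1 follows directly from the per-class table: macro-F1 averages the four classes equally, while weighted-F1 weights each class by its support. Recomputing both from the (rounded) per-class scores roughly reproduces the reported numbers:

```python
# Per-class F1 and support from the table above (rounded, so results are approximate)
f1 = {"Corporate": 0.92, "Economy": 0.00, "Market": 0.33, "Stock": 0.92}
support = {"Corporate": 64, "Economy": 3, "Market": 8, "Stock": 74}

macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())
print(f"macro={macro_f1:.4f}, weighted={weighted_f1:.4f}")  # close to 0.5429 / 0.8688
```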

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("your-username/absa-finbert")
model = AutoModelForSequenceClassification.from_pretrained("your-username/absa-finbert")
model.eval()

# Label mapping
id2label = {0: "Corporate", 1: "Economy", 2: "Market", 3: "Stock"}

# Example inference
text = "How Kraft-Heinz Merger Came Together in Speedy 10 Weeks"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

print(f"Aspect: {id2label[prediction]}")  # Output: Corporate
```
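To attach a confidence score to the prediction, the logits can be passed through a softmax. A self-contained sketch, with dummy logits standing in for `outputs.logits`:

```python
import torch

id2label = {0: "Corporate", 1: "Economy", 2: "Market", 3: "Stock"}

# Dummy logits for the four classes, standing in for outputs.logits
logits = torch.tensor([[3.2, -1.5, -0.8, 0.4]])
probs = torch.softmax(logits, dim=-1)

pred = torch.argmax(probs, dim=-1).item()
confidence = probs[0, pred].item()
print(f"{id2label[pred]} (p={confidence:.2f})")
```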

## Training Procedure

- Base model: ProsusAI/finbert
- Learning rate: 3e-5
- Batch size: 16 (effective 32 with gradient accumulation)
- Epochs: 10 (early stopping patience: 3)
- Loss: Weighted cross-entropy
- Optimizer: AdamW with warmup (10%)
- Mixed precision: FP16
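The optimizer setup above (AdamW, lr 3e-5, 10% warmup) corresponds to a linear warmup/decay schedule. A pure-PyTorch sketch; the total step count here is hypothetical, as it depends on dataset size and epochs:

```python
import torch

model = torch.nn.Linear(768, 4)        # toy stand-in for the classifier
num_training_steps = 300               # hypothetical total optimizer steps
num_warmup_steps = int(0.1 * num_training_steps)   # 10% warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def lr_lambda(step: int) -> float:
    # Linear warmup to the peak LR, then linear decay to zero
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```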

## Limitations

- Economy class is effectively unlearnable with only 4 training examples
- Market class has limited representation (26 examples)
- Model is optimized for short financial headlines/tweets, not long-form text

## Citation

If you use this model, please cite:

```bibtex
@misc{absa-finbert-2025,
  title={ABSA-FinBERT: Aspect Classification for Financial Text},
  author={Cirillo, Nick and Memon, Suha and Truong, Kalen and Zhang, Bruce},
  year={2025},
  howpublished={\url{https://huggingface.co/your-username/absa-finbert}}
}
```

## References

- Yang, S., Rosenfeld, J., & Makutonin, J. (2018). Financial Aspect-Based Sentiment Analysis using Deep Representations. arXiv:1808.07931.
- Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
- Maia, M., et al. (2018). WWW'18 Open Challenge: Financial Opinion Mining and Question Answering.