---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - finance
  - aspect-classification
  - absa
  - finbert
  - text-classification
datasets:
  - pauri32/fiqa-2018
base_model: ProsusAI/finbert
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
---

# ABSA-FinBERT: Aspect Classification for Financial Text

This model classifies financial headlines and tweets into four aspect categories: Corporate, Economy, Market, and Stock.

## Model Description

ABSA-FinBERT is a fine-tuned version of ProsusAI/finbert for Level-1 aspect classification on the FiQA dataset. The model was trained with class-weighted cross-entropy loss to address extreme class imbalance in the training data.

This work is motivated by Yang et al. (2018), "Financial Aspect-Based Sentiment Analysis using Deep Representations," which demonstrated that financial text often contains multi-dimensional information requiring aspect-level analysis.

## Intended Use

- Classifying financial news headlines by topic/aspect
- Preprocessing step for aspect-based sentiment analysis pipelines
- Financial text categorization

## Training Data

Trained on the FiQA dataset (WWW'18 Open Challenge), with Level-1 aspect labels extracted from hierarchical annotations.
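Extracting the Level-1 label amounts to keeping the top level of each hierarchical annotation. A minimal sketch, assuming slash-delimited aspect strings (the example annotations below are hypothetical):

```python
def level1_aspect(aspect_path: str) -> str:
    """Keep only the top level of a hierarchical aspect annotation."""
    return aspect_path.split("/")[0].strip()

# Hypothetical annotations illustrating the four Level-1 classes
for a in ["Stock/Price Action", "Corporate/Sales", "Market/Volatility", "Economy"]:
    print(level1_aspect(a))
```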

| Aspect | Training Examples | Percentage |
|-----------|------------------:|-----------:|
| Stock | 562 | 58.5% |
| Corporate | 367 | 38.2% |
| Market | 26 | 2.7% |
| Economy | 4 | 0.4% |

### Class Weights Applied

Due to extreme imbalance, inverse frequency weights were used: Corporate (0.65), Economy (59.94), Market (9.22), Stock (0.43).
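These values match the standard "balanced" inverse-frequency formula w_c = N / (K · n_c), with N the total sample count and K the number of classes. A sketch of deriving them from the table above and plugging them into the training criterion:

```python
import torch

# Training counts per class, ordered Corporate, Economy, Market, Stock
counts = torch.tensor([367.0, 4.0, 26.0, 562.0])
n_total, n_classes = counts.sum(), counts.numel()

# Inverse-frequency ("balanced") weights: w_c = N / (K * n_c)
weights = n_total / (n_classes * counts)
print(weights)  # ~ [0.65, 59.94, 9.22, 0.43]

# The weights plug directly into the weighted cross-entropy loss
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```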

## Performance

| Metric | Score |
|-------------|-------:|
| Accuracy | 88.59% |
| Macro-F1 | 0.5429 |
| Weighted-F1 | 0.8688 |

### Per-Class Results

| Aspect | Precision | Recall | F1-Score | Support |
|-----------|----------:|-------:|---------:|--------:|
| Corporate | 0.91 | 0.94 | 0.92 | 64 |
| Economy | 0.00 | 0.00 | 0.00 | 3 |
| Market | 0.50 | 0.25 | 0.33 | 8 |
| Stock | 0.89 | 0.95 | 0.92 | 74 |

**Note:** The model performs well on the majority classes but fails entirely on Economy, which has only 4 training examples; class weighting cannot overcome such severe data scarcity.
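The gap between macro- and weighted-F1 follows directly from the per-class table: macro-F1 averages the four classes equally, while weighted-F1 weights each class by its support. Recomputing both from the (rounded) per-class scores roughly reproduces the reported numbers:

```python
# Per-class F1 and support from the table above (rounded, so results are approximate)
f1 = {"Corporate": 0.92, "Economy": 0.00, "Market": 0.33, "Stock": 0.92}
support = {"Corporate": 64, "Economy": 3, "Market": 8, "Stock": 74}

macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())
print(f"macro={macro_f1:.4f}, weighted={weighted_f1:.4f}")  # close to 0.5429 / 0.8688
```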

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("your-username/absa-finbert")
model = AutoModelForSequenceClassification.from_pretrained("your-username/absa-finbert")
model.eval()

# Label mapping
id2label = {0: "Corporate", 1: "Economy", 2: "Market", 3: "Stock"}

# Example inference
text = "How Kraft-Heinz Merger Came Together in Speedy 10 Weeks"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

print(f"Aspect: {id2label[prediction]}")  # Output: Corporate
```
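To attach a confidence score to the prediction, the logits can be passed through a softmax. A self-contained sketch, with dummy logits standing in for `outputs.logits`:

```python
import torch

id2label = {0: "Corporate", 1: "Economy", 2: "Market", 3: "Stock"}

# Dummy logits for the four classes, standing in for outputs.logits
logits = torch.tensor([[3.2, -1.5, -0.8, 0.4]])
probs = torch.softmax(logits, dim=-1)

pred = torch.argmax(probs, dim=-1).item()
confidence = probs[0, pred].item()
print(f"{id2label[pred]} (p={confidence:.2f})")
```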

## Training Procedure

- Base model: ProsusAI/finbert
- Learning rate: 3e-5
- Batch size: 16 (effective 32 with gradient accumulation)
- Epochs: 10 (early stopping patience: 3)
- Loss: Weighted cross-entropy
- Optimizer: AdamW with warmup (10%)
- Mixed precision: FP16
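The optimizer setup above (AdamW, lr 3e-5, 10% warmup) corresponds to a linear warmup/decay schedule. A pure-PyTorch sketch; the total step count here is hypothetical, as it depends on dataset size and epochs:

```python
import torch

model = torch.nn.Linear(768, 4)        # toy stand-in for the classifier
num_training_steps = 300               # hypothetical total optimizer steps
num_warmup_steps = int(0.1 * num_training_steps)   # 10% warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def lr_lambda(step: int) -> float:
    # Linear warmup to the peak LR, then linear decay to zero
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```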

## Limitations

- Economy class is effectively unlearnable with only 4 training examples
- Market class has limited representation (26 examples)
- Model is optimized for short financial headlines/tweets, not long-form text

## Citation

If you use this model, please cite:

```bibtex
@misc{absa-finbert-2025,
  title={ABSA-FinBERT: Aspect Classification for Financial Text},
  author={Cirillo, Nick and Memon, Suha and Truong, Kalen and Zhang, Bruce},
  year={2025},
  howpublished={\url{https://huggingface.co/your-username/absa-finbert}}
}
```

## References

- Yang, S., Rosenfeld, J., & Makutonin, J. (2018). Financial Aspect-Based Sentiment Analysis using Deep Representations. arXiv:1808.07931.
- Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
- Maia, M., et al. (2018). WWW'18 Open Challenge: Financial Opinion Mining and Question Answering.