---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- finance
- aspect-classification
- absa
- finbert
- text-classification
datasets:
- pauri32/fiqa-2018
base_model: ProsusAI/finbert
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---

# ABSA-FinBERT: Aspect Classification for Financial Text

This model classifies financial headlines and tweets into four aspect categories: **Corporate**, **Economy**, **Market**, and **Stock**.

## Model Description

ABSA-FinBERT is a fine-tuned version of [ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert) for Level-1 aspect classification on the FiQA dataset. The model was trained with class-weighted cross-entropy loss to address extreme class imbalance in the training data.

This work is motivated by [Yang et al. (2018)](https://arxiv.org/abs/1808.07931), "Financial Aspect-Based Sentiment Analysis using Deep Representations," which demonstrated that financial text often contains multi-dimensional information requiring aspect-level analysis.

## Intended Use

- Classifying financial news headlines by topic/aspect
- Preprocessing step for aspect-based sentiment analysis pipelines
- Financial text categorization

## Training Data

Trained on the [FiQA dataset](https://huggingface.co/datasets/pauri32/fiqa-2018) (WWW'18 Open Challenge), with Level-1 aspect labels extracted from hierarchical annotations.

| Aspect | Training Examples | Percentage |
|--------|-------------------|------------|
| Stock | 562 | 58.5% |
| Corporate | 367 | 38.2% |
| Market | 26 | 2.7% |
| Economy | 4 | 0.4% |

### Class Weights Applied

Due to extreme imbalance, inverse frequency weights were used: Corporate (0.65), Economy (59.94), Market (9.22), Stock (0.43).

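The listed weights are consistent with the common "balanced" heuristic, `weight_c = N / (K * n_c)`, where `N` is the total number of training examples, `K` the number of classes, and `n_c` the count for class `c`. The sketch below is an assumption about how the weights could have been derived, not the authors' confirmed training code; the function name `inverse_frequency_weights` is hypothetical. Applied to the counts in the table above, it reproduces the four reported values.

```python
# Assumption: weights follow the "balanced" heuristic N / (K * n_c).
# Counts are taken from the training-data table in this card; the
# function name is hypothetical and not part of the released model code.

def inverse_frequency_weights(counts):
    """Return {label: N / (K * n_c)} for a {label: count} mapping."""
    total = sum(counts.values())      # N: total training examples
    num_classes = len(counts)         # K: number of classes
    return {label: total / (num_classes * n) for label, n in counts.items()}


# Keys are ordered by label id (0: Corporate, 1: Economy, 2: Market, 3: Stock).
train_counts = {"Corporate": 367, "Economy": 4, "Market": 26, "Stock": 562}
class_weights = inverse_frequency_weights(train_counts)

for label, weight in class_weights.items():
    print(f"{label}: {weight:.2f}")
```

In PyTorch, weights like these would typically be passed as a tensor, ordered by label id, to the `weight` argument of `torch.nn.CrossEntropyLoss`.
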
## Performance

| Metric | Score |
|--------|-------|
| Accuracy | 88.59% |
| Macro-F1 | 0.5429 |
| Weighted-F1 | 0.8688 |

### Per-Class Results

| Aspect | Precision | Recall | F1-Score | Support |
|--------|-----------|--------|----------|---------|
| Corporate | 0.91 | 0.94 | 0.92 | 64 |
| Economy | 0.00 | 0.00 | 0.00 | 3 |
| Market | 0.50 | 0.25 | 0.33 | 8 |
| Stock | 0.89 | 0.95 | 0.92 | 74 |

**Note:** The model performs well on majority classes but fails on Economy due to having only 4 training examples. Class weighting cannot overcome severe data scarcity.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("your-username/absa-finbert")
model = AutoModelForSequenceClassification.from_pretrained("your-username/absa-finbert")

# Label mapping
id2label = {0: "Corporate", 1: "Economy", 2: "Market", 3: "Stock"}

# Example inference
text = "How Kraft-Heinz Merger Came Together in Speedy 10 Weeks"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

prediction = torch.argmax(outputs.logits, dim=-1).item()
print(f"Aspect: {id2label[prediction]}")
# Output: Corporate
```

## Training Procedure

- **Base model:** ProsusAI/finbert
- **Learning rate:** 3e-5
- **Batch size:** 16 (effective 32 with gradient accumulation)
- **Epochs:** 10 (early stopping patience: 3)
- **Loss:** Weighted cross-entropy
- **Optimizer:** AdamW with warmup (10%)
- **Mixed precision:** FP16

## Limitations

- Economy class is effectively unlearnable with only 4 training examples
- Market class has limited representation (26 examples)
- Model is optimized for short financial headlines/tweets, not long-form text

## Citation

If you use this model, please cite:

```bibtex
@misc{absa-finbert-2025,
  title={ABSA-FinBERT: Aspect Classification for Financial Text},
  author={Cirillo, Nick and Memon, Suha and Truong, Kalen and Zhang, Bruce},
  year={2025},
  howpublished={\url{https://huggingface.co/your-username/absa-finbert}}
}
```

## References

- Yang, S., Rosenfeld, J., & Makutonin, J. (2018). Financial Aspect-Based Sentiment Analysis using Deep Representations. arXiv:1808.07931.
- Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063.
- Maia, M., et al. (2018). WWW'18 Open Challenge: Financial Opinion Mining and Question Answering.