Token Classification
Transformers
Safetensors
English
bert
financial NLP
named entity recognition
sequence labeling
structured extraction
hierarchical taxonomy
XBRL
iXBRL
SEC filings
financial-information-extraction
Instructions to use AAU-NLP/Cal-BERT-SL1000 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use AAU-NLP/Cal-BERT-SL1000 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="AAU-NLP/Cal-BERT-SL1000")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("AAU-NLP/Cal-BERT-SL1000")
model = AutoModelForTokenClassification.from_pretrained("AAU-NLP/Cal-BERT-SL1000")
```
- Notebooks
- Google Colab
- Kaggle
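By default, the pipeline above returns one prediction per sub-word token. The following is a minimal sketch of merging consecutive predictions into character spans. It assumes the model emits BIO-style labels (`B-<type>` / `I-<type>`), which this card does not state. The label name `Revenues`, the token indices, and the offsets in the example are hypothetical. In practice, passing `aggregation_strategy="simple"` to `pipeline(...)` makes Transformers perform a comparable grouping itself.

```python
from dataclasses import dataclass


@dataclass
class Span:
    label: str  # entity type without the B-/I- prefix
    start: int  # character offset of the first token
    end: int    # exclusive character offset of the last token
    text: str


def split_tag(tag: str) -> tuple[str, str]:
    """Split 'B-Revenues' into ('B', 'Revenues'); a label without a prefix counts as 'I'.

    BIO-style labels are an assumption, not something the model card confirms.
    """
    if len(tag) > 2 and tag[0] in "BI" and tag[1] == "-":
        return tag[0], tag[2:]
    return "I", tag


def group_tokens(preds: list[dict], text: str) -> list[Span]:
    """Merge per-token predictions into spans.

    Each prediction needs the 'entity', 'index', 'start' and 'end' keys that the
    token-classification pipeline returns when no aggregation is applied; 'O'
    tokens are assumed to be filtered out already (the pipeline's default).
    """
    spans: list[Span] = []
    prev_index = None
    for p in sorted(preds, key=lambda d: d["index"]):
        prefix, label = split_tag(p["entity"])
        adjacent = prev_index is not None and p["index"] == prev_index + 1
        if spans and adjacent and prefix == "I" and spans[-1].label == label:
            # Extend the open span to cover this token.
            span = spans[-1]
            span.end = p["end"]
            span.text = text[span.start:span.end]
        else:
            # A B- tag, a change of type, or a gap in token indices starts a new span.
            spans.append(Span(label, p["start"], p["end"], text[p["start"]:p["end"]]))
        prev_index = p["index"]
    return spans


# Hypothetical raw pipeline output for the sentence below.
text = "Total revenues were $4.2 billion."
preds = [
    {"entity": "B-Revenues", "index": 5, "start": 21, "end": 22},
    {"entity": "I-Revenues", "index": 6, "start": 22, "end": 23},
    {"entity": "I-Revenues", "index": 7, "start": 23, "end": 24},
    {"entity": "I-Revenues", "index": 8, "start": 25, "end": 32},
]
print(group_tokens(preds, text))
# [Span(label='Revenues', start=21, end=32, text='4.2 billion')]
```

Slicing the original text by offsets avoids having to strip WordPiece `##` markers from the `word` field.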
Add paper link and citation, update official repository
#1
by nielsr HF Staff - opened
README.md CHANGED
@@ -1,56 +1,54 @@
---
- en
tags:
- financial NLP
- named entity recognition
- sequence labeling
- structured extraction
- hierarchical taxonomy
- XBRL
- iXBRL
- SEC filings
- financial-information-extraction
datasets:
task_categories:
task_ids:
pretty_name:
size_categories: "1M<n<10M"
languages:
- en
dataset_name: "HiFi-KPI"
model_description: |
  Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **HiFi-KPI dataset** for extracting
  **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying
  entities that are one level up the calculation taxonomy, such as revenueAbstract, earnings, and financial ratios, using **token classification**.

  This model is trained specifically on n=1 with the **calculation taxonomy labels** from **HiFi-KPI**, focusing on structured extraction.

dataset_link: "https://huggingface.co/datasets/AAU-NLP/HiFi-KPI"
repo_link: "https://github.com/rasmus393/HiFi-KPI"
---

## **Cal-BERT-SL1000**

### **Model Description**
Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** for extracting **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying entities, such as

### **Use Cases**
- Extracting **financial KPIs** using **iXBRL calculation taxonomy**
- **Financial document parsing** with entity recognition

### **Performance**
- Trained on **1,000 most frequent labels** from the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** with n=1 in the calculation taxonomy

### **Dataset & Code**
- **Dataset**: [HiFi-KPI on Hugging Face](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)
- **
---
base_model: bert-base-uncased
datasets:
- AAU-NLP/HiFi-KPI
language:
- en
library_name: transformers
model_name: Cal-BERT-SL1000
pipeline_tag: token-classification
tags:
- financial NLP
- named entity recognition
- sequence labeling
- structured extraction
- hierarchical taxonomy
- XBRL
- iXBRL
- SEC filings
- financial-information-extraction
task_categories:
- token-classification
task_ids:
- named-entity-recognition
- financial-information-extraction
pretty_name: 'Cal-BERT-SL1000: Sequence Labeling for Calculation Taxonomy KPI Extraction'
---

## **Cal-BERT-SL1000**

### **Model Description**
Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** for extracting **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. Using **token classification**, it identifies entities one level up in the calculation taxonomy ($n=1$), such as `revenueAbstract`, `earnings`, and `financial ratios`.

This model was introduced in the paper [HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings](https://huggingface.co/papers/2502.15411) by Rasmus Aavang, Giovanni Rizzi, Rasmus Bøggild, Alexandre Iolov, Mike Zhang (@jjzha), and Johannes Bjerva.
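One way to read the $n=1$ setting is that each fine-grained concept is replaced by its parent in the calculation hierarchy. The following is a minimal sketch of that roll-up, assuming BIO tags and a child-to-parent map. The `parents` entries are hypothetical and are not taken from the US-GAAP taxonomy or the HiFi-KPI preprocessing code. The helper names and the choice to leave root concepts unchanged are also illustrative assumptions.

```python
def ancestor(concept: str, parents: dict[str, str], n: int = 1) -> str:
    """Walk n steps up a child -> parent map, stopping early at a root."""
    for _ in range(n):
        if concept not in parents:
            break  # assumption: roots (and unknown concepts) map to themselves
        concept = parents[concept]
    return concept


def roll_up(tags: list[str], parents: dict[str, str], n: int = 1) -> list[str]:
    """Replace the concept in each BIO tag by its n-th ancestor; 'O' is kept."""
    rolled = []
    for tag in tags:
        if tag == "O":
            rolled.append(tag)
        else:
            prefix, concept = tag.split("-", 1)  # BIO tags assumed
            rolled.append(f"{prefix}-{ancestor(concept, parents, n)}")
    return rolled


# Hypothetical fragment of a calculation hierarchy (child -> parent).
parents = {
    "ProductRevenue": "Revenues",
    "ServiceRevenue": "Revenues",
    "Revenues": "GrossProfit",
}

tags = ["O", "B-ProductRevenue", "I-ProductRevenue", "O", "B-ServiceRevenue"]
print(roll_up(tags, parents))
# ['O', 'B-Revenues', 'I-Revenues', 'O', 'B-Revenues']
print(ancestor("ProductRevenue", parents, n=2))
# GrossProfit
```

Raising `n` coarsens the label space further, which is the trade-off a hierarchical taxonomy exposes.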

### **Use Cases**
- Extracting **financial KPIs** using the **iXBRL calculation taxonomy**
- **Financial document parsing** with entity recognition
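iXBRL filings tag reported numbers inline with elements such as `ix:nonFraction`, whose `name` attribute holds the taxonomy concept. The sketch below gives a rough illustration of where concept labels for KPI extraction can come from. It pulls (concept, displayed value) pairs out of an iXBRL fragment using only Python's standard `html.parser`. It is not the HiFi-KPI construction pipeline. The sample markup and its values are made up. The sketch also ignores `ix:nonNumeric`, nested elements, and the `sign`/`scale` semantics.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class NonFractionParser(HTMLParser):
    """Collect (concept name, displayed text) pairs from ix:nonFraction elements."""

    def __init__(self) -> None:
        super().__init__()
        self.facts: List[Tuple[str, str]] = []
        self._concept: Optional[str] = None  # concept of the fact being read, if any
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag == "ix:nonfraction":
            self._concept = dict(attrs).get("name")
            self._chunks = []

    def handle_data(self, data):
        if self._concept is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "ix:nonfraction" and self._concept is not None:
            self.facts.append((self._concept, "".join(self._chunks).strip()))
            self._concept = None


def extract_nonfraction_facts(html: str) -> List[Tuple[str, str]]:
    parser = NonFractionParser()
    parser.feed(html)
    parser.close()
    return parser.facts


# Made-up iXBRL fragment for illustration only.
sample = (
    '<p>Revenue was $<ix:nonFraction name="us-gaap:Revenues" contextRef="FY2024" '
    'unitRef="usd" decimals="-6" scale="6">4,200</ix:nonFraction> million, and net '
    'income was $<ix:nonFraction name="us-gaap:NetIncomeLoss" contextRef="FY2024" '
    'unitRef="usd" decimals="-6" scale="6">310</ix:nonFraction> million.</p>'
)
print(extract_nonfraction_facts(sample))
# [('us-gaap:Revenues', '4,200'), ('us-gaap:NetIncomeLoss', '310')]
```

A real pipeline would use an XML-aware parser and resolve `contextRef`/`unitRef`. The standard-library parser just keeps the sketch dependency-free.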

### **Performance**
- Trained on the **1,000 most frequent labels** from the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** with $n=1$ in the calculation taxonomy.
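The card says only that training used the 1,000 most frequent labels. One plausible way to build such a label set is to count entity mentions per type and keep the top `k`, as sketched below. Several details are assumptions made for the sketch: mapping every other type to `O`, the BIO tag format, the helper names, and the toy corpus. The card does not describe how rarer labels were handled.

```python
from collections import Counter
from typing import Iterable, List, Set


def entity_type(tag: str) -> str:
    """'B-Revenues' -> 'Revenues' (BIO tags assumed)."""
    return tag.split("-", 1)[1]


def top_k_types(tag_sequences: Iterable[List[str]], k: int = 1000) -> Set[str]:
    """Return the k entity types with the most mentions (counted via B- tags)."""
    counts = Counter(
        entity_type(tag)
        for tags in tag_sequences
        for tag in tags
        if tag.startswith("B-")
    )
    return {label for label, _ in counts.most_common(k)}


def keep_top_k(tags: List[str], keep: Set[str]) -> List[str]:
    """Map tags whose entity type is not kept to 'O' (an assumed design choice)."""
    return [tag if tag != "O" and entity_type(tag) in keep else "O" for tag in tags]


# Hypothetical toy corpus of BIO-tagged sentences.
corpus = [
    ["B-Revenues", "I-Revenues", "O", "B-NetIncomeLoss"],
    ["B-Revenues", "O", "B-Assets"],
]
keep = top_k_types(corpus, k=1)
print(keep)  # {'Revenues'}
print(keep_top_k(corpus[0], keep))
# ['B-Revenues', 'I-Revenues', 'O', 'O']
```

Counting only `B-` tags counts mentions rather than tokens, so long multi-token spans do not inflate a label's frequency.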

### **Dataset & Code**
- **Dataset**: [HiFi-KPI on Hugging Face](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)
- **Official Code**: [HiFi-KPI GitHub Repository](https://github.com/aaunlp/HiFi-KPI)
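Fine-tuning a BERT token classifier on word-level labels requires aligning those labels with WordPiece sub-tokens. A common convention labels only the first sub-token of each word and masks the rest with -100, the default `ignore_index` of `torch.nn.CrossEntropyLoss`. The sketch below follows that convention. Its `word_ids` list mirrors what a fast tokenizer's `BatchEncoding.word_ids()` returns. Two points are assumptions, not facts from this card: that HiFi-KPI stores labels at word level, and that Cal-BERT-SL1000 was trained with this convention. The example inputs are hypothetical.

```python
from typing import List, Optional

# torch.nn.CrossEntropyLoss skips targets equal to its ignore_index, which defaults to -100.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Give each word's label to its first sub-token and IGNORE_INDEX to the rest.

    `word_ids` has one entry per token: the index of the word it came from, or
    None for special tokens such as [CLS] and [SEP].
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)      # special token
        elif wid != previous:
            aligned.append(word_labels[wid])  # first sub-token of a word
        else:
            aligned.append(IGNORE_INDEX)      # later sub-tokens of the same word
        previous = wid
    return aligned


# Hypothetical: three words, the second split into two sub-tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 3, 0]  # hypothetical label ids, e.g. 0 = "O", 3 = some B- tag
print(align_labels(word_ids, word_labels))
# [-100, 0, 3, -100, 0, -100]
```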

### **Citation**
```bibtex
@article{aavang2025hifikpi,
  title={HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings},
  author={Aavang, Rasmus and Rizzi, Giovanni and B{\o}ggild, Rasmus and Iolov, Alexandre and Zhang, Mike and Bjerva, Johannes},
  journal={arXiv preprint arXiv:2502.15411},
  year={2025}
}
```