Token Classification
Transformers
Safetensors
English
bert
financial NLP
named entity recognition
sequence labeling
structured extraction
hierarchical taxonomy
XBRL
iXBRL
SEC filings
financial-information-extraction
How to use AAU-NLP/Pre-BERT-SL1000 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="AAU-NLP/Pre-BERT-SL1000")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("AAU-NLP/Pre-BERT-SL1000")
model = AutoModelForTokenClassification.from_pretrained("AAU-NLP/Pre-BERT-SL1000")
```
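The pipeline returns one prediction per word piece by default. A minimal sketch of one way to get readable spans, assuming the `aggregation_strategy="simple"` option of the Transformers token-classification pipeline is suitable for this model's label scheme (the model card does not say so). The helper names `keep_confident` and `tag_sentence`, the 0.5 threshold, and the mock predictions are illustrative and not part of the model card.

```python
from typing import Any


def keep_confident(
    predictions: list[dict[str, Any]], min_score: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return (label, text, score) for aggregated predictions at or above min_score."""
    return [
        (p["entity_group"], p["word"], float(p["score"]))
        for p in predictions
        if p["score"] >= min_score
    ]


def tag_sentence(text: str, min_score: float = 0.5) -> list[tuple[str, str, float]]:
    """Run the model on one sentence and keep the confident spans (downloads the model)."""
    # Imported lazily so keep_confident can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "token-classification",
        model="AAU-NLP/Pre-BERT-SL1000",
        aggregation_strategy="simple",  # merge word pieces into spans
    )
    return keep_confident(pipe(text), min_score)


# Made-up predictions in the shape the aggregated pipeline returns.
# The label string and scores are assumptions for illustration only.
MOCK_PREDICTIONS = [
    {"entity_group": "revenueAbstract", "score": 0.91, "word": "12.4 million", "start": 13, "end": 25},
    {"entity_group": "revenueAbstract", "score": 0.32, "word": "2023", "start": 29, "end": 33},
]

print(keep_confident(MOCK_PREDICTIONS))
```

Calling `tag_sentence("...")` on a sentence from a filing would run the real model; the threshold is a tuning choice, not a value recommended by the authors.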
Add paper link and improve model card metadata
#1
by nielsr HF Staff - opened
README.md
CHANGED
@@ -1,57 +1,52 @@
 ---
-
-- en
-tags:
-- financial NLP
-- named entity recognition
-- sequence labeling
-- structured extraction
-- hierarchical taxonomy
-- XBRL
-- iXBRL
-- SEC filings
-- financial-information-extraction
+base_model: bert-base-uncased
 datasets:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Pre-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **HiFi-KPI dataset** for extracting
-**financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying
-entities that are one level up the **presentation taxonomy**, such as revenueAbstract, earnings, and financial ratios, using **token classification**.
-
-This model is trained specifically on n=1 with the **presentation taxonomy labels** from **HiFi-KPI**, focusing on entity identification.
-
-dataset_link: "https://huggingface.co/datasets/AAU-NLP/HiFi-KPI"
-repo_link: "https://github.com/rasmus393/HiFi-KPI"
+- AAU-NLP/HiFi-KPI
+language:
+- en
+library_name: transformers
+pipeline_tag: token-classification
+tags:
+- financial NLP
+- named entity recognition
+- sequence labeling
+- structured extraction
+- hierarchical taxonomy
+- XBRL
+- iXBRL
+- SEC filings
+- financial-information-extraction
+license: cc-by-4.0
 ---
 
 ## **Pre-BERT-SL1000**
 
+This model was presented in the paper [HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings](https://huggingface.co/papers/2502.15411).
+
 ### **Model Description**
-Pre-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** for extracting **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying entities, such as
+Pre-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** for extracting **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying entities that are one level up the **presentation taxonomy**, such as revenueAbstract, earnings, and financial ratios, using **token classification**.
 
-This model is trained on
+This model is trained specifically on n=1 with the **presentation taxonomy labels** from **HiFi-KPI**, focusing on entity identification.
 
 ### **Use Cases**
 - Extracting **financial KPIs** using **iXBRL presentation taxonomy**
 - **Financial document parsing** with entity recognition
 
 ### **Performance**
-- Trained on **1,000 most frequent labels** from the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** with n=1 in the **presentation taxonomy**
+- Trained on **1,000 most frequent labels** from the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** with n=1 in the **presentation taxonomy**.
 
-### **
+### **Resources**
+- **Paper**: [HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings](https://huggingface.co/papers/2502.15411)
 - **Dataset**: [HiFi-KPI on Hugging Face](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)
-- **Code
+- **Code**: [HiFi-KPI GitHub Repository](https://github.com/aaunlp/HiFi-KPI)
+
+### **Citation**
+If you use this model or dataset, please cite:
+```bibtex
+@article{aavang2025hifikpi,
+  title={HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings},
+  author={Aavang, Rasmus and Rizzi, Giovanni and B{\o}ggild, Rasmus and Iolov, Alexandre and Zhang, Mike and Bjerva, Johannes},
+  journal={arXiv preprint arXiv:2502.15411},
+  year={2025}
+}
+```