Text Classification
Transformers
Safetensors
English
bert
token-classification
financial NLP
named entity recognition
sequence labeling
structured extraction
hierarchical taxonomy
XBRL
iXBRL
SEC filings
financial-information-extraction
text-embeddings-inference
Instructions to use AAU-NLP/BERT-SL1000 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AAU-NLP/BERT-SL1000 with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-classification", model="AAU-NLP/BERT-SL1000")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("AAU-NLP/BERT-SL1000")
    model = AutoModelForTokenClassification.from_pretrained("AAU-NLP/BERT-SL1000")
- Notebooks
- Google Colab
- Kaggle
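The generated snippet builds a `text-classification` pipeline, while the direct-loading path uses `AutoModelForTokenClassification` and the model card describes a sequence labeling model. A minimal sketch of token-level usage follows, assuming the checkpoint loads under the `token-classification` task. The helper names `tag_kpis` and `keep_confident`, the 0.5 score threshold, and the example sentence are illustrative and do not come from the model card.

```python
from typing import Any, Dict, List


def tag_kpis(text: str, model_id: str = "AAU-NLP/BERT-SL1000") -> List[Dict[str, Any]]:
    """Tag spans in `text`; downloads the model and needs transformers plus a backend."""
    # Imported lazily so keep_confident() below stays usable without transformers.
    from transformers import pipeline

    # Assumption: "token-classification" is the matching task for this checkpoint.
    # aggregation_strategy="simple" merges word pieces into contiguous spans.
    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )
    return tagger(text)


def keep_confident(spans: List[Dict[str, Any]], min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Drop low-scoring spans and keep only label, surface text, and character offsets."""
    return [
        {
            "label": span["entity_group"],
            "text": span["word"],
            "start": span["start"],
            "end": span["end"],
        }
        for span in spans
        if span["score"] >= min_score
    ]


# Example call (not executed here because it downloads the model):
# spans = keep_confident(tag_kpis("Revenue for the quarter was $5.2 billion."))
```

The threshold is a tuning knob rather than a recommended value; which labels appear in the output depends on the label set stored in the checkpoint's configuration.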
Link model to paper and official repository
#1
by nielsr HF Staff - opened
README.md
CHANGED
@@ -1,41 +1,42 @@
 ---
-
-- en
-tags:
-- financial NLP
-- named entity recognition
-- sequence labeling
-- structured extraction
-- hierarchical taxonomy
-- XBRL
-- iXBRL
-- SEC filings
-- financial-information-extraction
+base_model: bert-base-uncased
 datasets:
-
-
-
-
-
+- AAU-NLP/HiFi-KPI
+language:
+- en
+library_name: transformers
+license: apache-2.0
+model_name: BERT-SL1000
+pipeline_tag: text-classification
+tags:
+- financial NLP
+- named entity recognition
+- sequence labeling
+- structured extraction
+- hierarchical taxonomy
+- XBRL
+- iXBRL
+- SEC filings
+- financial-information-extraction
 task_categories:
-
+- text-classification
+- token-classification
 task_ids:
-
-
-pretty_name:
-size_categories:
+- named-entity-recognition
+- financial-information-extraction
+pretty_name: 'BERT-SL1000: Sequence Labeling for Financial KPI Extraction'
+size_categories: 1M<n<10M
 languages:
-
-dataset_name:
-model_description:
-
-
-
-
-
-
-
-repo_link: "https://github.com/rasmus393/HiFi-KPI"
+- en
+dataset_name: HiFi-KPI
+model_description: "BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned\
+  \ on the **HiFi-KPI dataset** for extracting \n**financial key performance indicators\
+  \ (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying\
+  \ \nentities, such as revenue, earnings, and financial ratios, using **token classification**.\n\
+  \nThis model is part of the **HiFi-KPI benchmark** and is optimized for **hierarchical\
+  \ label consistency**.\n"
+dataset_link: https://huggingface.co/datasets/AAU-NLP/HiFi-KPI
+repo_link: https://github.com/aaunlp/HiFi-KPI
 ---
 
 ## **BERT-SL1000**
@@ -43,7 +44,7 @@ repo_link: "https://github.com/rasmus393/HiFi-KPI"
 ### **Model Description**
 BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** for extracting **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying entities, such as revenue, earnings etc.
 
-This model
+This model was introduced in the paper [HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings](https://huggingface.co/papers/2502.15411) by Rasmus Aavang, Giovanni Rizzi, Rasmus Bøggild, Alexandre Iolov, Mike Zhang, and Johannes Bjerva.
 
 ### **Use Cases**
 - Extracting **financial KPIs** from SEC **10-K and 10-Q** reports
@@ -54,4 +55,15 @@ This model is trained on the [HiFi-KPI dataset](https://huggingface.co/datasets/
 
 ### **Dataset & Code**
 - **Dataset**: [HiFi-KPI on Hugging Face](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)
-- **Code
+- **Code**: [HiFi-KPI GitHub Repository](https://github.com/aaunlp/HiFi-KPI)
+
+### **Citation**
+
+```bibtex
+@article{aavang2025hifikpi,
+title={HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings},
+author={Aavang, Rasmus and Rizzi, Giovanni and B\o{}ggild, Rasmus and Iolov, Alexandre and Zhang, Mike and Bjerva, Johannes},
+journal={arXiv preprint arXiv:2502.15411},
+year={2025}
+}
+```