Add paper link and citation, update official repository

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +36 -38
README.md CHANGED
@@ -1,56 +1,54 @@
  ---
- language:
- - en
- tags:
- - financial NLP
- - named entity recognition
- - sequence labeling
- - structured extraction
- - hierarchical taxonomy
- - XBRL
- - iXBRL
- - SEC filings
- - financial-information-extraction
  datasets:
- - AAU-NLP/HiFi-KPI
- model_name: "Cal-BERT-SL1000"
- library_name: "transformers"
- pipeline_tag: "token-classification"
- base_model: "bert-base-uncased"
  task_categories:
- - token-classification
  task_ids:
- - named-entity-recognition
- - financial-information-extraction
- pretty_name: "Cal-BERT-SL1000: Sequence Labeling for Calculation Taxonomy KPI Extraction"
- size_categories: "1M<n<10M"
- languages:
- - en
- dataset_name: "HiFi-KPI"
- model_description: |
- Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **HiFi-KPI dataset** for extracting
- **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying
- entities that are one level up the calculation taxonomy, such as revenueAbstract, earnings, and financial ratios, using **token classification**.
-
- This model is trained specifically on n=1 with the **calculation taxonomy labels** from **HiFi-KPI**, focusing on structured extraction.
-
- dataset_link: "https://huggingface.co/datasets/AAU-NLP/HiFi-KPI"
- repo_link: "https://github.com/rasmus393/HiFi-KPI"
  ---

  ## **Cal-BERT-SL1000**

  ### **Model Description**
- Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** for extracting **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying entities, such as revenue, earnings, etc.
- This model is trained on the [HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI) and is focused on the calculation layer taxonomy with n=1

  ### **Use Cases**
  - Extracting **financial KPIs** using **iXBRL calculation taxonomy**
  - **Financial document parsing** with entity recognition

  ### **Performance**
- - Trained on **1,000 most frequent labels** from the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** with n=1 in the calculation taxonomy

  ### **Dataset & Code**
  - **Dataset**: [HiFi-KPI on Hugging Face](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)
- - **Code example**: [HiFi-KPI GitHub Repository](https://github.com/rasmus393/HiFi-KPI)
 
  ---
+ base_model: bert-base-uncased
  datasets:
+ - AAU-NLP/HiFi-KPI
+ language:
+ - en
+ library_name: transformers
+ model_name: Cal-BERT-SL1000
+ pipeline_tag: token-classification
+ tags:
+ - financial NLP
+ - named entity recognition
+ - sequence labeling
+ - structured extraction
+ - hierarchical taxonomy
+ - XBRL
+ - iXBRL
+ - SEC filings
+ - financial-information-extraction
  task_categories:
+ - token-classification
  task_ids:
+ - named-entity-recognition
+ - financial-information-extraction
+ pretty_name: 'Cal-BERT-SL1000: Sequence Labeling for Calculation Taxonomy KPI Extraction'
  ---

  ## **Cal-BERT-SL1000**

  ### **Model Description**
+ Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** for extracting **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying entities that are one level up the calculation taxonomy ($n=1$), such as `revenueAbstract`, `earnings`, and `financial ratios`, using **token classification**.
+
+ This model was introduced in the paper [HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings](https://huggingface.co/papers/2502.15411) by Rasmus Aavang, Giovanni Rizzi, Rasmus Bøggild, Alexandre Iolov, Mike Zhang (@jjzha), and Johannes Bjerva.

  ### **Use Cases**
  - Extracting **financial KPIs** using **iXBRL calculation taxonomy**
  - **Financial document parsing** with entity recognition

  ### **Performance**
+ - Trained on the **1,000 most frequent labels** from the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** with $n=1$ in the calculation taxonomy.

  ### **Dataset & Code**
  - **Dataset**: [HiFi-KPI on Hugging Face](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)
+ - **Official Code**: [HiFi-KPI GitHub Repository](https://github.com/aaunlp/HiFi-KPI)
+
+ ### **Citation**
+ ```bibtex
+ @article{aavang2025hifikpi,
+ title={HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings},
+ author={Aavang, Rasmus and Rizzi, Giovanni and B{\o}ggild, Rasmus and Iolov, Alexandre and Zhang, Mike and Bjerva, Johannes},
+ journal={arXiv preprint arXiv:2502.15411},
+ year={2025}
+ }
+ ```
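
The updated card describes the label space in two steps. First, labels are taken one level up the calculation taxonomy ($n=1$). Second, only the 1,000 most frequent labels are kept. The following is a minimal sketch of one reading of that description, not the HiFi-KPI preprocessing code. Several parts are assumptions: the toy hierarchy in `PARENT`, the helper names `lift`, `build_label_set`, and `map_label`, and the choice to map dropped labels to `O`.

```python
from __future__ import annotations

from collections import Counter

# HYPOTHETICAL toy calculation hierarchy (child concept -> parent concept).
# Illustrative only: these edges are NOT taken from HiFi-KPI or any real taxonomy.
PARENT: dict[str, str] = {
    "ProductRevenue": "Revenue",
    "ServiceRevenue": "Revenue",
    "CostOfRevenue": "GrossProfit",
    "Revenue": "GrossProfit",
}


def lift(label: str, parent: dict[str, str], n: int = 1) -> str:
    """Walk n levels up the hierarchy, stopping early at a root concept."""
    for _ in range(n):
        if label not in parent:
            break
        label = parent[label]
    return label


def build_label_set(
    labels: list[str], parent: dict[str, str], n: int = 1, k: int = 1000
) -> set[str]:
    """Lift every label n levels, then keep the k most frequent lifted labels."""
    counts = Counter(lift(label, parent, n) for label in labels)
    return {label for label, _ in counts.most_common(k)}


def map_label(
    label: str, parent: dict[str, str], keep: set[str], n: int = 1, outside: str = "O"
) -> str:
    """Map a raw label to its lifted label, or to `outside` if it was not kept (assumption)."""
    lifted = lift(label, parent, n)
    return lifted if lifted in keep else outside


if __name__ == "__main__":
    raw = ["ProductRevenue", "ServiceRevenue", "CostOfRevenue"]
    keep = build_label_set(raw, PARENT, n=1, k=1)
    print(keep)  # {'Revenue'}: lifted counts are Revenue=2, GrossProfit=1
    print([map_label(label, PARENT, keep) for label in raw])  # ['Revenue', 'Revenue', 'O']
```

Lifting before counting lets sibling concepts pool their frequency into their shared parent. The cutoff is therefore applied to the parent-level labels the model predicts. This page does not state whether HiFi-KPI applies the cutoff before or after lifting.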
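
The card describes Cal-BERT-SL1000 as a token-classification (sequence-labeling) model. Per-token predictions usually need to be grouped into entity spans before KPIs can be read off. The following is a minimal sketch of that grouping, assuming a BIO tagging scheme (`B-<label>`, `I-<label>`, `O`). The card does not say which scheme the model uses. The function `bio_to_spans` and the example tokens and tags are hypothetical.

```python
from __future__ import annotations


def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group token-level BIO tags into (label, text) spans.

    ASSUMPTION: tags look like "B-<label>", "I-<label>", or "O".
    """
    spans: list[tuple[str, str]] = []
    current_label: str | None = None
    current_tokens: list[str] = []

    def close() -> None:
        # Emit the open span (if any) and reset the buffer.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A B- tag, or an I- tag that does not continue the open span, starts a new span.
            close()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # An I- tag with the same label extends the open span.
            current_tokens.append(token)
        else:
            # "O" (or any unrecognized tag) closes the open span.
            close()
    close()
    return spans


if __name__ == "__main__":
    tokens = ["Revenue", "was", "12.3", "million", "vs", "9.8"]
    tags = ["O", "O", "B-Revenue", "I-Revenue", "O", "B-Revenue"]
    print(bio_to_spans(tokens, tags))  # [('Revenue', '12.3 million'), ('Revenue', '9.8')]
```

A stray `I-` tag whose label differs from the open span starts a new span here. That lenient handling is a choice made for this sketch. When the model is run through the `transformers` `pipeline("token-classification", ...)` API, the pipeline's `aggregation_strategy` parameter performs comparable grouping.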