Add paper information and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +52 -37
README.md CHANGED
@@ -1,48 +1,63 @@
  ---
- tags:
- - financial NLP
- - named entity recognition
- - sequence labeling
  datasets:
- - AAU-NLP/hifi-kpi-lite
- model_name: "Lite-BERT-SL"
- library_name: "transformers"
- pipeline_tag: "token-classification"
- base_model: "bert-base-uncased"
  task_categories:
- - token-classification
  task_ids:
- - named-entity-recognition
- pretty_name: "Lite-BERT-SL: Sequence Labeling for HiFi-KPI Lite"
- size_categories: "10K<n<100K"
- language:
- - en
- dataset_name: "HiFi-KPI Lite"
- model_description: |
- Lite-BERT-SL is a **BERT-based sequence labeling model** fine-tuned on **HiFi-KPI Lite**, a manually curated subset of the
- **HiFi-KPI dataset**. This dataset contains a smaller, expert-chosen set of **financial key performance indicators (KPIs)**.
-
- Unlike the full HiFi-KPI dataset, HiFi-KPI Lite focuses on **four expert-mapped KPI clusters** (e.g., revenue, earnings,
- EPS, EBIT).
-
- dataset_link: "https://huggingface.co/datasets/AAU-NLP/hifi-kpi-lite"
- repo_link: "https://github.com/rasmus393/HiFi-KPI"
  ---

- ## **Lite-BERT-SL**

- ### **Model Description**
- Lite-BERT-SL is a **BERT-based sequence labeling model** fine-tuned on the **[HiFi-KPI Lite dataset](https://huggingface.co/datasets/AAU-NLP/hifi-kpi-lite)**,
- which is a manually curated version of **HiFi-KPI** with four general KPI categories.

- ### **Use Cases**
- - Identifying **generalized KPIs** from SEC **10-K and 10-Q** reports
- - **Financial document parsing** with entity recognition

- ### **Performance**
- - Trained on **HiFi-KPI Lite**, which includes a **manually curated subset** of financial KPIs For performance table see [HiFi-KPI Lite](https://huggingface.co/datasets/AAU-NLP/hifi-kpi-lite)


- ### **Dataset & Code**
- - **Dataset**: [HiFi-KPI Lite on Hugging Face](https://huggingface.co/datasets/AAU-NLP/hifi-kpi-lite)
- - **Code example**: [HiFi-KPI GitHub Repository](https://github.com/rasmus393/HiFi-KPI)
  ---
+ base_model: bert-base-uncased
  datasets:
+ - AAU-NLP/hifi-kpi-lite
+ language:
+ - en
+ library_name: transformers
+ model_name: Lite-BERT-SL
+ pipeline_tag: token-classification
+ tags:
+ - financial-nlp
+ - named-entity-recognition
+ - sequence-labeling
+ - finance
+ license: cc-by-4.0
  task_categories:
+ - token-classification
  task_ids:
+ - named-entity-recognition
+ pretty_name: 'Lite-BERT-SL: Sequence Labeling for HiFi-KPI Lite'
+ size_categories: 10K<n<100K
  ---

+ # Lite-BERT-SL: Sequence Labeling for HiFi-KPI Lite
+
+ Lite-BERT-SL is a **BERT-based sequence labeling model** fine-tuned on the **HiFi-KPI Lite** dataset. This model was introduced in the paper [HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings](https://huggingface.co/papers/2502.15411).
+
+ ## Model Description
+ The model targets the extraction of Key Performance Indicators (KPIs) from earnings filings (SEC 10-K and 10-Q reports), a simplified version of the hierarchical extraction task introduced with HiFi-KPI. While the full HiFi-KPI dataset is labeled with a very large taxonomy of iXBRL tags, Lite-BERT-SL is fine-tuned on HiFi-KPI Lite, a manually curated subset that groups KPIs into four expert-mapped clusters:
+ - **Revenues**
+ - **Earnings**
+ - **EPS** (Earnings Per Share)
+ - **EBIT** (Earnings Before Interest and Taxes)
+
+ - **Developed by:** Rasmus Aavang, Giovanni Rizzi, Rasmus Bøggild, Alexandre Iolov, Mike Zhang, Johannes Bjerva
+ - **Model type:** Token Classification (Sequence Labeling)
+ - **Base Model:** `bert-base-uncased`
+ - **Language:** English

+ ## Use Cases
+ - Identifying and extracting generalized financial KPIs from earnings filings.
+ - Automating the parsing of SEC 10-K and 10-Q reports for structured data extraction.
+ - Assisting in the alignment of financial text with iXBRL taxonomies.

+ ## Performance
+ According to the paper, encoder-based models achieve over 0.906 macro-F1 on the HiFi-KPI Lite classification task. For detailed performance metrics, please refer to the [paper](https://huggingface.co/papers/2502.15411) and the [HiFi-KPI Lite dataset page](https://huggingface.co/datasets/AAU-NLP/hifi-kpi-lite).

+ ## Dataset & Code
+ - **Paper**: [HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings](https://huggingface.co/papers/2502.15411)
+ - **Dataset**: [HiFi-KPI Lite on Hugging Face](https://huggingface.co/datasets/AAU-NLP/hifi-kpi-lite)
+ - **Code**: [Official HiFi-KPI GitHub Repository](https://github.com/aaunlp/HiFi-KPI)

+ ## Citation
+ If you use this model or the dataset in your research, please cite:

+ ```bibtex
+ @article{aavang2025hifikpi,
+ title={HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings},
+ author={Aavang, Rasmus and Rizzi, Giovanni and B{\o}ggild, Rasmus and Iolov, Alexandre and Zhang, Mike and Bjerva, Johannes},
+ journal={arXiv preprint arXiv:2502.15411},
+ year={2025}
+ }
+ ```
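The updated card describes Lite-BERT-SL as a token-classification (sequence labeling) model, but it does not state the model's output label scheme. The sketch below assumes BIO-style tags named after the four KPI clusters (for example `B-Revenues` / `I-Revenues`); these tag names are hypothetical and should be checked against the model's `id2label` config before use. It shows one way per-token predictions could be grouped into extracted KPI spans.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    Assumption: tags look like "B-<LABEL>", "I-<LABEL>" or "O". The label
    names used below (Revenues, EPS, ...) are hypothetical, not read from
    the Lite-BERT-SL config.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, List[str]]] = []
    current_label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any entity: close the open span, if any.
            current_label = None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_label:
            # A "B-" tag, or an "I-" tag that does not continue the open
            # span, starts a new span.
            spans.append((label, [token]))
            current_label = label
        else:
            # An "I-" tag continuing the open span of the same label.
            spans[-1][1].append(token)

    return [(label, " ".join(words)) for label, words in spans]


if __name__ == "__main__":
    # Hypothetical tokens and tags, for illustration only.
    tokens = ["Revenues", "were", "$", "4.2", "billion", ",",
              "EPS", "was", "$", "1.10", "."]
    tags = ["O", "O", "O", "B-Revenues", "I-Revenues", "O",
            "O", "O", "O", "B-EPS", "O"]
    print(bio_to_spans(tokens, tags))
    # prints [('Revenues', '4.2 billion'), ('EPS', '1.10')]
```

In practice, the `transformers` token-classification pipeline can do a comparable grouping itself when created with `aggregation_strategy="simple"`; the helper only makes the merging rule explicit.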