Pre-BERT-SL1000

This model was presented in the paper HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings.

Model Description

Pre-BERT-SL1000 is a BERT-based sequence labeling model fine-tuned on the HiFi-KPI dataset for extracting financial key performance indicators (KPIs) from SEC earnings filings (10-K and 10-Q). Using token classification, it labels entities with concepts one level up (n=1) in the iXBRL presentation taxonomy, such as revenueAbstract, earnings, and financial ratios.

The model is trained specifically on the n=1 presentation taxonomy labels from HiFi-KPI, focusing on entity identification.
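As a sketch of how the model's token-level predictions could be consumed downstream, the helper below groups per-token tags into entity spans. It assumes a BIO-style tag scheme, which is an assumption for illustration; the actual label set comes from the HiFi-KPI presentation taxonomy, so tag names like `B-revenueAbstract` are hypothetical.

```python
# Minimal sketch (assumption: BIO-style tags; label names are illustrative,
# the real label set is the HiFi-KPI presentation taxonomy).

def group_bio_spans(tokens, tags):
    """Group (token, tag) pairs into (label, text) entity spans."""
    spans, current_label, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag starts a new span, closing any open one first.
            if current_label:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag continues the current span if the labels match.
            current_tokens.append(token)
        else:
            # "O" or a mismatched I- tag closes any open span.
            if current_label:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

tokens = ["Total", "revenue", "was", "$", "1.2", "billion"]
tags = ["B-revenueAbstract", "I-revenueAbstract", "O", "O", "O", "O"]
print(group_bio_spans(tokens, tags))  # [('revenueAbstract', 'Total revenue')]
```

The same grouping logic applies regardless of which taxonomy level (n) the labels come from.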

Use Cases

  • Extracting financial KPIs using iXBRL presentation taxonomy
  • Financial document parsing with entity recognition

Performance

  • Trained on the 1,000 most frequent labels from the HiFi-KPI dataset, using n=1 in the presentation taxonomy.

Citation

If you use this model or dataset, please cite:

@article{aavang2025hifikpi,
  title={HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings},
  author={Aavang, Rasmus and Rizzi, Giovanni and B{\o}ggild, Rasmus and Iolov, Alexandre and Zhang, Mike and Bjerva, Johannes},
  journal={arXiv preprint arXiv:2502.15411},
  year={2025}
}