Pre-BERT-SL1000

This model was presented in the paper HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings.

Model Description

Pre-BERT-SL1000 is a BERT-based sequence labeling model fine-tuned on the HiFi-KPI dataset for extracting financial key performance indicators (KPIs) from SEC earnings filings (10-K and 10-Q). Using token classification, it labels entities with concepts one level up (n=1) in the iXBRL presentation taxonomy, such as revenueAbstract, earnings, and financial ratios.

The model is trained specifically on the n=1 presentation taxonomy labels from HiFi-KPI, focusing on entity identification.
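As a sketch of how the model's token-level predictions could be consumed downstream, the helper below groups per-token tags into entity spans. It assumes a BIO-style tag scheme, which is an assumption for illustration; the actual label set comes from the HiFi-KPI presentation taxonomy, so tag names like `B-revenueAbstract` are hypothetical.

```python
# Minimal sketch (assumption: BIO-style tags; label names are illustrative,
# the real label set is the HiFi-KPI presentation taxonomy).

def group_bio_spans(tokens, tags):
    """Group (token, tag) pairs into (label, text) entity spans."""
    spans, current_label, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag starts a new span, closing any open one first.
            if current_label:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag continues the current span if the labels match.
            current_tokens.append(token)
        else:
            # "O" or a mismatched I- tag closes any open span.
            if current_label:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

tokens = ["Total", "revenue", "was", "$", "1.2", "billion"]
tags = ["B-revenueAbstract", "I-revenueAbstract", "O", "O", "O", "O"]
print(group_bio_spans(tokens, tags))  # [('revenueAbstract', 'Total revenue')]
```

The same grouping logic applies regardless of which taxonomy level (n) the labels come from.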

Use Cases

  • Extracting financial KPIs using iXBRL presentation taxonomy
  • Financial document parsing with entity recognition

Performance

  • Trained on the 1,000 most frequent labels from the HiFi-KPI dataset, using n=1 in the presentation taxonomy.

Citation

If you use this model or dataset, please cite:

@article{aavang2025hifikpi,
  title={HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings},
  author={Aavang, Rasmus and Rizzi, Giovanni and B{\o}ggild, Rasmus and Iolov, Alexandre and Zhang, Mike and Bjerva, Johannes},
  journal={arXiv preprint arXiv:2502.15411},
  year={2025}
}