TabPFN-V4FinBench

Built with PriorLabs-TabPFN.

This repository contains six fine-tuned TabPFN checkpoints released to support reproducibility of the experiments in:

V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction

Each checkpoint corresponds to one V4FinBench prediction horizon:

| Checkpoint | Prediction horizon | Task |
| --- | --- | --- |
| tabpfn_v4finbench_h0 | 0 years | current-year financial distress |
| tabpfn_v4finbench_h1 | 1 year ahead | distress prediction one year before the event |
| tabpfn_v4finbench_h2 | 2 years ahead | distress prediction two years before the event |
| tabpfn_v4finbench_h3 | 3 years ahead | distress prediction three years before the event |
| tabpfn_v4finbench_h4 | 4 years ahead | distress prediction four years before the event |
| tabpfn_v4finbench_h5 | 5 years ahead | distress prediction five years before the event |

The checkpoints are intended to let researchers reproduce the benchmark results without re-running TabPFN fine-tuning.

Model description

TabPFN-V4FinBench is a collection of six fine-tuned TabPFN checkpoints for tabular binary classification. The task is corporate financial distress prediction from structured financial and non-financial company-year features.

Each checkpoint was fine-tuned separately on one of the six V4FinBench horizon-specific tasks.

The models were fine-tuned on V4FinBench, a benchmark of over one million company-year observations from the Visegrád Group economies:

  • Poland
  • Hungary
  • Czech Republic
  • Slovakia

The benchmark covers the years 2006–2021 and contains 131 financial and non-financial features. Labels are derived from a composite financial distress criterion that requires simultaneous deterioration in solvency, profitability, and liquidity.

Intended use

These checkpoints are released for research, evaluation, and reproducibility.

The main intended use is to reproduce selected TabPFN results from the V4FinBench paper without having to fine-tune TabPFN again.

Typical uses include:

  • reproducing V4FinBench benchmark results;
  • evaluating the released checkpoints on the V4FinBench test folds;
  • comparing new tabular models against the fine-tuned TabPFN baselines;
  • studying transfer to related corporate distress or bankruptcy prediction datasets.

Out-of-scope use

These models are not intended for production credit scoring, lending decisions, investment decisions, regulatory decisions, or automated decision-making about real companies.

The models should not be used as the sole basis for financial, legal, or business decisions.

Dataset

The models were fine-tuned on V4FinBench, a corporate distress benchmark containing 1,106,879 company-year observations from 203,900 companies across the V4 economies.

The benchmark includes six prediction horizons:

| Horizon | Total instances | Positive cases | Negative cases |
| --- | --- | --- | --- |
| 0 years | 1,000,087 | 3,587 | 996,500 |
| 1 year | 996,500 | 3,054 | 993,446 |
| 2 years | 898,692 | 2,374 | 896,318 |
| 3 years | 793,234 | 1,896 | 791,338 |
| 4 years | 700,041 | 1,485 | 698,556 |
| 5 years | 598,832 | 1,154 | 597,678 |
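
The class imbalance implied by these counts can be checked with a few lines of arithmetic. The sketch below only re-derives the positive rate per horizon from the table; the names `HORIZON_COUNTS` and `positive_rate_percent` are illustrative and not part of the released code.

```python
# Counts copied from the horizon table: horizon -> (total instances, positive cases).
HORIZON_COUNTS = {
    0: (1_000_087, 3_587),
    1: (996_500, 3_054),
    2: (898_692, 2_374),
    3: (793_234, 1_896),
    4: (700_041, 1_485),
    5: (598_832, 1_154),
}


def positive_rate_percent(total: int, positives: int) -> float:
    """Share of positive (distressed) cases, in percent."""
    return 100.0 * positives / total


POSITIVE_RATES = {
    h: positive_rate_percent(total, positives)
    for h, (total, positives) in HORIZON_COUNTS.items()
}

if __name__ == "__main__":
    for h, rate in POSITIVE_RATES.items():
        print(f"h={h}: {rate:.2f}% positive")
```

The rate falls steadily from the shortest to the longest horizon, so the longest-horizon task is the most imbalanced.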

Dataset and code:

Distress definition

A company is labeled as financially distressed if, in its final available annual report, it simultaneously satisfies all three criteria:

  1. Solvency: equity / total assets < 0
  2. Profitability: EBITDA / total assets < 0
  3. Liquidity: current assets / current liabilities < 0.6

This label captures financial distress rather than formal legal bankruptcy. The criterion is designed to identify companies with simultaneous deterioration in solvency, profitability, and liquidity.
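
The three criteria above can be written as a single boolean rule. This is a minimal sketch, not the benchmark's labeling code: the `AnnualReport` container and its field names are hypothetical, and the handling of non-positive denominators is an assumption that the source does not specify.

```python
from dataclasses import dataclass


@dataclass
class AnnualReport:
    """Hypothetical container for the items the distress rule needs."""
    equity: float
    ebitda: float
    total_assets: float
    current_assets: float
    current_liabilities: float


def is_distressed(report: AnnualReport) -> bool:
    """Return True only if all three criteria hold at the same time."""
    if report.total_assets <= 0 or report.current_liabilities <= 0:
        # Assumption: a ratio with a non-positive denominator is treated as
        # undefined and the report is not flagged. The benchmark may differ.
        return False
    insolvent = report.equity / report.total_assets < 0
    unprofitable = report.ebitda / report.total_assets < 0
    illiquid = report.current_assets / report.current_liabilities < 0.6
    return insolvent and unprofitable and illiquid
```

Because the rule is a conjunction, a company with negative equity and negative EBITDA but a current ratio of 0.8 is not labeled as distressed.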

Multi-horizon setup

V4FinBench provides six derived binary classification tasks for horizons h = 0, 1, 2, 3, 4, 5.

For each horizon h, the final h years of data are removed for every distressed company, and the last remaining observation of that company receives a positive label. All other company-year observations receive a negative label.
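
The labeling rule can be illustrated with a short sketch. It is a literal reading of the paragraph above under two assumptions: each company has one observation per year, and no further filtering is applied. The actual benchmark pipeline may include additional steps, so this sketch is not expected to reproduce the instance counts in the Dataset section. The function name `build_horizon_task` and the tuple layout are illustrative.

```python
from collections import defaultdict


def build_horizon_task(observations, distressed_ids, h):
    """Label company-year rows for horizon h.

    observations: iterable of (company_id, year) pairs.
    distressed_ids: set of company ids whose final report meets the distress rule.
    Returns a list of (company_id, year, label) tuples.
    """
    years_by_company = defaultdict(list)
    for company_id, year in observations:
        years_by_company[company_id].append(year)

    labelled = []
    for company_id, years in years_by_company.items():
        years = sorted(years)
        distressed = company_id in distressed_ids
        if distressed:
            # Drop the final h observations; with h = 0 nothing is removed.
            # max(..., 0) avoids a negative slice bound for short histories.
            years = years[: max(len(years) - h, 0)]
        for position, year in enumerate(years):
            is_last = position == len(years) - 1
            labelled.append((company_id, year, int(distressed and is_last)))
    return labelled
```

Under this reading, a distressed company with fewer than h + 1 observations contributes no rows at horizon h.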

Each model in this repository was fine-tuned on one horizon-specific task.

Fine-tuning procedure

The models were initialized from a pretrained TabPFN checkpoint and fine-tuned separately for each prediction horizon using the same imbalance-aware context construction strategy.

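Because V4FinBench is severely imbalanced, with only about 0.19–0.36% positive cases depending on the horizon, uniformly sampled TabPFN contexts contain very few positive examples. For example, a uniformly sampled context of 10,000 rows (the inference context size used here) would contain only about 19 to 36 positives on average. To address this, fine-tuning uses prototype undersampling: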

  1. All minority-class examples are retained.
  2. Majority-class examples are clustered with MiniBatchKMeans.
  3. One real majority example closest to each cluster centroid is selected.
  4. The resulting context uses an approximately 7:3 majority-to-minority ratio.

This preserves minority-class signal while keeping a representative structure of the non-distressed majority population.
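
The four steps above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' implementation: the function name `prototype_undersample`, the choice of one cluster per desired prototype, Euclidean distance on the raw feature matrix, and the handling of duplicate prototypes are all assumptions. Features are assumed to be numeric, scaled, and free of missing values.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import pairwise_distances_argmin


def prototype_undersample(X, y, majority_share=0.7, random_state=0):
    """Keep every positive row and one real negative row per k-means cluster."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)

    # Number of prototypes giving roughly majority_share : (1 - majority_share).
    n_prototypes = int(round(len(minority_idx) * majority_share / (1.0 - majority_share)))
    n_prototypes = max(1, min(n_prototypes, len(majority_idx)))

    kmeans = MiniBatchKMeans(n_clusters=n_prototypes, random_state=random_state, n_init=3)
    kmeans.fit(X[majority_idx])

    # For each centroid, the position of the closest real majority row.
    nearest = pairwise_distances_argmin(kmeans.cluster_centers_, X[majority_idx])
    # Two centroids can share a nearest row, so duplicates are dropped.
    prototype_idx = majority_idx[np.unique(nearest)]

    keep = np.concatenate([minority_idx, prototype_idx])
    return X[keep], y[keep]
```

Selecting the real row nearest to each centroid, rather than the centroid itself, keeps every context row an actual company-year observation. Because duplicates are dropped, the resulting ratio can fall slightly below 7:3.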

Training configuration

| Hyperparameter | Value |
| --- | --- |
| Learning rate | 5e-6 |
| Epochs | 10 |
| Batch size | 1024 |
| Meta batch size | 1 |
| Inference context size | 10,000 |
| Loss | Cross entropy |
| Hardware | Single NVIDIA A100 GPU |

A checkpoint was saved after each epoch. For each horizon, the final checkpoint was selected using validation F1 after threshold calibration on the precision-recall curve.
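
Threshold calibration of this kind can be sketched as a sweep over every distinct score cut-off, which corresponds to walking along the precision-recall curve and keeping the point with the highest F1. The source does not spell out the exact procedure, so the function below is one plausible reading; the name `best_f1_threshold` and the tie-handling are assumptions.

```python
def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) that maximises F1 over all distinct cut-offs.

    A row is predicted positive when its score is >= threshold.
    """
    total_pos = sum(y_true)
    # Sort by descending score so that lowering the threshold adds rows one by one.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best_threshold, best_f1 = 1.0, 0.0
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Evaluate only after all rows tied at this score have been counted.
        last_of_tie = rank == len(order) - 1 or scores[order[rank + 1]] != scores[i]
        if not last_of_tie or tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / total_pos
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_threshold, best_f1 = scores[i], f1
    return best_threshold, best_f1
```

Applied to each per-epoch checkpoint on the validation fold, the checkpoint with the highest calibrated F1 would then be the one retained.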

License

These models are released under the Prior Labs License v1.1, May 2025.

The full license text is included in the LICENSE file.

Built with PriorLabs-TabPFN.

These checkpoints are fine-tuned derivatives of TabPFN. They were modified by the V4FinBench authors and are not official Prior Labs releases. They are not endorsed, approved, or validated by Prior Labs.

Citation

If you use these models, please cite the V4FinBench paper:

TBA

Please also cite:

@article{hollmann2025tabpfn,
  title={TabPFN: A Tabular Foundation Model},
  author={Hollmann, Noah and others},
  year={2025}
}