TabPFN-V4FinBench
Built with PriorLabs-TabPFN.
This repository contains six fine-tuned TabPFN checkpoints released to support reproducibility of the experiments in:
V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction
Each checkpoint corresponds to one V4FinBench prediction horizon:
| Checkpoint | Prediction horizon | Task |
|---|---|---|
| tabpfn_v4finbench_h0 | 0 years | current-year financial distress |
| tabpfn_v4finbench_h1 | 1 year ahead | distress prediction one year before the event |
| tabpfn_v4finbench_h2 | 2 years ahead | distress prediction two years before the event |
| tabpfn_v4finbench_h3 | 3 years ahead | distress prediction three years before the event |
| tabpfn_v4finbench_h4 | 4 years ahead | distress prediction four years before the event |
| tabpfn_v4finbench_h5 | 5 years ahead | distress prediction five years before the event |
The checkpoints are intended to let researchers reproduce the benchmark results without re-running TabPFN fine-tuning.
Model description
TabPFN-V4FinBench is a collection of six fine-tuned TabPFN checkpoints for tabular binary classification. The task is corporate financial distress prediction from structured financial and non-financial company-year features.
Each checkpoint was fine-tuned separately on one of the six V4FinBench horizon-specific tasks.
The models were fine-tuned on V4FinBench, a benchmark of over one million company-year observations from the Visegrád Group economies:
- Poland
- Hungary
- Czech Republic
- Slovakia
The benchmark covers the years 2006–2021 and contains 131 financial and non-financial features. Labels are derived from a composite financial distress criterion that requires simultaneous deterioration in solvency, profitability, and liquidity.
Intended use
These checkpoints are released for research, evaluation, and reproducibility.
The main intended use is to reproduce selected TabPFN results from the V4FinBench paper without having to fine-tune TabPFN again.
Typical uses include:
- reproducing V4FinBench benchmark results;
- evaluating the released checkpoints on the V4FinBench test folds;
- comparing new tabular models against the fine-tuned TabPFN baselines;
- studying transfer to related corporate distress or bankruptcy prediction datasets.
Out-of-scope use
These models are not intended for production credit scoring, lending decisions, investment decisions, regulatory decisions, or automated decision-making about real companies.
The models should not be used as the sole basis for financial, legal, or business decisions.
Dataset
The models were fine-tuned on V4FinBench, a corporate distress benchmark containing 1,106,879 company-year observations from 203,900 companies across the V4 economies.
The benchmark includes six prediction horizons:
| Horizon | Total instances | Positive cases | Negative cases |
|---|---|---|---|
| 0 years | 1,000,087 | 3,587 | 996,500 |
| 1 year | 996,500 | 3,054 | 993,446 |
| 2 years | 898,692 | 2,374 | 896,318 |
| 3 years | 793,234 | 1,896 | 791,338 |
| 4 years | 700,041 | 1,485 | 698,556 |
| 5 years | 598,832 | 1,154 | 597,678 |
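The class imbalance implied by this table can be checked with a few lines of arithmetic. The sketch below copies the counts from the table and computes the share of positive cases per horizon; the names `HORIZON_COUNTS` and `positive_rate_percent` are illustrative and not part of the benchmark code.

```python
# Counts copied from the horizon table: horizon -> (total instances, positive cases).
HORIZON_COUNTS = {
    0: (1_000_087, 3_587),
    1: (996_500, 3_054),
    2: (898_692, 2_374),
    3: (793_234, 1_896),
    4: (700_041, 1_485),
    5: (598_832, 1_154),
}


def positive_rate_percent(total, positives):
    """Share of positive (distressed) instances, in percent."""
    return 100.0 * positives / total


rates = {h: positive_rate_percent(t, p) for h, (t, p) in HORIZON_COUNTS.items()}
for h, rate in rates.items():
    print(f"h={h}: {rate:.2f}% positive")
```

The positive rate falls steadily as the horizon grows, from roughly 0.36% at h = 0 to roughly 0.19% at h = 5.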
Dataset and code:
- Kaggle: https://www.kaggle.com/datasets/sebastiantomczak10/v4-group-corporate-bankruptcy/data
- GitHub: https://github.com/genwro-ai/V4FinBench
Distress definition
A company is labeled as financially distressed if, in its final available annual report, it simultaneously satisfies all three criteria:
- Solvency: equity / total assets < 0
- Profitability: EBITDA / total assets < 0
- Liquidity: current assets / current liabilities < 0.6
This label captures financial distress rather than formal legal bankruptcy. The criterion is designed to identify companies with simultaneous deterioration in solvency, profitability, and liquidity.
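As a minimal sketch, the three-part rule can be written as a single predicate. The function name, argument names, and the handling of non-positive denominators are assumptions made for illustration; the benchmark's own labeling code is in the GitHub repository.

```python
def is_distressed(equity, ebitda, total_assets, current_assets, current_liabilities):
    """Return True only if all three distress criteria hold at the same time."""
    # Assumption: reports with non-positive denominators are treated as
    # not labelable and return False. The source does not specify this case.
    if total_assets <= 0 or current_liabilities <= 0:
        return False
    solvency = equity / total_assets < 0                      # negative equity ratio
    profitability = ebitda / total_assets < 0                 # negative EBITDA ratio
    liquidity = current_assets / current_liabilities < 0.6    # weak current ratio
    return solvency and profitability and liquidity


# Hypothetical final annual report: all three criteria are met.
print(is_distressed(equity=-10, ebitda=-5, total_assets=100,
                    current_assets=20, current_liabilities=50))
```

A company that fails only one or two of the criteria is not labeled as distressed under this rule.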
Multi-horizon setup
V4FinBench provides six derived binary classification tasks for horizons h = 0, 1, 2, 3, 4, 5.
For each horizon h, the final h years of data are removed for every distressed company, and the last remaining observation of that company receives a positive label. All other company-year observations are assigned a negative label.
Each model in this repository was fine-tuned on one horizon-specific task.
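The horizon construction described above can be sketched on toy data as follows. This is a simplified illustration that follows only the description in this section; the data layout (`histories` as a mapping from company ID to reporting years) and the function name are hypothetical, and the released benchmark may apply additional filtering that is not shown here.

```python
def build_horizon_task(histories, distressed_ids, h):
    """Build (company, year, label) rows for prediction horizon h."""
    rows = []
    for company, years in histories.items():
        years = sorted(years)
        if company in distressed_ids:
            # Drop the final h years; a company with no remaining years drops out.
            kept = years[:max(len(years) - h, 0)]
            for i, year in enumerate(kept):
                # Only the last remaining observation is labeled positive.
                label = 1 if i == len(kept) - 1 else 0
                rows.append((company, year, label))
        else:
            # Non-distressed companies contribute only negative observations.
            rows.extend((company, year, 0) for year in years)
    return rows


# Hypothetical toy data: company "A" is distressed in its final report (2018).
histories = {"A": [2015, 2016, 2017, 2018], "B": [2016, 2017, 2018]}
rows_h2 = build_horizon_task(histories, distressed_ids={"A"}, h=2)
for row in rows_h2:
    print(row)
```

With h = 2, company "A" keeps only its 2015 and 2016 observations, and the 2016 observation becomes the positive example.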
Fine-tuning procedure
The models were initialized from a pretrained TabPFN checkpoint and fine-tuned separately for each prediction horizon using the same imbalance-aware context construction strategy.
Because V4FinBench is severely imbalanced, with only about 0.19–0.36% positive cases depending on the horizon, uniformly sampled TabPFN contexts contain very few positive examples. To address this, fine-tuning uses prototype undersampling:
- All minority-class examples are retained.
- Majority-class examples are clustered with MiniBatchKMeans.
- One real majority example closest to each cluster centroid is selected.
- The resulting context uses an approximately 7:3 majority-to-minority ratio.
This preserves minority-class signal while keeping a representative structure of the non-distressed majority population.
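The prototype undersampling steps can be sketched with scikit-learn as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name, the `majority_share` parameter, the clustering settings, and the rule of picking the nearest member within each cluster are all assumptions. It expects both classes to be present in the input.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def prototype_undersample(X, y, majority_share=0.7, random_state=0):
    """Keep all minority rows plus one real majority row per cluster."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)

    # Number of majority prototypes needed for roughly a 7:3 ratio.
    n_protos = int(round(len(minority_idx) * majority_share / (1.0 - majority_share)))
    n_protos = max(1, min(n_protos, len(majority_idx)))

    km = MiniBatchKMeans(n_clusters=n_protos, random_state=random_state, n_init=3)
    labels = km.fit_predict(X[majority_idx])
    dists = km.transform(X[majority_idx])  # distance of each row to each centroid

    chosen = []
    for k in range(n_protos):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue  # an empty cluster contributes no prototype
        nearest = members[np.argmin(dists[members, k])]
        chosen.append(majority_idx[nearest])

    keep = np.sort(np.concatenate([minority_idx, np.array(chosen, dtype=int)]))
    return X[keep], y[keep]


# Synthetic example: 30 positives and 1,000 negatives with 4 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1030, 4))
y_demo = np.array([1] * 30 + [0] * 1000)
X_ctx, y_ctx = prototype_undersample(X_demo, y_demo)
print(X_ctx.shape, int(y_ctx.sum()))
```

Selecting a real observation rather than the centroid itself keeps every context row a genuine company-year record.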
Training configuration
| Hyperparameter | Value |
|---|---|
| Learning rate | 5e-6 |
| Epochs | 10 |
| Batch size | 1024 |
| Meta batch size | 1 |
| Inference context size | 10,000 |
| Loss | Cross entropy |
| Hardware | Single NVIDIA A100 GPU |
A checkpoint was saved after each epoch. For each horizon, the final checkpoint was selected using validation F1 after threshold calibration on the precision-recall curve.
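One way to implement this selection rule is sketched below. It sweeps every observed score as a candidate threshold, which corresponds to the points of an empirical precision-recall curve, and keeps the epoch whose calibrated validation F1 is highest. The function names and the input layout (`scores_by_epoch`) are hypothetical, and the authors' exact calibration procedure may differ.

```python
def best_f1_threshold(y_true, scores):
    """Return (best F1, threshold) over all observed score values."""
    best_f1, best_t = 0.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t


def select_checkpoint(y_val, scores_by_epoch):
    """Pick the epoch with the highest threshold-calibrated validation F1."""
    results = {e: best_f1_threshold(y_val, s) for e, s in scores_by_epoch.items()}
    best_epoch = max(results, key=lambda e: results[e][0])
    f1, threshold = results[best_epoch]
    return best_epoch, f1, threshold


# Hypothetical validation scores from two saved epochs.
y_val = [0, 0, 1, 1]
scores_by_epoch = {1: [0.1, 0.4, 0.35, 0.8], 2: [0.1, 0.2, 0.7, 0.9]}
selected = select_checkpoint(y_val, scores_by_epoch)
print(selected)
```

The quadratic threshold sweep is fine for a small illustration; on full validation folds a sorted cumulative-count approach would be more efficient.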
License
These models are released under the Prior Labs License v1.1, May 2025.
The full license text is included in the LICENSE file.
These checkpoints are fine-tuned derivatives of TabPFN. They were modified by the V4FinBench authors and are not official Prior Labs releases. They are not endorsed, approved, or validated by Prior Labs.
Citation
If you use these models, please cite the V4FinBench paper:
TBA
Please also cite:
@article{hollmann2025tabpfn,
title={TabPFN: A Tabular Foundation Model},
author={Hollmann, Noah and others},
year={2025}
}