We used GPT-4.1 to extract a binary macro-vascular-disease label from more than 7000 Dutch CT Heart reports from the University Medical Center Utrecht.

In parallel we developed a heuristics pipeline based on CAD-RADS > 2, MESA > 65%, CACS > 400 and stenosis/occlusion >= 50%, together with regular expressions for the macro-vascular disease level.
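The thresholds above can be sketched as a simple rule-based labeler. This is a minimal illustration, not the pipeline used for this model: the regular expression, the function names `extract_cadrads` and `heuristic_label`, and the choice to combine the rules with a logical OR are all assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern; matches e.g. "CAD-RADS 3" or "CADRADS: 4A".
CADRADS_RE = re.compile(r"cad[\s-]?rads\s*[:=]?\s*(\d)", re.IGNORECASE)


def extract_cadrads(report: str) -> Optional[int]:
    """Return the first CAD-RADS grade found in the report text, if any."""
    match = CADRADS_RE.search(report)
    return int(match.group(1)) if match else None


def heuristic_label(
    cadrads: Optional[int] = None,
    mesa_percentile: Optional[float] = None,
    cacs: Optional[float] = None,
    stenosis_pct: Optional[float] = None,
) -> int:
    """Return 1 if any threshold is exceeded, else 0.

    Assumption: the rules are OR-combined; missing values never fire a rule.
    """
    rules = [
        cadrads is not None and cadrads > 2,
        mesa_percentile is not None and mesa_percentile > 65,
        cacs is not None and cacs > 400,
        stenosis_pct is not None and stenosis_pct >= 50,
    ]
    return int(any(rules))
```

For example, `heuristic_label(cadrads=extract_cadrads("CAD-RADS 3"))` flags the report as positive, while a report with no extractable values is labeled negative under these assumptions.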

This model is trained on the labels extracted with GPT-4.1. These labels were consistent with the heuristics-based labeling pipeline, with a Pearson correlation of about 0.83.
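As an illustration of how agreement between two binary label sets can be measured, the sketch below computes the Pearson correlation in plain Python. The function name `pearson` and the example inputs are hypothetical; the actual labels are not part of this card.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

For two 0/1 label vectors this value equals the phi coefficient, so it can be read as a measure of label agreement.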

In 10-fold cross-validation the model scored about 93% on F1, precision and recall. The model uploaded here is one fold from a 40-fold split and obtained 90% on all three metrics (F1, precision, recall). Note: these scores were obtained with the default probability threshold of 0.5.
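Because the reported scores depend on the 0.5 threshold, a different operating point will trade precision against recall. The sketch below shows that dependence on made-up probabilities. The helper names `classify` and `precision_recall_f1` are hypothetical, and treating a probability exactly equal to the threshold as positive is an assumption.

```python
from typing import List, Sequence, Tuple


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn positive-class probabilities into 0/1 predictions."""
    return [int(p >= threshold) for p in probs]


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative values only, not model output.
y_true = [1, 1, 0, 0, 1]
probs = [0.9, 0.4, 0.6, 0.1, 0.8]
default_scores = precision_recall_f1(y_true, classify(probs))
lenient_scores = precision_recall_f1(y_true, classify(probs, threshold=0.3))
```

On this toy data, lowering the threshold from 0.5 to 0.3 recovers the missed positive case, which raises recall without adding a false positive.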

Model size: 0.1B params (Safetensors, tensor type BF16)

Model tree for UMCU/Cardio_MacroCVD_CTReport_Dutch: base model CLTL/MedRoBERTa.nl, finetuned as UMCU/CardioBERTa.nl_clinical, finetuned as this model.