DeBERTa ClimateCheck - checkpoint 26000 (job 2463191)

Best checkpoint by eval accuracy from job 2463191.

Task

3-way NLI / claim verification with labels: supports, refutes, nei.
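For reference, a minimal stdlib sketch of mapping the model's three output logits to one of these labels. The id-to-label order below is an assumption for illustration; check the checkpoint's config (id2label) before relying on it.

```python
import math

# Assumed label order for illustration only; verify against the checkpoint's id2label.
ID2LABEL = {0: "supports", 1: "refutes", 2: "nei"}

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Map raw 3-way logits to the most probable label string."""
    probs = softmax(logits)
    return ID2LABEL[probs.index(max(probs))]

print(predict_label([2.1, -0.3, 0.4]))  # highest logit (index 0) wins -> "supports"
```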

Base model

MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli

Datasets

  • fever/fever (v1.0)
  • tals/vitaminc
  • Dzeniks/hover-3way
  • pminervini/averitec
  • rabuahmad/climatecheck

Splits

  • The combined train and validation sets are concatenations across the datasets listed above.
  • ClimateCheck uses a stratified 90/10 split from train with seed 1234 (official test split ignored).
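The ClimateCheck split can be sketched as follows. This is a stdlib illustration of a stratified 90/10 split with seed 1234; the actual run likely uses `datasets` or scikit-learn utilities, whose shuffling order differs, so exact membership will not match.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.10, seed=1234):
    """Split example indices 90/10 while preserving the per-label ratio."""
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for lab, idxs in by_label.items():
        rng.shuffle(idxs)  # deterministic given the seed
        n_val = round(len(idxs) * val_frac)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx

# Toy label distribution, not the real ClimateCheck data.
labels = ["supports"] * 1500 + ["refutes"] * 500 + ["nei"] * 1023
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```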

Processed datasets (sizes)

Dataset        Train     Validation
fever          94,616    11,882
vitaminc       370,653   63,054
hover          17,155    2,144
averitec       2,872     462
climatecheck   2,722     301
Total          488,018   77,843

Label distribution (combined)

Split          supports   refutes   nei
Train          262,189    165,798   60,031
Validation     38,207     29,733    9,903
Climate eval   139        45        117
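The combined training set leans heavily toward supports; a quick check of the class shares from the counts above:

```python
# Training-label counts from the table above.
train = {"supports": 262_189, "refutes": 165_798, "nei": 60_031}
total = sum(train.values())  # 488,018, matching the processed-dataset total
shares = {lab: n / total for lab, n in train.items()}
for lab, s in shares.items():
    print(f"{lab}: {s:.1%}")  # roughly 53.7% / 34.0% / 12.3%
```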

Training configuration

  • num_train_epochs: 3.0
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • gradient_accumulation_steps: 1
  • learning_rate: 2e-06
  • warmup_steps: 200
  • weight_decay: 0.01
  • lr_scheduler_type: linear
  • max_length: 320
  • eval_steps: 2000
  • save_steps: 2000
  • seed: 1234
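These settings imply roughly 15,251 optimizer steps per epoch (488,018 training examples, batch size 32, no gradient accumulation), which is consistent with checkpoint 26000 landing at epoch ≈ 1.70:

```python
import math

train_examples = 488_018
batch_size = 32
grad_accum = 1

# Optimizer steps per epoch, assuming the final partial batch is kept.
steps_per_epoch = math.ceil(train_examples / (batch_size * grad_accum))
epoch_at_26000 = 26_000 / steps_per_epoch
print(steps_per_epoch, round(epoch_at_26000, 4))  # ~1.7048, matching the eval log
```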

Hardware

  • gpu: 1x NVIDIA H100 (Slurm partition H100)
  • cpus: 6
  • memory: 64G

Eval metrics (combined validation)

Last eval at step 26000 (epoch ≈ 1.70).

  • epoch: 1.7048062422136252
  • eval_accuracy: 0.9208278200994309
  • eval_f1_micro: 0.9208278200994309
  • eval_loss: 0.2498868852853775
  • eval_macro_f1: 0.8856652666528996
  • eval_macro_precision: 0.8993871009204719
  • eval_macro_recall: 0.8746118748467642
  • eval_nei_f1: 0.782261113811916
  • eval_nei_precision: 0.8409988385598142
  • eval_nei_recall: 0.7311925679087146
  • eval_refutes_f1: 0.9199900091582716
  • eval_refutes_precision: 0.9110546797704637
  • eval_refutes_recall: 0.9291023441966838
  • eval_runtime: 198.7966
  • eval_samples_per_second: 391.571
  • eval_steps_per_second: 48.95
  • eval_supports_f1: 0.9547446769885111
  • eval_supports_precision: 0.9461077844311377
  • eval_supports_recall: 0.9635407124348941
  • step: 26000
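The macro scores are unweighted means over the three classes; for example, eval_macro_f1 reproduces exactly from the per-class F1 values above:

```python
# Per-class F1 scores from the eval log above.
per_class_f1 = {
    "supports": 0.9547446769885111,
    "refutes": 0.9199900091582716,
    "nei": 0.782261113811916,
}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(macro_f1)  # matches eval_macro_f1 = 0.8856652666528996
```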
Model format

  • Format: Safetensors
  • Size: 0.4B params
  • Tensor type: F32

Model repository: rausch/deberta-climatecheck-2463191-step26000