transformer_multi_head_bert_updated

A multi-head transformer regression model based on BERT (GroNLP/bert-base-dutch-cased), fine-tuned to predict four normalized delta scores for Dutch book reviews. The four output heads are:

  1. delta_cola_to_final
  2. delta_perplexity_to_final_large
  3. iter_to_final_simplified
  4. robbert_delta_blurb_to_final

โš ๏ธ The order of these outputs is crucial and must be maintained exactly as above during inference.
Changing the order will cause incorrect mapping of predicted values to their respective targets.

Additionally, a final aggregate score is provided (mean of the four heads).
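Concretely, mapping a raw four-value prediction back to named scores plus the aggregate can be sketched as follows (plain Python; the function and variable names are illustrative, not part of the model's API):

```python
# Fixed head order -- must match the model card exactly.
HEAD_NAMES = [
    "delta_cola_to_final",
    "delta_perplexity_to_final_large",
    "iter_to_final_simplified",
    "robbert_delta_blurb_to_final",
]

def map_outputs(raw_outputs):
    """Map a 4-value prediction vector to named scores plus the aggregate (mean)."""
    assert len(raw_outputs) == len(HEAD_NAMES)
    scores = dict(zip(HEAD_NAMES, raw_outputs))
    scores["aggregate"] = sum(raw_outputs) / len(raw_outputs)
    return scores

# Example with made-up predictions:
scores = map_outputs([0.62, 0.48, 0.70, 0.56])
```

Because the mapping is purely positional, shuffling the four values silently attaches them to the wrong target names, which is why the output order above must never change.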

📈 Training & Evaluation

  • Base model: GroNLP/bert-base-dutch-cased
  • Fine-tuning: 5 epochs on a proprietary dataset
  • Output heads: 4
  • Problem type: multi-head regression

Per-Epoch Validation Metrics

| Epoch | Val Loss | ΔCoLA RMSE / R² | ΔPerp RMSE / R² | Iter RMSE / R² | Blurb RMSE / R² | Mean RMSE |
|-------|----------|-----------------|-----------------|----------------|-----------------|-----------|
| 1 | 0.01636 | 0.1498 / 0.3689 | 0.0999 / 0.6485 | 0.1385 / 0.8184 | 0.1178 / 0.7295 | 0.1265 |
| 2 | 0.01522 | 0.1466 / 0.3950 | 0.1019 / 0.6347 | 0.1272 / 0.8467 | 0.1132 / 0.7499 | 0.1222 |
| 3 | 0.01521 | 0.1470 / 0.3922 | 0.0986 / 0.6579 | 0.1278 / 0.8453 | 0.1148 / 0.7429 | 0.1220 |
| 4 | 0.01516 | 0.1429 / 0.4250 | 0.0999 / 0.6485 | 0.1284 / 0.8438 | 0.1171 / 0.7324 | 0.1221 |
| 5 | 0.01546 | 0.1447 / 0.4107 | 0.1002 / 0.6465 | 0.1311 / 0.8373 | 0.1169 / 0.7333 | 0.1232 |

✅ Final Aggregate Performance (Test)

| Metric | Value |
|--------|-------|
| Aggregate RMSE | 0.0769 |
| Aggregate R² | 0.8425 |
| Mean RMSE (heads) | 0.1210 |

๐Ÿ—‚๏ธ Test Metrics (Per Target)

| Target | RMSE | R² |
|--------|------|------|
| delta_cola_to_final | 0.1463 | 0.4286 |
| delta_perplexity_to_final_large | 0.0955 | 0.6802 |
| iter_to_final_simplified | 0.1255 | 0.8535 |
| robbert_delta_blurb_to_final | 0.1168 | 0.7319 |

๐Ÿท๏ธ Notes

  • Base model: GroNLP/bert-base-dutch-cased
  • Fine-tuned for multi-head regression on Dutch book reviews
  • Trained for 5 epochs on a proprietary dataset
  • Sigmoid activation built into each head
  • Re-aggregation: simple average of the four head outputs
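Because each head ends in a sigmoid, every predicted score lands in (0, 1) before aggregation. A minimal illustration of that final step (plain Python, not the model's actual code; the logit values are made up):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Four raw head logits (made-up values) -> four scores in (0, 1),
# then the aggregate is their simple average.
logits = [0.5, -0.2, 1.3, 0.0]
scores = [sigmoid(z) for z in logits]
aggregate = sum(scores) / len(scores)
```

This also means the aggregate itself is guaranteed to stay in (0, 1), matching the normalized delta-score targets.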

๐Ÿ› ๏ธ Training Arguments

  • num_train_epochs=5
  • per_device_train_batch_size=8
  • per_device_eval_batch_size=16
  • gradient_accumulation_steps=2
  • learning_rate=2e-5
  • weight_decay=0.01
  • eval_strategy="epoch"
  • save_strategy="epoch"
  • load_best_model_at_end=True
  • metric_for_best_model="mean_rmse"
  • greater_is_better=False
  • bf16 enabled if supported, else fp16 enabled
  • logging_strategy="epoch"
  • push_to_hub=True with model ID Felixbrk/bert-base-dutch-cased-multi-score-tuned-positive
  • hub_strategy="end"
  • Early stopping with patience 2 epochs
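The bullets above correspond roughly to the following transformers configuration (a sketch, not the exact training script; `output_dir` and the bf16/fp16 switch logic are assumptions):

```python
import torch
from transformers import TrainingArguments, EarlyStoppingCallback

# bf16 where the GPU supports it, otherwise fall back to fp16 (per the notes above).
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()

args = TrainingArguments(
    output_dir="out",  # assumption: not stated in the card
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="mean_rmse",
    greater_is_better=False,
    bf16=use_bf16,
    fp16=not use_bf16,
    push_to_hub=True,
    hub_model_id="Felixbrk/bert-base-dutch-cased-multi-score-tuned-positive",
    hub_strategy="end",
)

# Early stopping with patience 2 is passed to the Trainer as a callback:
# Trainer(..., args=args, callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])
```

With `load_best_model_at_end=True` and `greater_is_better=False`, the checkpoint with the lowest validation `mean_rmse` (epoch 3 in the table above) is the one retained.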

โš ๏ธ Important:

  • Always load this model with trust_remote_code=True as it uses a custom multi-head regression architecture.
  • Maintain the output order exactly for correct interpretation of results.
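A minimal loading and inference sketch following the two rules above (the output attribute name `logits` and the exact head shape are assumptions; check the model's custom code for the precise output field):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Felixbrk/bert-base-dutch-cased-multi-score-tuned-positive"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is required: the multi-head regression
# architecture lives in the repository's custom code.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model.eval()

inputs = tokenizer(
    "Een prachtig boek, warm aanbevolen!",  # "A wonderful book, warmly recommended!"
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    outputs = model(**inputs)

# Four scores per input, in the fixed order documented above.
preds = outputs.logits.squeeze(0).tolist()
names = [
    "delta_cola_to_final",
    "delta_perplexity_to_final_large",
    "iter_to_final_simplified",
    "robbert_delta_blurb_to_final",
]
for name, value in zip(names, preds):
    print(f"{name}: {value:.4f}")
print("aggregate:", sum(preds) / len(preds))
```
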