dense_hom_100m

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5102
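
If this loss is the mean per-token cross-entropy in nats (the usual convention for causal language modeling with the Hugging Face Trainer; the card does not say so explicitly), it corresponds to a perplexity of roughly exp(4.5102) ≈ 90.9:

```python
import math

# Eval loss reported on the card; assumed to be mean per-token
# cross-entropy in nats, the Trainer's default for LM heads.
eval_loss = 4.5102
print(f"perplexity ≈ {math.exp(eval_loss):.1f}")  # ≈ 90.9
```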

Model description

More information needed

Intended uses & limitations

More information needed
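
Until the authors document usage, here is a minimal loading sketch. It assumes the checkpoint is a causal language model and uses a hypothetical repo id; neither is stated on the card, so adjust both to match the actual model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/dense_hom_100m"  # hypothetical repo id; replace with the real one

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM head

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```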

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-06; no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 66788
  • training_steps: 667880
  • mixed_precision_training: Native AMP
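
As a sketch, these settings map onto transformers.TrainingArguments roughly as follows. The output_dir is a placeholder and the actual training script is not shown on the card, so treat this as a reconstruction, not the authors' code:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="dense_hom_100m",     # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,   # 8 x 4 = 32, the reported total train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=66_788,
    max_steps=667_880,
    fp16=True,                       # "Native AMP" mixed precision
)
```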

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 8.4525        | 0.1497 | 10000  | 8.4304          |
| 7.3336        | 0.2995 | 20000  | 7.3022          |
| 6.4132        | 0.4492 | 30000  | 6.3862          |
| 5.8988        | 0.5989 | 40000  | 5.8711          |
| 5.6228        | 0.7486 | 50000  | 5.6064          |
| 5.4744        | 0.8984 | 60000  | 5.4458          |
| 5.2808        | 1.0481 | 70000  | 5.3160          |
| 5.1593        | 1.1978 | 80000  | 5.1888          |
| 5.1095        | 1.3475 | 90000  | 5.0874          |
| 5.0067        | 1.4973 | 100000 | 5.0072          |
| 4.9448        | 1.6470 | 110000 | 4.9405          |
| 4.8901        | 1.7967 | 120000 | 4.8872          |
| 4.8371        | 1.9464 | 130000 | 4.8377          |
| 4.6843        | 2.0962 | 140000 | 4.8066          |
| 4.6858        | 2.2459 | 150000 | 4.7772          |
| 4.654         | 2.3956 | 160000 | 4.7471          |
| 4.6345        | 2.5453 | 170000 | 4.7199          |
| 4.6339        | 2.6951 | 180000 | 4.6928          |
| 4.6157        | 2.8448 | 190000 | 4.6695          |
| 4.5953        | 2.9945 | 200000 | 4.6452          |
| 4.433         | 3.1442 | 210000 | 4.6433          |
| 4.4471        | 3.2940 | 220000 | 4.6301          |
| 4.4507        | 3.4437 | 230000 | 4.6134          |
| 4.462         | 3.5934 | 240000 | 4.5953          |
| 4.4476        | 3.7431 | 250000 | 4.5798          |
| 4.4127        | 3.8929 | 260000 | 4.5641          |
| 4.221         | 4.0426 | 270000 | 4.5716          |
| 4.264         | 4.1923 | 280000 | 4.5673          |
| 4.2815        | 4.3420 | 290000 | 4.5543          |
| 4.2952        | 4.4918 | 300000 | 4.5408          |
| 4.3095        | 4.6415 | 310000 | 4.5279          |
| 4.3148        | 4.7912 | 320000 | 4.5176          |
| 4.3125        | 4.9409 | 330000 | 4.5053          |
| 4.09          | 5.0907 | 340000 | 4.5283          |
| 4.1335        | 5.2405 | 350000 | 4.5244          |
| 4.1502        | 5.3902 | 360000 | 4.5136          |
| 4.1655        | 5.5399 | 370000 | 4.5057          |
| 4.1605        | 5.6896 | 380000 | 4.4929          |
| 4.177         | 5.8394 | 390000 | 4.4838          |
| 4.1474        | 5.9891 | 400000 | 4.4757          |
| 3.9881        | 6.1388 | 410000 | 4.5119          |
| 4.0034        | 6.2886 | 420000 | 4.5069          |
| 4.0274        | 6.4383 | 430000 | 4.4966          |
| 4.0535        | 6.5880 | 440000 | 4.4878          |
| 4.0514        | 6.7377 | 450000 | 4.4785          |
| 4.0476        | 6.8875 | 460000 | 4.4674          |
| 3.8266        | 7.0372 | 470000 | 4.5037          |
| 3.8644        | 7.1869 | 480000 | 4.5106          |
| 3.9039        | 7.3366 | 490000 | 4.5029          |
| 3.9142        | 7.4864 | 500000 | 4.4955          |
| 3.9112        | 7.6361 | 510000 | 4.4856          |
| 3.9333        | 7.7858 | 520000 | 4.4762          |
| 3.9188        | 7.9355 | 530000 | 4.4689          |
| 3.7217        | 8.0853 | 540000 | 4.5152          |
| 3.7674        | 8.2350 | 550000 | 4.5160          |
| 3.7844        | 8.3847 | 560000 | 4.5106          |
| 3.7862        | 8.5345 | 570000 | 4.5055          |
| 3.7891        | 8.6842 | 580000 | 4.4996          |
| 3.7912        | 8.8339 | 590000 | 4.4929          |
| 3.7521        | 8.9836 | 600000 | 4.4885          |
| 3.6301        | 9.1334 | 610000 | 4.5250          |
| 3.6341        | 9.2831 | 620000 | 4.5243          |
| 3.6515        | 9.4328 | 630000 | 4.5208          |
| 3.6546        | 9.5826 | 640000 | 4.5171          |
| 3.6662        | 9.7323 | 650000 | 4.5132          |
| 3.6615        | 9.8820 | 660000 | 4.5115          |

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
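
To check that a local environment matches these pins before trying to reproduce results, a small assertion script works; this assumes all four packages are importable:

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on the card; mismatches can change training/eval behavior.
expected = {
    transformers: "4.51.0",
    torch: "2.7.0+cu126",
    datasets: "3.6.0",
    tokenizers: "0.21.1",
}
for module, version in expected.items():
    assert module.__version__ == version, (
        f"{module.__name__}: found {module.__version__}, card reports {version}"
    )
```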

Model size

  • 0.2B params (F32 tensors, Safetensors format)