# dense_hom_100m

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.5102
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 66788
- training_steps: 667880
- mixed_precision_training: Native AMP
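The hyperparameters above imply an effective batch size of 8 × 4 = 32 and a linear warmup/decay learning-rate schedule (warmup over the first 66,788 steps, i.e. 10% of the 667,880 total steps). The sketch below reproduces that schedule in plain Python; it mirrors the shape of Transformers' `linear` scheduler but is an illustration, not the library's implementation.

```python
# Schedule implied by the hyperparameters above:
# linear warmup to the peak LR, then linear decay to 0.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 66_788
TRAINING_STEPS = 667_880

# Effective batch size: per-device train batch * gradient accumulation steps.
TOTAL_TRAIN_BATCH_SIZE = 8 * 4  # = 32, matching total_train_batch_size

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup + decay."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay linearly from the peak LR down to 0 at TRAINING_STEPS.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)
```

For example, `lr_at(0)` is 0, `lr_at(66_788)` is the peak 1e-4, and `lr_at(667_880)` is 0 again.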
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 8.4525 | 0.1497 | 10000 | 8.4304 |
| 7.3336 | 0.2995 | 20000 | 7.3022 |
| 6.4132 | 0.4492 | 30000 | 6.3862 |
| 5.8988 | 0.5989 | 40000 | 5.8711 |
| 5.6228 | 0.7486 | 50000 | 5.6064 |
| 5.4744 | 0.8984 | 60000 | 5.4458 |
| 5.2808 | 1.0481 | 70000 | 5.3160 |
| 5.1593 | 1.1978 | 80000 | 5.1888 |
| 5.1095 | 1.3475 | 90000 | 5.0874 |
| 5.0067 | 1.4973 | 100000 | 5.0072 |
| 4.9448 | 1.6470 | 110000 | 4.9405 |
| 4.8901 | 1.7967 | 120000 | 4.8872 |
| 4.8371 | 1.9464 | 130000 | 4.8377 |
| 4.6843 | 2.0962 | 140000 | 4.8066 |
| 4.6858 | 2.2459 | 150000 | 4.7772 |
| 4.654 | 2.3956 | 160000 | 4.7471 |
| 4.6345 | 2.5453 | 170000 | 4.7199 |
| 4.6339 | 2.6951 | 180000 | 4.6928 |
| 4.6157 | 2.8448 | 190000 | 4.6695 |
| 4.5953 | 2.9945 | 200000 | 4.6452 |
| 4.433 | 3.1442 | 210000 | 4.6433 |
| 4.4471 | 3.2940 | 220000 | 4.6301 |
| 4.4507 | 3.4437 | 230000 | 4.6134 |
| 4.462 | 3.5934 | 240000 | 4.5953 |
| 4.4476 | 3.7431 | 250000 | 4.5798 |
| 4.4127 | 3.8929 | 260000 | 4.5641 |
| 4.221 | 4.0426 | 270000 | 4.5716 |
| 4.264 | 4.1923 | 280000 | 4.5673 |
| 4.2815 | 4.3420 | 290000 | 4.5543 |
| 4.2952 | 4.4918 | 300000 | 4.5408 |
| 4.3095 | 4.6415 | 310000 | 4.5279 |
| 4.3148 | 4.7912 | 320000 | 4.5176 |
| 4.3125 | 4.9409 | 330000 | 4.5053 |
| 4.09 | 5.0907 | 340000 | 4.5283 |
| 4.1335 | 5.2405 | 350000 | 4.5244 |
| 4.1502 | 5.3902 | 360000 | 4.5136 |
| 4.1655 | 5.5399 | 370000 | 4.5057 |
| 4.1605 | 5.6896 | 380000 | 4.4929 |
| 4.177 | 5.8394 | 390000 | 4.4838 |
| 4.1474 | 5.9891 | 400000 | 4.4757 |
| 3.9881 | 6.1388 | 410000 | 4.5119 |
| 4.0034 | 6.2886 | 420000 | 4.5069 |
| 4.0274 | 6.4383 | 430000 | 4.4966 |
| 4.0535 | 6.5880 | 440000 | 4.4878 |
| 4.0514 | 6.7377 | 450000 | 4.4785 |
| 4.0476 | 6.8875 | 460000 | 4.4674 |
| 3.8266 | 7.0372 | 470000 | 4.5037 |
| 3.8644 | 7.1869 | 480000 | 4.5106 |
| 3.9039 | 7.3366 | 490000 | 4.5029 |
| 3.9142 | 7.4864 | 500000 | 4.4955 |
| 3.9112 | 7.6361 | 510000 | 4.4856 |
| 3.9333 | 7.7858 | 520000 | 4.4762 |
| 3.9188 | 7.9355 | 530000 | 4.4689 |
| 3.7217 | 8.0853 | 540000 | 4.5152 |
| 3.7674 | 8.2350 | 550000 | 4.5160 |
| 3.7844 | 8.3847 | 560000 | 4.5106 |
| 3.7862 | 8.5345 | 570000 | 4.5055 |
| 3.7891 | 8.6842 | 580000 | 4.4996 |
| 3.7912 | 8.8339 | 590000 | 4.4929 |
| 3.7521 | 8.9836 | 600000 | 4.4885 |
| 3.6301 | 9.1334 | 610000 | 4.5250 |
| 3.6341 | 9.2831 | 620000 | 4.5243 |
| 3.6515 | 9.4328 | 630000 | 4.5208 |
| 3.6546 | 9.5826 | 640000 | 4.5171 |
| 3.6662 | 9.7323 | 650000 | 4.5132 |
| 3.6615 | 9.8820 | 660000 | 4.5115 |
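Assuming the losses above are token-level cross-entropy in nats (the usual convention for language-model training), they can be converted to perplexity by exponentiation. A minimal sketch for the final evaluation loss:

```python
import math

# Perplexity = exp(cross-entropy loss), assuming loss is in nats per token.
FINAL_EVAL_LOSS = 4.5102  # final validation loss from the table above

perplexity = math.exp(FINAL_EVAL_LOSS)  # ≈ 90.9
```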
### Framework versions
- Transformers 4.51.0
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1