# dense_isl_100m_mult

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.7708
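For causal language-model fine-tunes, the evaluation loss is typically the mean cross-entropy in nats, in which case perplexity is its exponential. This is an assumption here, since the card does not state the metric; a minimal sketch:

```python
import math

# Assuming the reported evaluation loss is mean cross-entropy in nats,
# perplexity is exp(loss).
eval_loss = 4.7708
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 118.0
```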
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 25299
- training_steps: 252992
- mixed_precision_training: Native AMP
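The effective batch size and warmup fraction follow directly from the values above; a quick sanity check in plain arithmetic (no training code involved):

```python
train_batch_size = 8
gradient_accumulation_steps = 4

# One optimizer step consumes train_batch_size * gradient_accumulation_steps examples.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 32  # matches the value reported above

# The linear schedule warms up over the first ~10% of training.
warmup_fraction = 25299 / 252992
print(f"warmup fraction ~ {warmup_fraction:.2%}")  # ~ 10.00%
```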
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 6.8389 | 0.3953 | 10000 | 6.7793 |
| 5.4301 | 0.7905 | 20000 | 5.4051 |
| 4.9618 | 1.1858 | 30000 | 4.9770 |
| 4.7451 | 1.5811 | 40000 | 4.7354 |
| 4.5748 | 1.9763 | 50000 | 4.5824 |
| 4.3215 | 2.3716 | 60000 | 4.4992 |
| 4.3104 | 2.7669 | 70000 | 4.4252 |
| 3.9934 | 3.1622 | 80000 | 4.3939 |
| 4.0323 | 3.5574 | 90000 | 4.3534 |
| 4.0436 | 3.9527 | 100000 | 4.3117 |
| 3.7358 | 4.3480 | 110000 | 4.3456 |
| 3.7637 | 4.7433 | 120000 | 4.3139 |
| 3.4001 | 5.1385 | 130000 | 4.3703 |
| 3.485 | 5.5338 | 140000 | 4.3637 |
| 3.5164 | 5.9291 | 150000 | 4.3389 |
| 3.179 | 6.3244 | 160000 | 4.4476 |
| 3.2661 | 6.7196 | 170000 | 4.4353 |
| 2.8494 | 7.1149 | 180000 | 4.5269 |
| 2.9443 | 7.5102 | 190000 | 4.5534 |
| 2.9868 | 7.9054 | 200000 | 4.5482 |
| 2.6395 | 8.3007 | 210000 | 4.6645 |
| 2.6918 | 8.6960 | 220000 | 4.6763 |
| 2.3959 | 9.0913 | 230000 | 4.7391 |
| 2.4431 | 9.4865 | 240000 | 4.7673 |
| 2.4452 | 9.8818 | 250000 | 4.7704 |
### Framework versions
- Transformers 4.51.0
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1