---
library_name: transformers
tags:
  - generated_from_trainer
datasets:
  - arrow
model-index:
  - name: dense_isl_100m_mult
    results: []
---

# dense_isl_100m_mult

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

- Loss: 4.7708 (see the perplexity note below)
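The card reports only the raw loss. Assuming it is a mean token-level cross-entropy in nats (the usual convention for transformers language-model evaluation), it converts to perplexity like this:

```python
import math

# Assumption: the reported eval loss is mean cross-entropy in nats,
# so perplexity is simply its exponential.
eval_loss = 4.7708
print(math.exp(eval_loss))  # ≈ 118.0
```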

## Model description

More information needed

## Intended uses & limitations

More information needed
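The card does not include a usage snippet. Assuming this is a causal language model hosted under the repo id `i-be-snek/dense_isl_100m_mult` (both are inferences from the page, not stated in the card), loading it would look roughly like:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: repo id inferred from the page header; adjust if it differs.
repo_id = "i-be-snek/dense_isl_100m_mult"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Placeholder prompt; the card does not document the training language(s).
inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```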

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):

- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 25299
- training_steps: 252992
- mixed_precision_training: Native AMP
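
For reference, here is a minimal sketch of how these settings map onto transformers `TrainingArguments`. The output directory is a placeholder, and `fp16=True` is an assumption: the card only says "Native AMP", which could equally mean bf16.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; placeholders noted in comments.
training_args = TrainingArguments(
    output_dir="dense_isl_100m_mult",  # placeholder, not from the card
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=25299,   # ~10% of the 252,992 total steps
    max_steps=252992,
    fp16=True,            # assumption: "Native AMP" may instead mean bf16
)
```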

### Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 6.8389        | 0.3953 | 10000  | 6.7793          |
| 5.4301        | 0.7905 | 20000  | 5.4051          |
| 4.9618        | 1.1858 | 30000  | 4.9770          |
| 4.7451        | 1.5811 | 40000  | 4.7354          |
| 4.5748        | 1.9763 | 50000  | 4.5824          |
| 4.3215        | 2.3716 | 60000  | 4.4992          |
| 4.3104        | 2.7669 | 70000  | 4.4252          |
| 3.9934        | 3.1622 | 80000  | 4.3939          |
| 4.0323        | 3.5574 | 90000  | 4.3534          |
| 4.0436        | 3.9527 | 100000 | 4.3117          |
| 3.7358        | 4.3480 | 110000 | 4.3456          |
| 3.7637        | 4.7433 | 120000 | 4.3139          |
| 3.4001        | 5.1385 | 130000 | 4.3703          |
| 3.485         | 5.5338 | 140000 | 4.3637          |
| 3.5164        | 5.9291 | 150000 | 4.3389          |
| 3.179         | 6.3244 | 160000 | 4.4476          |
| 3.2661        | 6.7196 | 170000 | 4.4353          |
| 2.8494        | 7.1149 | 180000 | 4.5269          |
| 2.9443        | 7.5102 | 190000 | 4.5534          |
| 2.9868        | 7.9054 | 200000 | 4.5482          |
| 2.6395        | 8.3007 | 210000 | 4.6645          |
| 2.6918        | 8.6960 | 220000 | 4.6763          |
| 2.3959        | 9.0913 | 230000 | 4.7391          |
| 2.4431        | 9.4865 | 240000 | 4.7673          |
| 2.4452        | 9.8818 | 250000 | 4.7704          |
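
Validation loss bottoms out at 4.3117 around step 100000 (epoch ≈ 3.95) and rises steadily afterwards while training loss keeps falling, which suggests the later checkpoints overfit. A small sketch to confirm the best step, with the (step, validation loss) pairs transcribed from the table above:

```python
# (step, validation loss) pairs transcribed from the table above.
val_loss = {
    10000: 6.7793, 20000: 5.4051, 30000: 4.9770, 40000: 4.7354,
    50000: 4.5824, 60000: 4.4992, 70000: 4.4252, 80000: 4.3939,
    90000: 4.3534, 100000: 4.3117, 110000: 4.3456, 120000: 4.3139,
    130000: 4.3703, 140000: 4.3637, 150000: 4.3389, 160000: 4.4476,
    170000: 4.4353, 180000: 4.5269, 190000: 4.5534, 200000: 4.5482,
    210000: 4.6645, 220000: 4.6763, 230000: 4.7391, 240000: 4.7673,
    250000: 4.7704,
}
best_step = min(val_loss, key=val_loss.get)
print(best_step, val_loss[best_step])  # 100000 4.3117
```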

### Framework versions

- Transformers 4.51.0
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1