# dense_eng_100m_mult_het_retok-het
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.8737
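If the reported loss is the mean per-token cross-entropy in nats (an assumption; the card does not state the loss type), it corresponds to a perplexity of roughly exp(4.8737) ≈ 131:

```python
import math

# Perplexity from mean cross-entropy loss (assumes loss is in nats per token).
eval_loss = 4.8737
perplexity = math.exp(eval_loss)  # ≈ 130.9
print(f"Perplexity: {perplexity:.1f}")
```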
## Model description
More information needed
## Intended uses & limitations
More information needed
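As a non-authoritative sketch until this section is filled in, the checkpoint could be loaded like any Hugging Face model. This assumes it is a causal language model; the repo id below is hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with the model's actual path on the Hub.
repo_id = "user/dense_eng_100m_mult_het_retok-het"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```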
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 723
- training_steps: 7235
- mixed_precision_training: Native AMP
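A minimal sketch of how these settings could map onto `transformers.TrainingArguments`. The output directory is a placeholder, and `fp16=True` is an assumption ("Native AMP" may instead correspond to bf16 on newer GPUs):

```python
from transformers import TrainingArguments

# Hedged reconstruction of the configuration listed above; not the authors' actual script.
training_args = TrainingArguments(
    output_dir="dense_eng_100m_mult_het_retok-het",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=723,
    max_steps=7235,
    fp16=True,                       # "Native AMP" (assumption; could be bf16)
)
```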
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 8.419 | 0.6908 | 500 | 7.7970 |
| 6.2543 | 1.3813 | 1000 | 6.0385 |
| 5.6668 | 2.0718 | 1500 | 5.4910 |
| 5.2471 | 2.7627 | 2000 | 5.2379 |
| 5.0071 | 3.4532 | 2500 | 5.0902 |
| 4.834 | 4.1437 | 3000 | 4.9959 |
| 4.7398 | 4.8345 | 3500 | 4.9203 |
| 4.5632 | 5.5250 | 4000 | 4.8889 |
| 4.5025 | 6.2155 | 4500 | 4.8730 |
| 4.404 | 6.9064 | 5000 | 4.8471 |
| 4.2675 | 7.5969 | 5500 | 4.8560 |
| 4.1516 | 8.2874 | 6000 | 4.8683 |
| 4.1668 | 8.9782 | 6500 | 4.8611 |
| 4.0815 | 9.6687 | 7000 | 4.8746 |
### Framework versions
- Transformers 4.57.1
- PyTorch 2.9.0+cu128
- Datasets 3.6.0
- Tokenizers 0.22.1