byt5-base-khm-en

This model is a fine-tuned version of google/byt5-base on an unknown dataset. Its per-epoch results on the evaluation set are listed in the results table near the end of this card.

Model description

More information needed

The following hyperparameters were used during training (first run, 5 epochs):

learning_rate: 0.0003
train_batch_size: 2
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 5
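
As a quick consistency check, the effective batch size and the total number of optimizer steps can be derived from the values above. This is a minimal sketch: it assumes single-device training, takes the 1333 steps per epoch from the results table, and uses illustrative variable names that do not come from the training script.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 8
num_epochs = 5

# Assumption: training ran on a single device.
num_devices = 1

# Taken from the results table (step count at the end of epoch 1).
steps_per_epoch = 1333

# Effective batch size seen by each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Total optimizer updates over the run.
total_steps = steps_per_epoch * num_epochs

print(total_train_batch_size)  # 16
print(total_steps)  # 6665
```

Both numbers agree with the card: total_train_batch_size is reported as 16, and the results table reaches step 6665 at epoch 5.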

A second set of hyperparameters is also reported. It appears to correspond to a continued run covering epochs 6 to 8 of the results table:

learning_rate: 0.0001
train_batch_size: 2
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
num_epochs: 3
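
The two hyperparameter lists differ mainly in learning rate and scheduler type (linear versus cosine). The sketch below illustrates how the two decay shapes differ. It assumes no warmup (none is reported in this card) and uses the common linear and half-cosine decay formulas. The function names are hypothetical, and the step counts (6665 and 3999) are derived from the results table rather than from the training script.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Linear decay from base_lr at step 0 to zero at total_steps (no warmup assumed)."""
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps))


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Half-cosine decay from base_lr at step 0 to zero at total_steps (no warmup assumed)."""
    progress = min(1.0, step / max(1, total_steps))
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# First list: learning_rate 0.0003, linear, 5 epochs x 1333 steps = 6665 steps.
# Second list: learning_rate 0.0001, cosine, 3 epochs x 1333 steps = 3999 steps.
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    lin = linear_lr(int(fraction * 6665), 6665, 3e-4)
    cos = cosine_lr(int(fraction * 3999), 3999, 1e-4)
    print(f"{fraction:.2f}  linear={lin:.2e}  cosine={cos:.2e}")
```

Relative to its own starting value, the cosine schedule stays higher than the linear one in the first half of the run and drops below it in the second half.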

Training Loss	Epoch	Step	Validation Loss	Bleu	Gen Len
0.8053	1.0	1333	0.9467	3.0623	34.1781
0.6026	2.0	2666	0.7041	12.6402	26.6596
0.4719	3.0	3999	0.5933	17.5698	24.2143
0.4029	4.0	5332	0.5593	22.9685	24.0115
0.3404	5.0	6665	0.5602	26.8820	23.5353
0.5988	6.0	7998	0.5456	17.1649	78.2938
0.4947	7.0	9331	0.4823	20.9237	81.6542
0.4402	8.0	10664	0.4455	25.0764	75.5452

Evaluation was performed on the Tatoeba and Asian Language Treebank (ALT) datasets.
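
For trying the model on individual sentences, a minimal inference sketch with the transformers library might look like the following. Several things here are assumptions rather than facts from this card: the repository id is a placeholder (the owning namespace is not given), the translation direction is inferred from the model name to be Khmer to English, and it is not known whether the model expects a task prefix. The helper name `translate` is hypothetical.

```python
# Hypothetical repository id: replace <namespace> with the actual owner of this model.
MODEL_ID = "<namespace>/byt5-base-khm-en"


def translate(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 256) -> str:
    """Translate one sentence (assumed Khmer -> English) with the fine-tuned model."""
    # Imported lazily so this module loads even when transformers is not installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # ByT5 works directly on UTF-8 bytes, so the raw string is passed as-is.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `translate(...)` with a Khmer sentence downloads the weights on first use. Because ByT5 generates one token per byte, `max_new_tokens` may need to be larger than it would be for a subword model.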

Safetensors

Model size: 0.6B params
Tensor type: F32
