e3-sft / README.md

3N3G

End of training

41e7b08 verified 5 months ago

metadata

library_name: transformers
license: mit
base_model: CMU-AIRe/e3-1.7B
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: e3-sft
    results: []

e3-sft

This model is a fine-tuned version of CMU-AIRe/e3-1.7B on the hardmath_sft_2 dataset. It achieves the following results on the evaluation set:

Loss: 0.6364 (validation loss at epoch 100, step 400)
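The card does not include a usage snippet, so the following is a minimal sketch of loading the checkpoint with transformers. The repo id `3N3G/e3-sft` is inferred from the page header, and treating the checkpoint as a causal language model is an assumption; neither is confirmed by the card. The helper name `generate` and the example prompt are illustrative only.

```python
# Assumption: repo id inferred from the page header (user 3N3G, repo e3-sft).
MODEL_ID = "3N3G/e3-sft"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint and return the generated continuation of `prompt`."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: the checkpoint is a causal language model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the continuation is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Evaluate the integral of x * exp(-x) from 0 to infinity."))
```

If the base model expects a chat template, the prompt would need to be formatted with `tokenizer.apply_chat_template` first; the card does not say either way.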

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-07
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 32
total_train_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine_with_min_lr
lr_scheduler_warmup_ratio: 0.1
num_epochs: 100.0
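The quantities implied by these hyperparameters can be checked with a little arithmetic. The sketch below derives the effective batch size, optimizer steps per epoch, and warmup length, then evaluates a warmup-plus-cosine schedule with a learning-rate floor. It assumes a single training device (the card lists no device count) and takes the total of 400 steps from the training results table. The card does not state the minimum learning rate, so `min_lr_rate=0.1` is a hypothetical placeholder, and `lr_at` is a plain-Python approximation of the schedule's shape rather than the trainer's own code.

```python
import math

# Values copied from the hyperparameter list.
learning_rate = 1e-7
train_batch_size = 1
gradient_accumulation_steps = 32
warmup_ratio = 0.1
num_epochs = 100
total_steps = 400  # last step reported in the training results table

# Assumption: one device, since the card lists no device count.
num_devices = 1

effective_batch = train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = total_steps // num_epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step: int, min_lr_rate: float = 0.1) -> float:
    """Linear warmup, then cosine decay down to min_lr_rate * learning_rate.

    min_lr_rate is hypothetical: the card does not report the floor used.
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return learning_rate * (min_lr_rate + (1.0 - min_lr_rate) * cosine)


print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"warmup steps: {warmup_steps}")
print(f"peak learning rate at step {warmup_steps}: {lr_at(warmup_steps):.2e}")
```

With 4 optimizer steps per epoch and 32 examples per step, each epoch covers at most 128 training examples under the single-device assumption.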

Training results

Training Loss	Epoch	Step	Validation Loss
0.7025	4.0	16	0.7606
0.9105	8.0	32	0.7590
0.8193	12.0	48	0.7550
0.6939	16.0	64	0.7460
0.6623	20.0	80	0.7418
0.8112	24.0	96	0.7389
0.7080	28.0	112	0.7154
0.6471	32.0	128	0.7097
0.9019	36.0	144	0.7050
0.7328	40.0	160	0.7007
0.8191	44.0	176	0.6938
0.6327	48.0	192	0.6752
0.6903	52.0	208	0.6604
0.7467	56.0	224	0.6533
0.7364	60.0	240	0.6489
0.7706	64.0	256	0.6460
0.7777	68.0	272	0.6441
0.6391	72.0	288	0.6419
0.6480	76.0	304	0.6408
0.7040	80.0	320	0.6398
0.6316	84.0	336	0.6387
0.6232	88.0	352	0.6380
0.6545	92.0	368	0.6372
0.7126	96.0	384	0.6364
0.6465	100.0	400	0.6364
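To summarize the table, the sketch below recomputes a few figures from the validation losses: the overall drop, whether the loss ever increased between evaluations, and where the largest single improvement occurred. The values are copied from the table; the variable names are illustrative.

```python
# (epoch, validation loss) pairs copied from the training results table.
VAL_LOSS = [
    (4, 0.7606), (8, 0.7590), (12, 0.7550), (16, 0.7460), (20, 0.7418),
    (24, 0.7389), (28, 0.7154), (32, 0.7097), (36, 0.7050), (40, 0.7007),
    (44, 0.6938), (48, 0.6752), (52, 0.6604), (56, 0.6533), (60, 0.6489),
    (64, 0.6460), (68, 0.6441), (72, 0.6419), (76, 0.6408), (80, 0.6398),
    (84, 0.6387), (88, 0.6380), (92, 0.6372), (96, 0.6364), (100, 0.6364),
]

losses = [loss for _, loss in VAL_LOSS]

# Overall improvement from the first to the last evaluation.
absolute_drop = losses[0] - losses[-1]
relative_drop = absolute_drop / losses[0]

# True if the validation loss never rose between consecutive evaluations.
non_increasing = all(later <= earlier for earlier, later in zip(losses, losses[1:]))

# Epoch at which the largest improvement over the previous evaluation was recorded.
drops = [
    (earlier - later, epoch)
    for (_, earlier), (epoch, later) in zip(VAL_LOSS, VAL_LOSS[1:])
]
largest_drop, largest_drop_epoch = max(drops)

print(f"absolute drop: {absolute_drop:.4f}")
print(f"relative drop: {relative_drop:.1%}")
print(f"never increased: {non_increasing}")
print(f"largest single improvement ends at epoch {largest_drop_epoch}")
```

The last two evaluations report the same loss (0.6364), which suggests the run had plateaued by epoch 96 at this learning rate.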

Framework versions

Transformers 4.55.0
PyTorch 2.5.1
Datasets 3.6.0
Tokenizers 0.21.1
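To check a local environment against these versions, something like the following sketch may help. It uses only the standard library; the distribution names (`torch` for PyTorch, and so on) are the usual PyPI names, and `compare_versions` is an illustrative helper, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.55.0",
    "torch": "2.5.1",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)} for each entry."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu121" are ignored in the comparison.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (wanted, installed, matches)
    return report


for package, (wanted, installed, matches) in compare_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones recorded at training time.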