adapter

This model is a PEFT adapter for Qwen/Qwen2.5-3B, fine-tuned on the aqua_rat_multiple_choice and aqua_rat_open_form datasets. It achieves the following results on the evaluation set:

  • Loss: 0.4348
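
For intuition, the evaluation loss can be converted to a perplexity. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training in Transformers but is not stated in this card.

```python
import math

# Final evaluation loss reported above.
eval_loss = 0.4348

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss:.4f} -> perplexity {perplexity:.3f}")
```

A perplexity near 1.5 means that, on average, the model is about as uncertain as if it were choosing between one and two equally likely tokens at each position of the evaluation targets.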

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
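
The batch-size and scheduler settings above can be checked with a little arithmetic. The sketch below is pure Python and does not call any training library. It assumes linear warmup followed by a half-cosine decay to zero, and it assumes roughly 3654 total optimizer steps (inferred from the log, where step 3600 falls at epoch 2.9557); neither value is stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.1

# Assumption: about 3654 optimizer steps in total (not stated in the card).
TOTAL_STEPS = 3654


def effective_batch_size(per_device: int, accum: int, num_processes: int) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum * num_processes


def lr_at(step: int, base_lr: float = LEARNING_RATE,
          total_steps: int = TOTAL_STEPS,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step (warmup, then cosine decay)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# 4 * 8 * 1 = 32 matches total_train_batch_size, which suggests a single
# training process even though distributed_type is reported as multi-GPU.
print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, 1))

for step in (0, 100, 366, 2000, 3654):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at 0.0002 after about 366 steps and then decays smoothly to zero by the end of the third epoch.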

Training results

Training Loss   Epoch    Step   Validation Loss
0.5984          0.0821   100    0.6201
0.5397          0.1642   200    0.5328
0.4751          0.2463   300    0.5067
0.4854          0.3284   400    0.4936
0.5098          0.4105   500    0.4845
0.473           0.4926   600    0.4806
0.4599          0.5747   700    0.4754
0.4888          0.6568   800    0.4699
0.4579          0.7389   900    0.4668
0.4381          0.8210   1000   0.4628
0.4479          0.9031   1100   0.4603
0.4579          0.9852   1200   0.4579
0.4237          1.0673   1300   0.4566
0.4168          1.1494   1400   0.4564
0.4394          1.2315   1500   0.4530
0.4438          1.3136   1600   0.4513
0.4352          1.3957   1700   0.4481
0.4355          1.4778   1800   0.4467
0.4057          1.5599   1900   0.4443
0.4743          1.6420   2000   0.4429
0.4264          1.7241   2100   0.4413
0.4071          1.8062   2200   0.4400
0.4156          1.8883   2300   0.4385
0.3961          1.9704   2400   0.4375
0.3887          2.0525   2500   0.4391
0.4492          2.1346   2600   0.4383
0.4032          2.2167   2700   0.4381
0.381           2.2989   2800   0.4372
0.4104          2.3810   2900   0.4366
0.3933          2.4631   3000   0.4360
0.4282          2.5452   3100   0.4359
0.3708          2.6273   3200   0.4356
0.3629          2.7094   3300   0.4349
0.4044          2.7915   3400   0.4349
0.3941          2.8736   3500   0.4348
0.4078          2.9557   3600   0.4348
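
The Epoch and Step columns make it possible to estimate the length of one epoch, which this card does not state. The sketch below divides step by epoch for the first and last logged rows and combines the result with total_train_batch_size. The resulting example count is only an approximation, since the final batch of an epoch may be partial.

```python
import math

# (step, epoch) pairs copied from the first and last rows of the log.
LOGGED = [(100, 0.0821), (3600, 2.9557)]

# Values copied from the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 32
NUM_EPOCHS = 3.0

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for step, epoch in LOGGED]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Approximate totals derived from that estimate.
total_steps = math.ceil(NUM_EPOCHS * steps_per_epoch)
approx_examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print("steps per epoch:", steps_per_epoch)
print("total optimizer steps:", total_steps)
print("approx. training examples per epoch:", approx_examples_per_epoch)
```

By this estimate the run took about 1218 optimizer steps per epoch, so the evaluations every 100 steps correspond to roughly twelve checkpoints per epoch.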

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • Pytorch 2.4.0
  • Datasets 2.21.0
  • Tokenizers 0.20.3
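
With the libraries above installed, the adapter can be loaded on top of the base model through PEFT. The following is a minimal sketch, not a tested recipe for this repository: it assumes the adapter weights sit at the root of tonyshelby/qwen2.5_3b_checkpoints, and the build_prompt helper and its prompt layout are hypothetical, because the card does not document the format used during fine-tuning.

```python
BASE_MODEL_ID = "Qwen/Qwen2.5-3B"
# Assumption: the adapter files are stored at the root of this repository.
ADAPTER_ID = "tonyshelby/qwen2.5_3b_checkpoints"


def load_model_and_tokenizer(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the fine-tuned adapter."""
    # Imported lazily so the helpers below work without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


def build_prompt(question, options=None):
    """Hypothetical prompt layout; the real training format is not documented."""
    lines = ["Question: " + question.strip()]
    if options:
        # Multiple-choice variant: list the answer options one per line.
        lines.append("Options:")
        lines.extend(options)
    lines.append("Answer:")
    return "\n".join(lines)


def generate_answer(model, tokenizer, prompt, max_new_tokens=256):
    """Greedy generation; returns only the newly generated text."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    model, tokenizer = load_model_and_tokenizer()
    prompt = build_prompt("What is 15% of 80?", ["A) 10", "B) 12", "C) 15"])
    print(generate_answer(model, tokenizer, prompt))
```

Loading in bfloat16 is a choice made here to reduce memory use; the precision used during training is not given in this card.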

Model tree for tonyshelby/qwen2.5_3b_checkpoints

Base model: Qwen/Qwen2.5-3B (this model is an adapter)