metadata
library_name: transformers
license: other
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: web_policy_sft
    results: []

web_policy_sft

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct (loaded from the local snapshot aa8e72537993ba99e69dfaafa59ed015b17504d1) on the web_policy_sft dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1625

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • total_eval_batch_size: 2
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
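
The batch-size and scheduler settings above can be checked by hand. The following is a minimal sketch, not the trainer's actual code: it recomputes the total train batch size from the per-device size, device count, and accumulation steps, and approximates a cosine schedule with linear warmup. The total of 1064 optimizer steps is an assumption estimated from the results table (step 1050 at epoch 0.9866), and rounding the warmup length up is likewise an assumption. The function names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1
PER_DEVICE_TRAIN_BATCH = 1
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 4

# Assumption: the card does not state the total step count. This value is
# estimated from the results table (step 1050 at epoch 0.9866, so about 1064).
ESTIMATED_TOTAL_STEPS = 1064


def effective_batch_size(per_device, num_devices, grad_accum):
    """Number of samples contributing to one optimizer update."""
    return per_device * num_devices * grad_accum


def cosine_lr(step, total_steps=ESTIMATED_TOTAL_STEPS,
              peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    # Assumption: the warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total = effective_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS)
    print("total_train_batch_size:", total)
    for step in (0, 50, 107, 500, 1064):
        print(step, cosine_lr(step))
```

Under these assumptions the warmup covers the first 107 optimizer steps, after which the rate decays from 1e-05 toward zero at the end of the single epoch.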

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.5121 | 0.0470 | 50 | 0.4876 |
| 0.4615 | 0.0940 | 100 | 0.3849 |
| 0.37 | 0.1409 | 150 | 0.3281 |
| 0.3749 | 0.1879 | 200 | 0.2892 |
| 0.2863 | 0.2349 | 250 | 0.2757 |
| 0.3078 | 0.2819 | 300 | 0.2549 |
| 0.2921 | 0.3289 | 350 | 0.2316 |
| 0.3191 | 0.3759 | 400 | 0.2353 |
| 0.313 | 0.4228 | 450 | 0.2231 |
| 0.2037 | 0.4698 | 500 | 0.2138 |
| 0.1729 | 0.5168 | 550 | 0.2074 |
| 0.289 | 0.5638 | 600 | 0.1954 |
| 0.2775 | 0.6108 | 650 | 0.1897 |
| 0.1546 | 0.6577 | 700 | 0.1814 |
| 0.1613 | 0.7047 | 750 | 0.1746 |
| 0.0956 | 0.7517 | 800 | 0.1725 |
| 0.1692 | 0.7987 | 850 | 0.1683 |
| 0.1885 | 0.8457 | 900 | 0.1653 |
| 0.2799 | 0.8926 | 950 | 0.1637 |
| 0.1971 | 0.9396 | 1000 | 0.1628 |
| 0.1464 | 0.9866 | 1050 | 0.1626 |

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.7.1+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1
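
To reproduce or load the model in a matching environment, it can help to compare the locally installed packages with the versions listed above. The following is a minimal sketch using only the standard library. The distribution names (for example `torch` for Pytorch) and the choice to ignore local build tags such as `+cu126` are assumptions, and the helper names are hypothetical.

```python
from importlib import metadata

# Versions copied from the list above. The "+cu126" build tag on PyTorch is
# dropped so that CPU or other CUDA builds of 2.7.1 still count as a match.
CARD_VERSIONS = {
    "transformers": "4.57.1",
    "torch": "2.7.1",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_with_card(expected=CARD_VERSIONS):
    """Map each package to (card version, installed version, status)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found is None:
            status = "missing"
        elif found.split("+")[0] == wanted:
            status = "match"
        else:
            status = "differs"
        report[package] = (wanted, found, status)
    return report


if __name__ == "__main__":
    for package, (wanted, found, status) in compare_with_card().items():
        print(f"{package}: card={wanted} installed={found} status={status}")
```

A "differs" status does not necessarily mean the model will fail to load; it only flags a deviation from the environment recorded in this card.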