# sft_verilog
This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the origen-sft-r dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3458
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 5
- gradient_accumulation_steps: 8
- total_train_batch_size: 40
- total_eval_batch_size: 5
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
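The effective batch size and learning-rate schedule follow from the values above. A minimal sketch, assuming the standard linear-warmup-then-cosine-decay formula (the library's exact `cosine` scheduler may differ in edge cases; the total step count here is illustrative):

```python
import math

# Values from the hyperparameter list above
per_device_batch = 1
num_devices = 5
grad_accum_steps = 8

# Effective (total) train batch size: 1 * 5 * 8
total_train_batch_size = per_device_batch * num_devices * grad_accum_steps
print(total_train_batch_size)  # 40

def lr_at(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup for the first 10% of steps, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak learning rate is reached exactly at the end of warmup
print(lr_at(1031, 10313))  # 2e-05
```

The schedule starts at zero, ramps linearly to 2e-05 over the first 10% of training, then decays along a half-cosine to zero at the final step.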
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.5307 | 0.0582 | 200 | 0.4737 |
| 0.5037 | 0.1164 | 400 | 0.4350 |
| 0.4749 | 0.1746 | 600 | 0.4194 |
| 0.4564 | 0.2327 | 800 | 0.4103 |
| 0.4574 | 0.2909 | 1000 | 0.4042 |
| 0.456 | 0.3491 | 1200 | 0.3973 |
| 0.4455 | 0.4073 | 1400 | 0.3914 |
| 0.4445 | 0.4655 | 1600 | 0.3864 |
| 0.4286 | 0.5237 | 1800 | 0.3835 |
| 0.4416 | 0.5819 | 2000 | 0.3804 |
| 0.4378 | 0.6400 | 2200 | 0.3775 |
| 0.4257 | 0.6982 | 2400 | 0.3743 |
| 0.4053 | 0.7564 | 2600 | 0.3723 |
| 0.4297 | 0.8146 | 2800 | 0.3696 |
| 0.4313 | 0.8728 | 3000 | 0.3674 |
| 0.4263 | 0.9310 | 3200 | 0.3660 |
| 0.415 | 0.9892 | 3400 | 0.3639 |
| 0.3695 | 1.0474 | 3600 | 0.3659 |
| 0.3607 | 1.1056 | 3800 | 0.3631 |
| 0.3752 | 1.1638 | 4000 | 0.3620 |
| 0.3779 | 1.2220 | 4200 | 0.3609 |
| 0.3876 | 1.2802 | 4400 | 0.3596 |
| 0.3762 | 1.3384 | 4600 | 0.3582 |
| 0.3762 | 1.3965 | 4800 | 0.3571 |
| 0.3702 | 1.4547 | 5000 | 0.3556 |
| 0.3612 | 1.5129 | 5200 | 0.3547 |
| 0.3678 | 1.5711 | 5400 | 0.3528 |
| 0.3653 | 1.6293 | 5600 | 0.3515 |
| 0.3655 | 1.6875 | 5800 | 0.3508 |
| 0.3686 | 1.7457 | 6000 | 0.3499 |
| 0.349 | 1.8038 | 6200 | 0.3485 |
| 0.354 | 1.8620 | 6400 | 0.3476 |
| 0.3622 | 1.9202 | 6600 | 0.3465 |
| 0.3727 | 1.9784 | 6800 | 0.3458 |
| 0.3188 | 2.0364 | 7000 | 0.3511 |
| 0.3225 | 2.0946 | 7200 | 0.3521 |
| 0.3302 | 2.1527 | 7400 | 0.3510 |
| 0.3235 | 2.2109 | 7600 | 0.3505 |
| 0.3186 | 2.2691 | 7800 | 0.3499 |
| 0.3125 | 2.3273 | 8000 | 0.3501 |
| 0.3175 | 2.3855 | 8200 | 0.3492 |
| 0.3205 | 2.4437 | 8400 | 0.3489 |
| 0.3169 | 2.5019 | 8600 | 0.3485 |
| 0.3225 | 2.5600 | 8800 | 0.3481 |
| 0.307 | 2.6182 | 9000 | 0.3481 |
| 0.3156 | 2.6764 | 9200 | 0.3477 |
| 0.318 | 2.7346 | 9400 | 0.3474 |
| 0.3185 | 2.7928 | 9600 | 0.3473 |
| 0.3119 | 2.8510 | 9800 | 0.3473 |
| 0.3174 | 2.9092 | 10000 | 0.3473 |
| 0.3107 | 2.9673 | 10200 | 0.3473 |
### Framework versions
- Transformers 4.51.1
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1