# sft_verilog

This model is a fine-tuned version of Qwen/Qwen3-8B on the origen-sft-r dataset. It achieves the following results on the evaluation set:

- Loss: 0.3458

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 1 (per device)
- eval_batch_size: 1 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 5
- gradient_accumulation_steps: 8
- total_train_batch_size: 40
- total_eval_batch_size: 5
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
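The effective (total) train batch size above follows from per-device batch size × number of devices × gradient-accumulation steps. A quick sanity check in Python, using the values from the list:

```python
# Effective-batch-size arithmetic for the hyperparameters above.
train_batch_size = 1             # per-device batch size
num_devices = 5                  # multi-GPU setup
gradient_accumulation_steps = 8  # micro-batches accumulated per optimizer step

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # → 40, matching total_train_batch_size above
```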

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.5307        | 0.0582 | 200   | 0.4737          |
| 0.5037        | 0.1164 | 400   | 0.4350          |
| 0.4749        | 0.1746 | 600   | 0.4194          |
| 0.4564        | 0.2327 | 800   | 0.4103          |
| 0.4574        | 0.2909 | 1000  | 0.4042          |
| 0.456         | 0.3491 | 1200  | 0.3973          |
| 0.4455        | 0.4073 | 1400  | 0.3914          |
| 0.4445        | 0.4655 | 1600  | 0.3864          |
| 0.4286        | 0.5237 | 1800  | 0.3835          |
| 0.4416        | 0.5819 | 2000  | 0.3804          |
| 0.4378        | 0.6400 | 2200  | 0.3775          |
| 0.4257        | 0.6982 | 2400  | 0.3743          |
| 0.4053        | 0.7564 | 2600  | 0.3723          |
| 0.4297        | 0.8146 | 2800  | 0.3696          |
| 0.4313        | 0.8728 | 3000  | 0.3674          |
| 0.4263        | 0.9310 | 3200  | 0.3660          |
| 0.415         | 0.9892 | 3400  | 0.3639          |
| 0.3695        | 1.0474 | 3600  | 0.3659          |
| 0.3607        | 1.1056 | 3800  | 0.3631          |
| 0.3752        | 1.1638 | 4000  | 0.3620          |
| 0.3779        | 1.2220 | 4200  | 0.3609          |
| 0.3876        | 1.2802 | 4400  | 0.3596          |
| 0.3762        | 1.3384 | 4600  | 0.3582          |
| 0.3762        | 1.3965 | 4800  | 0.3571          |
| 0.3702        | 1.4547 | 5000  | 0.3556          |
| 0.3612        | 1.5129 | 5200  | 0.3547          |
| 0.3678        | 1.5711 | 5400  | 0.3528          |
| 0.3653        | 1.6293 | 5600  | 0.3515          |
| 0.3655        | 1.6875 | 5800  | 0.3508          |
| 0.3686        | 1.7457 | 6000  | 0.3499          |
| 0.349         | 1.8038 | 6200  | 0.3485          |
| 0.354         | 1.8620 | 6400  | 0.3476          |
| 0.3622        | 1.9202 | 6600  | 0.3465          |
| 0.3727        | 1.9784 | 6800  | 0.3458          |
| 0.3188        | 2.0364 | 7000  | 0.3511          |
| 0.3225        | 2.0946 | 7200  | 0.3521          |
| 0.3302        | 2.1527 | 7400  | 0.3510          |
| 0.3235        | 2.2109 | 7600  | 0.3505          |
| 0.3186        | 2.2691 | 7800  | 0.3499          |
| 0.3125        | 2.3273 | 8000  | 0.3501          |
| 0.3175        | 2.3855 | 8200  | 0.3492          |
| 0.3205        | 2.4437 | 8400  | 0.3489          |
| 0.3169        | 2.5019 | 8600  | 0.3485          |
| 0.3225        | 2.5600 | 8800  | 0.3481          |
| 0.307         | 2.6182 | 9000  | 0.3481          |
| 0.3156        | 2.6764 | 9200  | 0.3477          |
| 0.318         | 2.7346 | 9400  | 0.3474          |
| 0.3185        | 2.7928 | 9600  | 0.3473          |
| 0.3119        | 2.8510 | 9800  | 0.3473          |
| 0.3174        | 2.9092 | 10000 | 0.3473          |
| 0.3107        | 2.9673 | 10200 | 0.3473          |

### Framework versions

- Transformers 4.51.1
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
Model size: 8B params · Tensor type: BF16 · Format: safetensors

## Model tree for henryen/origen-r-sft

Base model: Qwen/Qwen3-8B-Base → finetuned as Qwen/Qwen3-8B → finetuned as this model.