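This work was completed during an internship at 同元软控. Its goal is to obtain, through fine-tuning, a large language model that is better adapted to the Julia language.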

sft

This model is a fine-tuned version of Qwen/Qwen3-8B on the all_julia_snippets_format dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6342
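
As a rough aid to reading this number, the loss can be converted to a perplexity. This is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 0.6342

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss  = {eval_loss:.4f}")
print(f"perplexity = {perplexity:.3f}")
```

A lower perplexity means the model is, on average, less uncertain about the next token in the evaluation snippets.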

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
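
The batch-size totals and the learning-rate schedule listed above can be reproduced with a little arithmetic. The sketch below is illustrative only: `lr_at` is a hypothetical helper that mimics a linear warmup followed by cosine decay, and `TOTAL_STEPS` is an assumed round number, since the exact number of optimizer steps is not given in this card.

```python
import math

# Values taken from the hyperparameter list above.
per_device_train_batch_size = 1
per_device_eval_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 8

# Effective batch sizes: gradient accumulation only applies to training.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def lr_at(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    """Hypothetical helper: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: roughly 54,000 optimizer steps over 3 epochs (not stated in the card).
TOTAL_STEPS = 54_000

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 2_000, 27_000, TOTAL_STEPS):
    print(step, lr_at(step, TOTAL_STEPS))
```

With these settings the learning rate rises to 0.0001 over the first 10% of training and then decays smoothly toward zero by the final step.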

Training results

Training Loss   Epoch    Step    Validation Loss
0.7947          0.1113   2000    0.7638
0.7582          0.2226   4000    0.7252
0.7321          0.3339   6000    0.7043
0.7156          0.4452   8000    0.6903
0.7136          0.5565   10000   0.6801
0.6989          0.6678   12000   0.6719
0.6944          0.7791   14000   0.6651
0.6901          0.8904   16000   0.6598
0.6779          1.0017   18000   0.6556
0.6439          1.1130   20000   0.6538
0.6450          1.2243   22000   0.6504
0.6387          1.3356   24000   0.6478
0.6005          1.4469   26000   0.6468
0.5976          1.5582   28000   0.6485
0.5934          1.6696   30000   0.6541
0.5978          1.7809   32000   0.6574
0.5965          1.8922   34000   0.6550
0.5961          2.0035   36000   0.6563
0.5920          2.1148   38000   0.6541
0.5897          2.2261   40000   0.6513
0.5949          2.3374   42000   0.6498
0.5850          2.4487   44000   0.6457
0.5913          2.5600   46000   0.6446
0.5835          2.6713   48000   0.6416
0.5827          2.7826   50000   0.6395
0.5901          2.8939   52000   0.6364
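
The logged values can be inspected programmatically. The following sketch copies the rows from the table above into a list and looks for the checkpoint with the lowest validation loss, the first step at which validation loss rose, and an approximate number of steps per epoch. The variable names are illustrative, and the steps-per-epoch figure is an estimate derived from the Epoch and Step columns.

```python
# Rows copied from the table above: (step, epoch, training loss, validation loss).
ROWS = [
    (2000, 0.1113, 0.7947, 0.7638), (4000, 0.2226, 0.7582, 0.7252),
    (6000, 0.3339, 0.7321, 0.7043), (8000, 0.4452, 0.7156, 0.6903),
    (10000, 0.5565, 0.7136, 0.6801), (12000, 0.6678, 0.6989, 0.6719),
    (14000, 0.7791, 0.6944, 0.6651), (16000, 0.8904, 0.6901, 0.6598),
    (18000, 1.0017, 0.6779, 0.6556), (20000, 1.1130, 0.6439, 0.6538),
    (22000, 1.2243, 0.6450, 0.6504), (24000, 1.3356, 0.6387, 0.6478),
    (26000, 1.4469, 0.6005, 0.6468), (28000, 1.5582, 0.5976, 0.6485),
    (30000, 1.6696, 0.5934, 0.6541), (32000, 1.7809, 0.5978, 0.6574),
    (34000, 1.8922, 0.5965, 0.6550), (36000, 2.0035, 0.5961, 0.6563),
    (38000, 2.1148, 0.5920, 0.6541), (40000, 2.2261, 0.5897, 0.6513),
    (42000, 2.3374, 0.5949, 0.6498), (44000, 2.4487, 0.5850, 0.6457),
    (46000, 2.5600, 0.5913, 0.6446), (48000, 2.6713, 0.5835, 0.6416),
    (50000, 2.7826, 0.5827, 0.6395), (52000, 2.8939, 0.5901, 0.6364),
]

# Logged checkpoint with the lowest validation loss.
best_step, _, _, best_val = min(ROWS, key=lambda row: row[3])

# First logged step where validation loss went up relative to the previous row.
first_rise_step = next(
    cur[0] for prev, cur in zip(ROWS, ROWS[1:]) if cur[3] > prev[3]
)

# Approximate steps per epoch, estimated from the last logged row.
last_step, last_epoch, _, _ = ROWS[-1]
steps_per_epoch = last_step / last_epoch

print("best validation loss:", best_val, "at step", best_step)
print("validation loss first rises at step:", first_rise_step)
print("approx. steps per epoch:", round(steps_per_epoch))
```

In this log the validation loss climbs for a stretch during the second epoch before falling again, while the training loss keeps decreasing, so comparing both columns is more informative than looking at either alone.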

Framework versions

  • PEFT 0.17.1
  • Transformers 4.56.2
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.1
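
Since PEFT is listed among the framework versions, the weights are presumably distributed as an adapter on top of the base model. The sketch below shows one way such an adapter might be loaded. It assumes the adapter is hosted under the repository id `CorgiPudding/Qwen3-8B-Julia` and that a plain text prompt is acceptable input; neither the prompt format nor the recommended generation settings are documented in this card, and `load_model` and `generate` are hypothetical helper names.

```python
BASE_MODEL_ID = "Qwen/Qwen3-8B"
# Assumption: the adapter weights live under this repository id.
ADAPTER_ID = "CorgiPudding/Qwen3-8B-Julia"


def load_model():
    """Load the base model and attach the PEFT adapter (downloads weights)."""
    # Imported lazily so that defining this function does not require
    # transformers or peft to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=128):
    """Generate a continuation for a plain text prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example usage (commented out because it downloads the 8B base model):
# tokenizer, model = load_model()
# print(generate(tokenizer, model, "Write a Julia function that sums a vector."))
```

Loading the base model in full precision needs substantial memory, so in practice a GPU and a reduced-precision dtype are likely to be required.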

Model tree for CorgiPudding/Qwen3-8B-Julia

This model is an adapter fine-tuned from Qwen/Qwen3-8B.