Qwen3 Technical Report
Paper: 2505.09388
This directory contains a Qwen3-0.6B model trained using SFT (Supervised Fine-Tuning) + RL (Reinforcement Learning) methods, specifically optimized for the GSM8K mathematical reasoning task.
```
Qwen3-0.6B_sft+rl_merged_model/
├── README.md                  # This file
├── config.json                # Model configuration file
├── generation_config.json     # Generation configuration file
├── tokenizer_config.json      # Tokenizer configuration
├── tokenizer.json             # Tokenizer file
├── vocab.json                 # Vocabulary file
├── merges.txt                 # BPE merges file
├── gsm8k_test_outputs.jsonl   # Test set output results
├── gsm8k_train_outputs.jsonl  # Training set output results
├── evaluate_accuracy.py       # Accuracy evaluation script
├── collect_model_outputs.py   # Model output collection script
└── utils.py                   # Utility functions
```
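The two `*_outputs.jsonl` files are JSON Lines (one JSON record per line). A minimal loader is sketched below; the exact field names in each record are not documented here, so check the actual files before relying on any particular schema.

```python
import json

def load_outputs(path):
    """Read a JSON Lines file of model outputs, returning one dict per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records
```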
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./Qwen3-0.6B_sft+rl_merged_model"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)
```
Use the evaluation script in this directory:

```shell
python evaluate_accuracy.py \
    --file gsm8k_test_outputs.jsonl \
    --name "Qwen3-0.6B SFT+RL"
```
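The internals of `evaluate_accuracy.py` are not shown here. A common approach for scoring GSM8K is to extract the final numeric answer from each generation (the dataset marks gold answers with a `#### <number>` line) and compare it against the gold answer; the sketch below illustrates that convention and is an assumption, not the script's actual implementation.

```python
import re

def extract_final_answer(text):
    """Return the final numeric answer, preferring an explicit '#### x'
    marker (the GSM8K convention), else the last number in the text."""
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if m:
        return m.group(1).replace(",", "")
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

def accuracy(predictions, golds):
    """Fraction of predictions whose extracted answer matches the gold answer."""
    correct = sum(
        extract_final_answer(p) == extract_final_answer(g)
        for p, g in zip(predictions, golds)
    )
    return correct / len(golds)
```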
Related files:

- `gsm8k_train_outputs.jsonl` — training set output results
- `../train_sft_distillation.py` (SFT), `../../train_grpo_gsm8k.py` (RL) — training scripts
- `../merge_lora_model.py` — LoRA merge script
- `../README.md` — parent documentation
- `gsm8k_test_outputs.jsonl` — contains detailed reasoning processes for each sample

If you use this model, please cite the related training methods and datasets: