bknyaz committed on
Commit 80e33c1 · verified · 1 Parent(s): d5d8db7

Update README.md

Files changed (1): README.md (+14 -2)
# Qwen3-0.6B-Math

This model is obtained by fine-tuning Qwen/Qwen3-0.6B on the gsm8k train split. It was used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
A single A100 was used for fine-tuning and evaluation.

The following versions were used for train/eval:

- python >= 3.10
- torch : 2.9.0+cu128
- lm_eval : 0.4.9.1
- vllm : 0.11.1
- transformers : 4.57.6
- datasets : 3.2.0
- numpy : 2.2.6
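
The pins above can be checked against the active environment with the standard library alone. This is a minimal sketch (not part of the original README); it assumes the installed distribution names match the list, and simply reports anything missing or mismatched rather than failing:

```python
# Hedged sketch: compare installed package versions against the pins above.
# Distribution names are assumed to match the list (importlib.metadata
# normalizes "-"/"_" on Python >= 3.10); missing packages are just reported.
import importlib.metadata as md

pins = {
    "torch": "2.9.0+cu128",
    "lm_eval": "0.4.9.1",
    "vllm": "0.11.1",
    "transformers": "4.57.6",
    "datasets": "3.2.0",
    "numpy": "2.2.6",
}

results = {}
for pkg, expected in pins.items():
    try:
        installed = md.version(pkg)
        results[pkg] = "ok" if installed == expected else f"mismatch (have {installed})"
    except md.PackageNotFoundError:
        results[pkg] = "not installed"
    print(pkg, "->", results[pkg])
```
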

## Training

The [TRL](https://github.com/huggingface/trl) library was used with SFT/full-rank options:

```bash
python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-0.6B --dataset_name openai/gsm8k --dataset_config main --learning_rate 2e-5 \
--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy steps \
--eval_steps 100 --completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
```
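
For orientation, the batch flags above can be turned into an effective batch size and a rough step count. A small arithmetic sketch; the GSM8K train-split size (7,473 examples) is taken from the dataset card, not from this README:

```python
# Effective batch size implied by the flags above, on a single device.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # single A100, per the README

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 16

# Optimizer steps in one epoch, assuming GSM8K main's 7,473 training
# examples (from the dataset card) and that the final partial batch
# still counts as a step (ceiling division).
gsm8k_train_examples = 7473
steps_per_epoch = -(-gsm8k_train_examples // effective_batch_size)
print(steps_per_epoch)  # 468
```
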

This is far from the most compute- and performance-efficient fine-tuning, but it can serve as a good baseline.

The dataset was preprocessed to the conversational format:

```python