---
library_name: transformers
tags:
- small-lm
- math
- reasoning
- slm
license: apache-2.0
datasets:
- openai/gsm8k
base_model:
- Qwen/Qwen3-1.7B
---
# Qwen3-1.7B-Math
This model was obtained by fine-tuning Qwen/Qwen3-1.7B on the [gsm8k](https://huggingface.co/datasets/openai/gsm8k) train split.
It is used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
A single A100 GPU was used for both fine-tuning and evaluation.
The following package versions were used for training and evaluation:
- python >= 3.10
- torch : 2.9.0+cu128
- lm_eval : 0.4.9.1
- vllm : 0.11.1
- transformers : 4.57.6
- datasets : 3.2.0
- numpy : 2.2.6
## Training
The [TRL](https://github.com/huggingface/trl) library was used for full-rank SFT:
```bash
python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-1.7B --dataset_name openai/gsm8k --dataset_config main --learning_rate 2e-5 \
--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy steps \
--eval_steps 100 --completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
```
This is far from the most compute- or performance-efficient fine-tuning setup, but it can serve as a reasonable baseline.
The dataset was preprocessed into the conversational format:
```python
# trl/scripts/sft.py
dataset = load_dataset(...)

def preprocess_function(example):
    return {
        "prompt": [{"role": "user", "content": example["question"]}],
        "completion": [
            {"role": "assistant", "content": example["answer"]}
        ],
    }

dataset = dataset.map(preprocess_function)
```
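As a sanity check, the mapping above can be exercised on a toy record (a hypothetical GSM8K-style question/answer pair, not taken from the real dataset) to see the conversational structure TRL expects:

```python
# Toy stand-in for one GSM8K record (hypothetical values, for illustration only).
example = {
    "question": "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "answer": "Tom has 3 + 2 = 5 apples.\n#### 5",
}

def preprocess_function(example):
    # Same mapping as above: wrap the question and answer into chat-style turns.
    return {
        "prompt": [{"role": "user", "content": example["question"]}],
        "completion": [{"role": "assistant", "content": example["answer"]}],
    }

record = preprocess_function(example)
print(record["prompt"][0]["role"])      # user
print(record["completion"][0]["role"])  # assistant
```

With `--completion_only_loss True`, TRL computes the loss only on the `completion` turns, so the model is trained to produce the answer given the question.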
## Evaluation
Evaluation was done with lm_eval on the gsm8k test split:
```bash
python -m lm_eval --model vllm --model_args pretrained=${model},tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.9,data_parallel_size=1 \
--tasks gsm8k --batch_size 1 --apply_chat_template=True --confirm_run_unsafe_code --trust_remote_code
```
### Results
| Model           | gsm8k (%) |
|-----------------|-----------|
| Qwen3-1.7B      | 20.6      |
| Qwen3-1.7B-Math | 62.1      |
## License
Please refer to the licenses of the original model [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) and the dataset [gsm8k](https://huggingface.co/datasets/openai/gsm8k).