---
library_name: transformers
tags:
- small-lm
- math
- reasoning
- slm
license: apache-2.0
datasets:
- openai/gsm8k
base_model:
- Qwen/Qwen3-1.7B
---
# Qwen3-1.7B-Math
This model was obtained by fine-tuning Qwen/Qwen3-1.7B on the [gsm8k](https://huggingface.co/datasets/openai/gsm8k) train split.
It is used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
A single A100 GPU was used for both fine-tuning and evaluation.
The following package versions were used for training and evaluation:
- python >= 3.10
- torch : 2.9.0+cu128
- lm_eval : 0.4.9.1
- vllm : 0.11.1
- transformers : 4.57.6
- datasets : 3.2.0
- numpy : 2.2.6
## Training
The [TRL](https://github.com/huggingface/trl) library was used for full-rank SFT:
```bash
python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-1.7B --dataset_name openai/gsm8k --dataset_config main --learning_rate 2e-5 \
--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy steps \
--eval_steps 100 --completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
```
This is far from the most compute- or performance-efficient fine-tuning setup, but it can serve as a reasonable baseline.
The dataset was preprocessed into the conversational format:
```python
# trl/scripts/sft.py
dataset = load_dataset(...)

def preprocess_function(example):
    return {
        "prompt": [{"role": "user", "content": example["question"]}],
        "completion": [
            {"role": "assistant", "content": example["answer"]}
        ],
    }

dataset = dataset.map(preprocess_function)
```
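As a sanity check, the mapping above can be exercised on a toy record (a hypothetical GSM8K-style question/answer pair, not taken from the real dataset) to see the conversational structure TRL expects:

```python
# Toy stand-in for one GSM8K record (hypothetical values, for illustration only).
example = {
    "question": "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "answer": "Tom has 3 + 2 = 5 apples.\n#### 5",
}

def preprocess_function(example):
    # Same mapping as above: wrap the question and answer into chat-style turns.
    return {
        "prompt": [{"role": "user", "content": example["question"]}],
        "completion": [{"role": "assistant", "content": example["answer"]}],
    }

record = preprocess_function(example)
print(record["prompt"][0]["role"])      # user
print(record["completion"][0]["role"])  # assistant
```

With `--completion_only_loss True`, TRL computes the loss only on the `completion` turns, so the model is trained to produce the answer given the question.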
## Evaluation
Evaluation was done with lm_eval on the gsm8k test split:
```bash
python -m lm_eval --model vllm --model_args pretrained=${model},tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.9,data_parallel_size=1 \
--tasks gsm8k --batch_size 1 --apply_chat_template=True --confirm_run_unsafe_code --trust_remote_code
```
### Results
| Model           | gsm8k (%) |
|-----------------|-----------|
| Qwen3-1.7B      | 20.6      |
| Qwen3-1.7B-Math | 62.1      |
## License
Please refer to the licenses of the original model [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) and the dataset [gsm8k](https://huggingface.co/datasets/openai/gsm8k).