---
library_name: transformers
tags:
- small-lm
- code
- reasoning
- slm
license: apache-2.0
datasets:
- theblackcat102/evol-codealpaca-v1
base_model:
- Qwen/Qwen3-1.7B
---

# Qwen3-1.7B-Code

This model was obtained by fine-tuning [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) on the train split of [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1).
It is used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
A single A100 GPU was used for both fine-tuning and evaluation.

The following versions were used for training and evaluation:

- python >= 3.10
- torch               : 2.9.0+cu128
- lm_eval             : 0.4.9.1
- vllm                : 0.11.1
- transformers        : 4.57.6
- datasets            : 3.2.0
- numpy               : 2.2.6
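
The pins above can be checked programmatically. The helper below is a hypothetical sketch (package names are assumed to equal their pip distribution names, and local build suffixes such as `+cu128` must match exactly):

```python
# Hypothetical helper to compare installed package versions against the
# pins listed above (names assumed to equal their pip distribution names).
from importlib import metadata

PINNED = {
    "torch": "2.9.0+cu128",
    "transformers": "4.57.6",
    "datasets": "3.2.0",
    "numpy": "2.2.6",
}

def check_versions(pins):
    """Return {name: (pinned, installed-or-None)} for every mismatched pin."""
    mismatches = {}
    for name, want in pins.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches

for name, (want, have) in check_versions(PINNED).items():
    print(f"{name}: pinned {want}, installed {have}")
```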

## Training

The [TRL](https://github.com/huggingface/trl) library was used for full-rank SFT:

```bash
python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-1.7B --dataset_name theblackcat102/evol-codealpaca-v1 --learning_rate 2e-5 \
--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy no \
--completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
```

This is not the most compute- or performance-efficient fine-tuning recipe, but it serves as a reasonable baseline. With `per_device_train_batch_size 2` and `gradient_accumulation_steps 8`, the effective batch size on a single GPU is 16.

The dataset was preprocessed to the conversational format:

```python
# trl/scripts/sft.py

dataset = load_dataset(...)

def preprocess_function(example):
    # Map each raw instruction/output pair to TRL's conversational format.
    return {
        "prompt": [{"role": "user", "content": example["instruction"]}],
        "completion": [{"role": "assistant", "content": example["output"]}],
    }

dataset = dataset.map(preprocess_function)
```
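
The mapping can be checked in isolation on a hypothetical raw example, without downloading the dataset:

```python
# Self-contained illustration of the prompt/completion mapping on a
# hypothetical raw example (no dataset download required).

def preprocess_function(example):
    """Convert a raw instruction/output pair to TRL's prompt/completion chat format."""
    return {
        "prompt": [{"role": "user", "content": example["instruction"]}],
        "completion": [{"role": "assistant", "content": example["output"]}],
    }

raw = {
    "instruction": "Write a function that adds two numbers.",
    "output": "def add(a, b):\n    return a + b",
}
mapped = preprocess_function(raw)
print(mapped["prompt"][0])  # {'role': 'user', 'content': 'Write a function that adds two numbers.'}
```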

## Evaluation

Evaluation was done with [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) on the HumanEval (instruct) benchmark:

```bash
python -m lm_eval --model vllm --model_args pretrained=${model},tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.9,data_parallel_size=1 \
 --tasks humaneval_instruct --batch_size 1 --apply_chat_template=True --confirm_run_unsafe_code --trust_remote_code
```

### Results

| Model                 | humaneval_instruct |
|-----------------------|--------------------|
| Qwen3-1.7B            | 67.1               |
| Qwen3-1.7B-Code       | 69.5               |

## License

Please refer to the licenses of the original model [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) and the dataset [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1).