---
library_name: transformers
tags:
- small-lm
- code
- reasoning
- slm
license: apache-2.0
datasets:
- theblackcat102/evol-codealpaca-v1
base_model:
- Qwen/Qwen3-0.6B
---

# Qwen3-0.6B-Code

This model is obtained by fine-tuning Qwen/Qwen3-0.6B on the [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) train split.
The model is used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
A single A100 was used for both fine-tuning and evaluation.

The following versions were used for training/evaluation:

- python >= 3.10
- torch               : 2.9.0+cu128
- lm_eval             : 0.4.9.1
- vllm                : 0.11.1
- transformers        : 4.57.6
- datasets            : 3.2.0
- numpy               : 2.2.6
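The pinned versions above can be installed with pip (a sketch; the `+cu128` build of torch may additionally require the matching PyTorch CUDA index URL):

```shell
pip install "torch==2.9.0" "lm_eval==0.4.9.1" "vllm==0.11.1" \
    "transformers==4.57.6" "datasets==3.2.0" "numpy==2.2.6"
```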

## Training

The [TRL](https://github.com/huggingface/trl) library was used with SFT/full-rank options:

```bash
python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-0.6B --dataset_name theblackcat102/evol-codealpaca-v1 --learning_rate 2e-5 \
--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy no \
--completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
```

This is far from the most compute- or performance-efficient fine-tuning setup, but it can serve as a reasonable baseline.

The dataset was preprocessed into the conversational format:

```python
# trl/scripts/sft.py

dataset = load_dataset(...)

def preprocess_function(example):
    return {
        "prompt": [{"role": "user", "content": example["instruction"]}],
        "completion": [
            {"role": "assistant", "content": example["output"]}
        ],
    }

dataset = dataset.map(preprocess_function)
```
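As a sanity check, the mapping above can be exercised on a toy row (a hypothetical example, shown only to illustrate the output shape TRL expects for prompt/completion training):

```python
# Toy sanity check of the conversational mapping; not a real dataset row.
def preprocess_function(example):
    return {
        "prompt": [{"role": "user", "content": example["instruction"]}],
        "completion": [
            {"role": "assistant", "content": example["output"]}
        ],
    }

row = preprocess_function({
    "instruction": "Write a function that adds two numbers.",
    "output": "def add(a, b):\n    return a + b",
})
print(row["prompt"][0]["role"], row["completion"][0]["role"])  # prints: user assistant
```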

## Evaluation

Evaluation was done with lm_eval on the HumanEval (instruct) benchmark:

```bash
python -m lm_eval --model vllm --model_args pretrained=${model},tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.9,data_parallel_size=1 \
 --tasks humaneval_instruct --batch_size 1 --apply_chat_template=True --confirm_run_unsafe_code --trust_remote_code
```

### Results

| Model                 | humaneval_instruct |
|-----------------------|--------------------|
| Qwen3-0.6B            | 38.4               |
| Qwen3-0.6B-Code       | 46.3               |

## License

Please refer to the license of the original model [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) and dataset [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1).