---
library_name: transformers
tags:
- small-lm
- code
- reasoning
- slm
license: apache-2.0
datasets:
- theblackcat102/evol-codealpaca-v1
base_model:
- Qwen/Qwen3-0.6B
---
# Qwen3-0.6B-Code
This model was obtained by fine-tuning Qwen/Qwen3-0.6B on the train split of [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1).
The model is used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
A single A100 GPU was used for both fine-tuning and evaluation.
The following package versions were used for training and evaluation:
- python >= 3.10
- torch : 2.9.0+cu128
- lm_eval : 0.4.9.1
- vllm : 0.11.1
- transformers : 4.57.6
- datasets : 3.2.0
- numpy : 2.2.6
## Training
The [TRL](https://github.com/huggingface/trl) library was used for full-rank SFT:
```bash
python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-0.6B --dataset_name theblackcat102/evol-codealpaca-v1 --learning_rate 2e-5 \
--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy no \
--completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
```
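For reference, the flags above imply an effective global batch size of 16 on a single GPU. A minimal sketch of that arithmetic (the variable names below are illustrative, not TRL internals):

```python
# Effective global batch size implied by the training flags above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 1  # a single A100 was used

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 16
```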
This is far from the most compute- or performance-efficient fine-tuning setup, but it can serve as a reasonable baseline.
The dataset was preprocessed into the conversational prompt/completion format:
```python
# trl/scripts/sft.py
dataset = load_dataset(...)

def preprocess_function(example):
    # Convert each instruction/output pair to the conversational format
    return {
        "prompt": [{"role": "user", "content": example["instruction"]}],
        "completion": [{"role": "assistant", "content": example["output"]}],
    }

dataset = dataset.map(preprocess_function)
```
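As a quick sanity check, the mapping can be exercised on a toy record (the sample content below is made up for illustration; real rows come from evol-codealpaca-v1):

```python
def preprocess_function(example):
    # Same instruction/output -> prompt/completion mapping as in the SFT script
    return {
        "prompt": [{"role": "user", "content": example["instruction"]}],
        "completion": [{"role": "assistant", "content": example["output"]}],
    }

# Toy record mimicking one row of the dataset
row = {
    "instruction": "Write a function that adds two numbers.",
    "output": "def add(a, b):\n    return a + b",
}

converted = preprocess_function(row)
print(converted["prompt"][0]["role"])      # user
print(converted["completion"][0]["role"])  # assistant
```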
## Evaluation
Evaluation was done with [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) on the HumanEval (instruct) benchmark:
```bash
python -m lm_eval --model vllm --model_args pretrained=${model},tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.9,data_parallel_size=1 \
--tasks humaneval_instruct --batch_size 1 --apply_chat_template=True --confirm_run_unsafe_code --trust_remote_code
```
### Results
| Model | humaneval_instruct |
|-----------------------|--------------------|
| Qwen3-0.6B | 38.4 |
| Qwen3-0.6B-Code | 46.3 |
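In other words, fine-tuning yields a 7.9-point absolute improvement on this benchmark, roughly a 21% relative gain over the base model:

```python
# Scores from the table above (humaneval_instruct)
base, finetuned = 38.4, 46.3

abs_gain = finetuned - base
rel_gain = abs_gain / base * 100
print(f"{abs_gain:.1f} points absolute, {rel_gain:.1f}% relative")
```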
## License
Please refer to the license of the original model [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) and dataset [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1). |