Objective
Improve code-generation performance on openai_humaneval via supervised fine-tuning on Codeforces chain-of-thought (CoT) traces.
Dataset
- name: open-r1/codeforces-cots
- license: cc-by-4.0
- description: ~10k Codeforces problems with up to five reasoning traces (per dataset card)
- fields used: prompt, generation (mapped to prompt/completion)
- preprocessing:
- strip `<think>…</think>` reasoning spans (STRIP_THINK=1)
- extract last fenced code block when present
- clean prompt to remove step-by-step phrasing (optional)
- train on completion only via prompt/completion format
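The preprocessing steps above can be sketched as follows. The helper names are illustrative (the actual logic lives in train_sft.py); the regexes assume standard triple-backtick fences and `<think>…</think>` reasoning spans:

```python
import re

FENCE = "`" * 3  # triple backtick, built up so this example stays renderable

def strip_think(text: str) -> str:
    # Drop <think>...</think> reasoning spans (what STRIP_THINK=1 toggles)
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def extract_last_code_block(text: str) -> str:
    # Keep only the last fenced code block when one is present
    blocks = re.findall(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else text.strip()

def to_prompt_completion(example: dict) -> dict:
    # Map the dataset's prompt/generation fields to TRL's prompt/completion format
    return {
        "prompt": example["prompt"],
        "completion": extract_last_code_block(strip_think(example["generation"])),
    }
```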
Model
- base: Qwen/Qwen2.5-Coder-7B-Instruct
- method: QLoRA (4-bit nf4)
- max_length: 1024 (configurable)
- adapters: LoRA r=16, alpha=32, dropout=0.05, target_modules q_proj/v_proj
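A minimal sketch of these settings as plain dicts; the keyword names mirror what real code would pass to transformers.BitsAndBytesConfig and peft.LoraConfig:

```python
# Sketch only: these dicts mirror the kwargs that would go to
# transformers.BitsAndBytesConfig and peft.LoraConfig, respectively.
bnb_kwargs = dict(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype="bfloat16",
)
lora_kwargs = dict(
    r=16,
    lora_alpha=32,                    # alpha/r = 2.0 scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```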
Training Config (initial)
- epochs: 1
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 8
- learning_rate: 1e-4
- eval_fraction: 0.02
- bf16: true (A30)
- gradient_checkpointing: true
- trackio: enabled (set USE_TRACKIO=1)
- project: codeforces-sft
- run_name: codeforces-sft-7b
- space_id: minksypoooo/trackio
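The hyperparameters above, expressed as TRL SFTConfig-style kwargs (a sketch; argument names can vary across TRL versions), plus the resulting effective batch size:

```python
# Sketch of the run config above; kwarg names follow trl.SFTConfig.
training_kwargs = dict(
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    bf16=True,
    gradient_checkpointing=True,
    run_name="codeforces-sft-7b",
)

# Effective batch size per optimizer step on a single GPU:
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)  # 1 x 8 = 8
```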
Runs
| run_id | date | model | epochs | max_length | lr | samples | notes | eval_pass_at_1 |
|---|---|---|---|---|---|---|---|---|
| baseline | 2025-12-20 | Qwen/Qwen2.5-Coder-7B-Instruct | 0 | n/a | n/a | all | base model | 0.7500 |
| run-001 | 2025-12-19 | Qwen/Qwen2.5-Coder-7B-Instruct | 1 | 1024 | 1e-4 | all | initial setup | 0.8232 |
Evaluation (openai_humaneval)
Results:
- baseline (Qwen/Qwen2.5-Coder-7B-Instruct):
- pass@1 (accuracy): 0.7500
- stderr: 0.0339
- log: experiments/codeforces_sft/humaneval_logs_base/2025-12-20T05-49-34+00-00_humaneval_BuGBwDQwccCM8aBcEzB632.eval
- run-001 (fine-tuned, merged adapters):
- pass@1 (accuracy): 0.8232
- stderr: 0.0299
- log: experiments/codeforces_sft/humaneval_logs/2025-12-19T18-45-50+00-00_humaneval_7d4Y6JkgfyVDJFuAmWxaLH.eval
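For context, pass@1 here is the fraction of the 164 HumanEval problems solved with one sample each. A sketch of the general unbiased pass@k estimator (Chen et al., 2021) and of a proportion stderr with an n−1 denominator, which reproduces the reported 0.0339 and 0.0299 (the exact formula the harness uses is an assumption):

```python
from math import comb, sqrt

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)
    # n = samples per problem, c = correct samples
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def proportion_stderr(p: float, n: int) -> float:
    # Sample standard error of a proportion (n - 1 in the denominator)
    return sqrt(p * (1.0 - p) / (n - 1))

# With one near-greedy sample per problem, pass@1 is just the solve rate:
print(round(proportion_stderr(0.7500, 164), 4))  # 0.0339 (baseline)
print(round(proportion_stderr(0.8232, 164), 4))  # 0.0299 (run-001)
```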
Command (from repo root):
python3 experiments/codeforces_sft/eval_humaneval.py \
--model-path codeforces-sft-7b/merged \
--log-dir experiments/codeforces_sft/humaneval_logs \
--torch-dtype float16 \
--max-connections 1 \
--temperature 0.001 \
--sandbox local
Auto-run helper:
- script: experiments/codeforces_sft/run_humaneval_when_ready.sh
- logs: experiments/codeforces_sft/humaneval.log
Notes
- Training script: experiments/codeforces_sft/train_sft.py
- For code-only outputs, keep STRIP_THINK=1 and EXTRACT_CODE=1
- If you want to keep reasoning, set STRIP_THINK=0 and EXTRACT_CODE=0
- HumanEval verification runs with local sandbox due to Docker socket permissions on this host