Objective
Improve code-generation performance on openai_humaneval via supervised fine-tuning on Codeforces chain-of-thought (CoT) traces.
Dataset
- name: open-r1/codeforces-cots
- license: cc-by-4.0
- description: ~10k Codeforces problems with up to five reasoning traces (per dataset card)
- fields used: prompt, generation (mapped to prompt/completion)
- preprocessing:
- strip `<think>…</think>` reasoning spans (STRIP_THINK=1)
- extract last fenced code block when present
- clean prompt to remove step-by-step phrasing (optional)
- train on completion only via prompt/completion format
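The preprocessing steps above can be sketched as follows. The helper names are illustrative (the actual logic lives in train_sft.py); the regexes assume standard triple-backtick fences and `<think>…</think>` reasoning spans:

```python
import re

FENCE = "`" * 3  # triple backtick, built up so this example stays renderable

def strip_think(text: str) -> str:
    # Drop <think>...</think> reasoning spans (what STRIP_THINK=1 toggles)
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def extract_last_code_block(text: str) -> str:
    # Keep only the last fenced code block when one is present
    blocks = re.findall(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else text.strip()

def to_prompt_completion(example: dict) -> dict:
    # Map the dataset's prompt/generation fields to TRL's prompt/completion format
    return {
        "prompt": example["prompt"],
        "completion": extract_last_code_block(strip_think(example["generation"])),
    }
```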
Model
- base: Qwen/Qwen2.5-Coder-7B-Instruct
- method: QLoRA (4-bit nf4)
- max_length: 1024 (configurable)
- adapters: LoRA r=16, alpha=32, dropout=0.05, target_modules q_proj/v_proj
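A minimal sketch of these settings as plain dicts; the keyword names mirror what real code would pass to transformers.BitsAndBytesConfig and peft.LoraConfig:

```python
# Sketch only: these dicts mirror the kwargs that would go to
# transformers.BitsAndBytesConfig and peft.LoraConfig, respectively.
bnb_kwargs = dict(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype="bfloat16",
)
lora_kwargs = dict(
    r=16,
    lora_alpha=32,                    # alpha/r = 2.0 scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```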
Training Config (initial)
- epochs: 1
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 8
- learning_rate: 1e-4
- eval_fraction: 0.02
- bf16: true (A30)
- gradient_checkpointing: true
- trackio: enabled (set USE_TRACKIO=1)
- project: codeforces-sft
- run_name: codeforces-sft-7b
- space_id: minksypoooo/trackio
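The hyperparameters above, expressed as TRL SFTConfig-style kwargs (a sketch; argument names can vary across TRL versions), plus the resulting effective batch size:

```python
# Sketch of the run config above; kwarg names follow trl.SFTConfig.
training_kwargs = dict(
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    bf16=True,
    gradient_checkpointing=True,
    run_name="codeforces-sft-7b",
)

# Effective batch size per optimizer step on a single GPU:
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)  # 1 x 8 = 8
```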
Runs
| run_id | date | model | epochs | max_length | lr | samples | notes | eval_pass_at_1 |
|---|---|---|---|---|---|---|---|---|
| baseline | 2025-12-20 | Qwen/Qwen2.5-Coder-7B-Instruct | 0 | n/a | n/a | all | base model | 0.7500 |
| run-001 | 2025-12-19 | Qwen/Qwen2.5-Coder-7B-Instruct | 1 | 1024 | 1e-4 | all | initial setup | 0.8232 |
Evaluation (openai_humaneval)
Results:
- baseline (Qwen/Qwen2.5-Coder-7B-Instruct):
- pass@1 (accuracy): 0.7500
- stderr: 0.0339
- log: experiments/codeforces_sft/humaneval_logs_base/2025-12-20T05-49-34+00-00_humaneval_BuGBwDQwccCM8aBcEzB632.eval
- run-001 (fine-tuned, merged adapters):
- pass@1 (accuracy): 0.8232
- stderr: 0.0299
- log: experiments/codeforces_sft/humaneval_logs/2025-12-19T18-45-50+00-00_humaneval_7d4Y6JkgfyVDJFuAmWxaLH.eval
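For context, pass@1 here is the fraction of the 164 HumanEval problems solved with one sample each. A sketch of the general unbiased pass@k estimator (Chen et al., 2021) and of a proportion stderr with an n−1 denominator, which reproduces the reported 0.0339 and 0.0299 (the exact formula the harness uses is an assumption):

```python
from math import comb, sqrt

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)
    # n = samples per problem, c = correct samples
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def proportion_stderr(p: float, n: int) -> float:
    # Sample standard error of a proportion (n - 1 in the denominator)
    return sqrt(p * (1.0 - p) / (n - 1))

# With one near-greedy sample per problem, pass@1 is just the solve rate:
print(round(proportion_stderr(0.7500, 164), 4))  # 0.0339 (baseline)
print(round(proportion_stderr(0.8232, 164), 4))  # 0.0299 (run-001)
```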
Command (from repo root):
python3 experiments/codeforces_sft/eval_humaneval.py \
--model-path codeforces-sft-7b/merged \
--log-dir experiments/codeforces_sft/humaneval_logs \
--torch-dtype float16 \
--max-connections 1 \
--temperature 0.001 \
--sandbox local
Auto-run helper:
- script: experiments/codeforces_sft/run_humaneval_when_ready.sh
- logs: experiments/codeforces_sft/humaneval.log
Notes
- Training script: experiments/codeforces_sft/train_sft.py
- For code-only outputs, keep STRIP_THINK=1 and EXTRACT_CODE=1
- If you want to keep reasoning, set STRIP_THINK=0 and EXTRACT_CODE=0
- HumanEval verification runs with local sandbox due to Docker socket permissions on this host