Objective

Improve code-generation performance on openai_humaneval via supervised fine-tuning on Codeforces chain-of-thought (CoT) traces.

Dataset

  • name: open-r1/codeforces-cots
  • license: cc-by-4.0
  • description: ~10k Codeforces problems with up to five reasoning traces (per dataset card)
  • fields used: prompt, generation (mapped to prompt/completion)
  • preprocessing:
    • strip <think>…</think> reasoning blocks (STRIP_THINK=1)
    • extract last fenced code block when present
    • clean prompt to remove step-by-step phrasing (optional)
    • train on completion only via prompt/completion format
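
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation (which lives in experiments/codeforces_sft/train_sft.py); the function names are hypothetical:

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from a generation."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def extract_last_code_block(text: str) -> str:
    """Return the contents of the last fenced code block if one exists,
    otherwise return the text unchanged."""
    blocks = re.findall(r"```(?:\w+)?\n(.*?)```", text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else text.strip()

def to_prompt_completion(example: dict) -> dict:
    """Map the dataset's prompt/generation fields to the
    prompt/completion format used for completion-only SFT."""
    completion = strip_think(example["generation"])
    completion = extract_last_code_block(completion)
    return {"prompt": example["prompt"], "completion": completion}
```

Applied over the dataset (e.g. with `datasets.Dataset.map`), this yields code-only completions when both STRIP_THINK and EXTRACT_CODE are enabled.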

Model

  • base: Qwen/Qwen2.5-Coder-7B-Instruct
  • method: QLoRA (4-bit nf4)
  • max_length: 1024 (configurable)
  • adapters: LoRA r=16, alpha=32, dropout=0.05, target_modules q_proj/v_proj
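
The adapter setup above corresponds roughly to the following configuration, assuming the usual transformers/peft/bitsandbytes stack. The hyperparameters mirror the list above; everything else (device_map, compute dtype) is an assumption, not confirmed from train_sft.py:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",              # nf4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute (supported on A30)
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```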

Training Config (initial)

  • epochs: 1
  • per_device_train_batch_size: 1
  • gradient_accumulation_steps: 8
  • learning_rate: 1e-4
  • eval_fraction: 0.02
  • bf16: true (A30)
  • gradient_checkpointing: true
  • trackio: enabled (set USE_TRACKIO=1)
    • project: codeforces-sft
    • run_name: codeforces-sft-7b
    • space_id: minksypoooo/trackio
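
Under the prompt/completion format, the training arguments above map onto something like the following TRL sketch. This is a hypothetical outline mirroring the config list, not the contents of train_sft.py; `model` and `dataset` are assumed to be prepared as described in the Model and Dataset sections:

```python
from trl import SFTConfig, SFTTrainer

# Sketch only: values copied from the config list above.
args = SFTConfig(
    output_dir="codeforces-sft-7b",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    bf16=True,
    gradient_checkpointing=True,
)

# With "prompt"/"completion" columns, SFTTrainer computes loss on the
# completion only, as noted under preprocessing.
trainer = SFTTrainer(model=model, args=args, train_dataset=dataset)
trainer.train()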

Runs

run_id    date        model                           epochs  max_length  lr    samples  notes          eval_pass_at_1
baseline  2025-12-20  Qwen/Qwen2.5-Coder-7B-Instruct  0       n/a         n/a   all      base model     0.7500
run-001   2025-12-19  Qwen/Qwen2.5-Coder-7B-Instruct  1       1024        1e-4  all      initial setup  0.8232

Evaluation (openai_humaneval)

Result:

  • baseline (Qwen/Qwen2.5-Coder-7B-Instruct):
    • pass@1 (accuracy): 0.7500
    • stderr: 0.0339
    • log: experiments/codeforces_sft/humaneval_logs_base/2025-12-20T05-49-34+00-00_humaneval_BuGBwDQwccCM8aBcEzB632.eval
  • run-001 (fine-tuned):
    • pass@1 (accuracy): 0.8232
    • stderr: 0.0299
    • log: experiments/codeforces_sft/humaneval_logs/2025-12-19T18-45-50+00-00_humaneval_7d4Y6JkgfyVDJFuAmWxaLH.eval
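
For reference, pass@1 is the standard unbiased pass@k estimator from the HumanEval paper evaluated at k=1; with one sample per problem it reduces to the fraction of the 164 problems whose completion passes the tests. A sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total (c correct), passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# At k=1 this is simply c/n; for example, 135 of 164 problems
# passing gives ~0.8232, consistent with the run-001 result above.
print(round(pass_at_k(164, 135, 1), 4))
```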

Command (from repo root):

python3 experiments/codeforces_sft/eval_humaneval.py \
  --model-path codeforces-sft-7b/merged \
  --log-dir experiments/codeforces_sft/humaneval_logs \
  --torch-dtype float16 \
  --max-connections 1 \
  --temperature 0.001 \
  --sandbox local

Auto-run helper:

  • script: experiments/codeforces_sft/run_humaneval_when_ready.sh
  • logs: experiments/codeforces_sft/humaneval.log

Notes

  • Training script: experiments/codeforces_sft/train_sft.py
  • For code-only outputs, keep STRIP_THINK=1 and EXTRACT_CODE=1
  • If you want to keep reasoning, set STRIP_THINK=0 and EXTRACT_CODE=0
  • HumanEval verification runs with local sandbox due to Docker socket permissions on this host