---
title: Codeforces CoTs SFT Experiment (7B)
date: 2025-12-19
status: complete
---

## Objective

Improve code-solving performance on openai_humaneval with supervised fine-tuning on Codeforces CoTs.

## Dataset

- name: open-r1/codeforces-cots
- license: cc-by-4.0
- description: ~10k Codeforces problems with up to five reasoning traces (per dataset card)
- fields used: prompt, generation (mapped to prompt/completion)
- preprocessing:
  - strip ...
  - extract last fenced code block when present
  - clean prompt to remove step-by-step phrasing (optional)
  - train on completion only via prompt/completion format

## Model

- base: Qwen/Qwen2.5-Coder-7B-Instruct
- method: QLoRA (4-bit nf4)
- max_length: 1024 (configurable)
- adapters: LoRA r=16, alpha=32, dropout=0.05, target_modules q_proj/v_proj

## Training Config (initial)

- epochs: 1
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 8
- learning_rate: 1e-4
- eval_fraction: 0.02
- bf16: true (A30)
- gradient_checkpointing: true
- trackio: enabled (set USE_TRACKIO=1)
  - project: codeforces-sft
  - run_name: codeforces-sft-7b
  - space_id: minksypoooo/trackio

## Runs

| run_id | date | model | epochs | max_length | lr | samples | notes | eval_pass_at_1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline | 2025-12-20 | Qwen/Qwen2.5-Coder-7B-Instruct | 0 | n/a | n/a | all | base model | 0.7500 |
| run-001 | 2025-12-19 | Qwen/Qwen2.5-Coder-7B-Instruct | 1 | 1024 | 1e-4 | all | initial setup | 0.8232 |

## Evaluation (openai_humaneval)

Results:

- baseline (Qwen/Qwen2.5-Coder-7B-Instruct):
  - pass@1 (accuracy): 0.7500
  - stderr: 0.0339
  - log: experiments/codeforces_sft/humaneval_logs_base/2025-12-20T05-49-34+00-00_humaneval_BuGBwDQwccCM8aBcEzB632.eval
- run-001 (fine-tuned):
  - pass@1 (accuracy): 0.8232
  - stderr: 0.0299
  - log: experiments/codeforces_sft/humaneval_logs/2025-12-19T18-45-50+00-00_humaneval_7d4Y6JkgfyVDJFuAmWxaLH.eval

Command (from repo root):

```bash
python3 experiments/codeforces_sft/eval_humaneval.py \
  --model-path codeforces-sft-7b/merged \
  --log-dir experiments/codeforces_sft/humaneval_logs \
  --torch-dtype float16 \
  --max-connections 1 \
  --temperature 0.001 \
  --sandbox local
```

Auto-run helper:

- script: `experiments/codeforces_sft/run_humaneval_when_ready.sh`
- logs: `experiments/codeforces_sft/humaneval.log`

## Notes

- Training script: experiments/codeforces_sft/train_sft.py
- For code-only outputs, keep STRIP_THINK=1 and EXTRACT_CODE=1
- To keep reasoning traces, set STRIP_THINK=0 and EXTRACT_CODE=0
- HumanEval verification runs with the local sandbox because of Docker socket permissions on this host
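The preprocessing steps described in the Dataset section (strip reasoning, extract the last fenced code block, map fields to prompt/completion) can be sketched as below. This is a minimal illustration, not the code in train_sft.py: the `<think>...</think>` tag format for reasoning traces, and the `strip_think` / `extract_last_code_block` / `to_prompt_completion` helper names, are assumptions.

```python
import re

# Fence delimiter built from chr(96) so this snippet contains no literal backticks.
FENCE = chr(96) * 3
CODE_BLOCK = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)
THINK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    """Drop reasoning wrapped in <think>...</think> (assumed trace format)."""
    return THINK.sub("", text).strip()

def extract_last_code_block(text: str) -> str:
    """Return the contents of the last fenced code block, else the text unchanged."""
    blocks = CODE_BLOCK.findall(text)
    return blocks[-1].strip() if blocks else text.strip()

def to_prompt_completion(example: dict) -> dict:
    """Map a dataset row's prompt/generation fields to prompt/completion."""
    completion = extract_last_code_block(strip_think(example["generation"]))
    return {"prompt": example["prompt"], "completion": completion}
```

With STRIP_THINK=0 and EXTRACT_CODE=0 the corresponding steps would simply be skipped, leaving the full reasoning trace in the completion.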