---
title: Codeforces CoTs SFT Experiment (7B)
date: 2025-12-19
status: complete
---

## Objective
Improve code-solving performance on openai_humaneval with supervised fine-tuning on Codeforces CoTs.

## Dataset
- name: open-r1/codeforces-cots
- license: cc-by-4.0
- description: ~10k Codeforces problems with up to five reasoning traces (per dataset card)
- fields used: prompt, generation (mapped to prompt/completion)
- preprocessing:
  - strip `<think>...</think>` spans
  - extract the last fenced code block when present
  - clean the prompt to remove step-by-step phrasing (optional)
  - train on the completion only via the prompt/completion format
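The preprocessing steps above can be sketched roughly as follows (the regexes and function name are illustrative, not the exact `train_sft.py` implementation):

```python
import re

# Remove <think>...</think> reasoning spans (DOTALL so spans may cross lines).
THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)
# Capture the body of a fenced code block, e.g. ```python ... ```.
FENCE_RE = re.compile(r"```[\w+-]*\n(.*?)```", flags=re.DOTALL)

def preprocess(prompt: str, generation: str) -> dict:
    """Map a raw (prompt, generation) pair to the prompt/completion format."""
    completion = THINK_RE.sub("", generation).strip()
    blocks = FENCE_RE.findall(completion)
    if blocks:  # keep only the last fenced code block, when one is present
        completion = blocks[-1].strip()
    return {"prompt": prompt, "completion": completion}
```

With a prompt/completion dataset in this shape, TRL's SFTTrainer can mask the prompt tokens so loss is computed on the completion only.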

## Model
- base: Qwen/Qwen2.5-Coder-7B-Instruct
- method: QLoRA (4-bit NF4 quantization)
- max_length: 1024 (configurable)
- adapters: LoRA r=16, alpha=32, dropout=0.05, target_modules q_proj/v_proj
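A minimal sketch of the corresponding quantization and adapter objects, assuming the standard transformers/peft APIs (the names mirror the bullets above; the actual script may differ):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for QLoRA (bitsandbytes via transformers).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute on the A30
)

# LoRA adapters matching the settings above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```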

## Training Config (initial)
- epochs: 1
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 8
- learning_rate: 1e-4
- eval_fraction: 0.02
- bf16: true (A30)
- gradient_checkpointing: true
- trackio: enabled (set USE_TRACKIO=1)
  - project: codeforces-sft
  - run_name: codeforces-sft-7b
  - space_id: minksypoooo/trackio
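The hyperparameters above map onto a TRL `SFTConfig` roughly as follows (a sketch assuming a recent trl version; `output_dir` is illustrative, and eval_fraction presumably applies at dataset-split time in the script, so it is not shown here):

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="codeforces-sft-7b",   # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    bf16=True,                        # A30 supports bfloat16
    gradient_checkpointing=True,      # trade compute for memory
    run_name="codeforces-sft-7b",
)
```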

## Runs
| run_id | date | model | epochs | max_length | lr | samples | notes | eval_pass_at_1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline | 2025-12-20 | Qwen/Qwen2.5-Coder-7B-Instruct | 0 | n/a | n/a | all | base model | 0.7500 |
| run-001 | 2025-12-19 | Qwen/Qwen2.5-Coder-7B-Instruct | 1 | 1024 | 1e-4 | all | initial setup | 0.8232 |
|
|
## Evaluation (openai_humaneval)
Results:
- baseline (Qwen/Qwen2.5-Coder-7B-Instruct):
  - pass@1 (accuracy): 0.7500
  - stderr: 0.0339
  - log: experiments/codeforces_sft/humaneval_logs_base/2025-12-20T05-49-34+00-00_humaneval_BuGBwDQwccCM8aBcEzB632.eval
- fine-tuned (run-001, merged model):
  - pass@1 (accuracy): 0.8232
  - stderr: 0.0299
  - log: experiments/codeforces_sft/humaneval_logs/2025-12-19T18-45-50+00-00_humaneval_7d4Y6JkgfyVDJFuAmWxaLH.eval
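As a rough significance check on the 0.7500 → 0.8232 gain, the two reported stderrs can be combined (this treats the runs as independent, which overstates the variance since both evaluate the same 164 HumanEval problems):

```python
import math

base, base_se = 0.7500, 0.0339    # baseline pass@1 and stderr
tuned, tuned_se = 0.8232, 0.0299  # fine-tuned pass@1 and stderr

diff = tuned - base
se_diff = math.sqrt(base_se**2 + tuned_se**2)  # independence assumption
z = diff / se_diff
print(f"diff={diff:.4f}  se={se_diff:.4f}  z={z:.2f}")
```

z ≈ 1.6, so the improvement is suggestive but not decisive at this sample size; a paired comparison over the same problems would be a tighter test.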
|
|
Command (from repo root):

```bash
python3 experiments/codeforces_sft/eval_humaneval.py \
  --model-path codeforces-sft-7b/merged \
  --log-dir experiments/codeforces_sft/humaneval_logs \
  --torch-dtype float16 \
  --max-connections 1 \
  --temperature 0.001 \
  --sandbox local
```
|
|
Auto-run helper:
- script: `experiments/codeforces_sft/run_humaneval_when_ready.sh`
- logs: `experiments/codeforces_sft/humaneval.log`
|
|
## Notes
- Training script: experiments/codeforces_sft/train_sft.py
- For code-only outputs, keep STRIP_THINK=1 and EXTRACT_CODE=1.
- To keep reasoning traces in the targets, set STRIP_THINK=0 and EXTRACT_CODE=0.
- HumanEval verification runs with the local sandbox due to Docker socket permissions on this host.
|
|