---
title: Codeforces CoTs SFT Experiment (7B)
date: 2025-12-19
status: complete
---
## Objective
Improve code-generation performance on the openai_humaneval benchmark via supervised fine-tuning (SFT) on Codeforces chain-of-thought (CoT) traces.
## Dataset
- name: open-r1/codeforces-cots
- license: cc-by-4.0
- description: ~10k Codeforces problems with up to five reasoning traces (per dataset card)
- fields used: prompt, generation (mapped to prompt/completion)
- preprocessing:
- strip `<think>...</think>` tags
- extract last fenced code block when present
- clean prompt to remove step-by-step phrasing (optional)
- train on completion only via prompt/completion format
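A minimal sketch of the first two preprocessing steps (the regexes here are illustrative; the actual logic lives in `train_sft.py` and may differ):

```python
import re

# Chain-of-thought span between <think>...</think> tags.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# A fenced code block with an optional language tag.
FENCED_CODE_RE = re.compile(r"`{3}(?:\w+)?\n(.*?)`{3}", re.DOTALL)

def clean_generation(text: str) -> str:
    """Strip <think> blocks, then keep only the last fenced code block."""
    text = THINK_RE.sub("", text)
    blocks = FENCED_CODE_RE.findall(text)
    # Fall back to the full (think-stripped) text when no code block exists.
    return blocks[-1].strip() if blocks else text.strip()
```

With `STRIP_THINK=1` and `EXTRACT_CODE=1` this reduces each `generation` field to code only, which is what the completion-only training format consumes.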
## Model
- base: Qwen/Qwen2.5-Coder-7B-Instruct
- method: QLoRA (4-bit nf4)
- max_length: 1024 (configurable)
- adapters: LoRA r=16, alpha=32, dropout=0.05, target_modules q_proj/v_proj
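The quantization and adapter settings above can be expressed roughly as follows (a sketch assuming the Hugging Face `transformers` and `peft` libraries; values mirror the bullets, everything else is left at library defaults):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention query/value projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```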
## Training Config (initial)
- epochs: 1
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 8
- learning_rate: 1e-4
- eval_fraction: 0.02
- bf16: true (A30)
- gradient_checkpointing: true
- trackio: enabled (set USE_TRACKIO=1)
- project: codeforces-sft
- run_name: codeforces-sft-7b
- space_id: minksypoooo/trackio
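The run hyperparameters map onto a TRL `SFTConfig` roughly like this (a sketch; the `eval_fraction` split and Trackio reporting are handled separately in `train_sft.py`, and the sequence-length argument name varies across TRL versions):

```python
from trl import SFTConfig

args = SFTConfig(
    output_dir="codeforces-sft-7b",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    bf16=True,
    gradient_checkpointing=True,
    max_length=1024,  # called max_seq_length in older TRL releases
    run_name="codeforces-sft-7b",
)
```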
## Runs
| run_id | date | model | epochs | max_length | lr | samples | notes | eval_pass_at_1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline | 2025-12-20 | Qwen/Qwen2.5-Coder-7B-Instruct | 0 | n/a | n/a | all | base model | 0.7500 |
| run-001 | 2025-12-19 | Qwen/Qwen2.5-Coder-7B-Instruct | 1 | 1024 | 1e-4 | all | initial setup | 0.8232 |
## Evaluation (openai_humaneval)
Results:
- baseline (Qwen/Qwen2.5-Coder-7B-Instruct):
  - pass@1 (accuracy): 0.7500
  - stderr: 0.0339
  - log: experiments/codeforces_sft/humaneval_logs_base/2025-12-20T05-49-34+00-00_humaneval_BuGBwDQwccCM8aBcEzB632.eval
- run-001 (fine-tuned, merged adapters):
  - pass@1 (accuracy): 0.8232
  - stderr: 0.0299
  - log: experiments/codeforces_sft/humaneval_logs/2025-12-19T18-45-50+00-00_humaneval_7d4Y6JkgfyVDJFuAmWxaLH.eval
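The reported stderr values are consistent with the sample standard error of a binomial proportion over HumanEval's 164 problems; a quick sanity check (the n - 1 denominator is an assumption about how the eval harness computes it):

```python
import math

def pass_at_1_stderr(p: float, n: int = 164) -> float:
    """Sample standard error of a pass@1 proportion over n problems."""
    return math.sqrt(p * (1 - p) / (n - 1))
```

Plugging in p = 0.7500 and p = 0.8232 reproduces 0.0339 and 0.0299 to four decimal places.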
Command (from repo root):
```bash
python3 experiments/codeforces_sft/eval_humaneval.py \
--model-path codeforces-sft-7b/merged \
--log-dir experiments/codeforces_sft/humaneval_logs \
--torch-dtype float16 \
--max-connections 1 \
--temperature 0.001 \
--sandbox local
```
Auto-run helper:
- script: `experiments/codeforces_sft/run_humaneval_when_ready.sh`
- logs: `experiments/codeforces_sft/humaneval.log`
## Notes
- Training script: experiments/codeforces_sft/train_sft.py
- For code-only outputs, keep STRIP_THINK=1 and EXTRACT_CODE=1
- If you want to keep reasoning, set STRIP_THINK=0 and EXTRACT_CODE=0
- HumanEval verification runs with local sandbox due to Docker socket permissions on this host
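A hypothetical helper showing how `train_sft.py` might read these flags (the flag names come from the notes above; the parsing convention, including the default of "on", is an assumption):

```python
import os

def env_flag(name: str, default: str = "1") -> bool:
    """Treat the environment variable as a boolean switch ("1" = on)."""
    return os.environ.get(name, default) == "1"

strip_think = env_flag("STRIP_THINK")    # drop <think>...</think> spans
extract_code = env_flag("EXTRACT_CODE")  # keep only the last code block
```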