---
title: Codeforces CoTs SFT Experiment (7B)
date: 2025-12-19
status: complete
---

## Objective

Improve code-solving performance on openai_humaneval with supervised fine-tuning on Codeforces CoTs.

## Dataset

- name: open-r1/codeforces-cots
- license: cc-by-4.0
- description: ~10k Codeforces problems with up to five reasoning traces (per dataset card)
- fields used: prompt, generation (mapped to prompt/completion)
- preprocessing:
  - strip ...
  - extract last fenced code block when present
  - clean prompt to remove step-by-step phrasing (optional)
  - train on completion only via prompt/completion format

## Model

- base: Qwen/Qwen2.5-Coder-7B-Instruct
- method: QLoRA (4-bit nf4)
- max_length: 1024 (configurable)
- adapters: LoRA r=16, alpha=32, dropout=0.05, target_modules q_proj/v_proj

## Training Config (initial)

- epochs: 1
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 8
- learning_rate: 1e-4
- eval_fraction: 0.02
- bf16: true (A30)
- gradient_checkpointing: true
- trackio: enabled (set USE_TRACKIO=1)
  - project: codeforces-sft
  - run_name: codeforces-sft-7b
  - space_id: minksypoooo/trackio

## Runs

| run_id | date | model | epochs | max_length | lr | samples | notes | eval_pass_at_1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline | 2025-12-20 | Qwen/Qwen2.5-Coder-7B-Instruct | 0 | n/a | n/a | all | base model | 0.7500 |
| run-001 | 2025-12-19 | Qwen/Qwen2.5-Coder-7B-Instruct | 1 | 1024 | 1e-4 | all | initial setup | 0.8232 |

## Evaluation (openai_humaneval)

Results:

- baseline (Qwen/Qwen2.5-Coder-7B-Instruct):
  - pass@1 (accuracy): 0.7500
  - stderr: 0.0339
  - log: experiments/codeforces_sft/humaneval_logs_base/2025-12-20T05-49-34+00-00_humaneval_BuGBwDQwccCM8aBcEzB632.eval
- run-001 (fine-tuned):
  - pass@1 (accuracy): 0.8232
  - stderr: 0.0299
  - log: experiments/codeforces_sft/humaneval_logs/2025-12-19T18-45-50+00-00_humaneval_7d4Y6JkgfyVDJFuAmWxaLH.eval

Command (from repo root):

```bash
python3 experiments/codeforces_sft/eval_humaneval.py \
  --model-path codeforces-sft-7b/merged \
  --log-dir experiments/codeforces_sft/humaneval_logs \
  --torch-dtype float16 \
  --max-connections 1 \
  --temperature 0.001 \
  --sandbox local
```

Auto-run helper:

- script: `experiments/codeforces_sft/run_humaneval_when_ready.sh`
- logs: `experiments/codeforces_sft/humaneval.log`

## Notes

- Training script: experiments/codeforces_sft/train_sft.py
- For code-only outputs, keep STRIP_THINK=1 and EXTRACT_CODE=1
- To keep reasoning traces, set STRIP_THINK=0 and EXTRACT_CODE=0
- HumanEval verification runs with the local sandbox because of Docker socket permissions on this host
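The preprocessing steps described in the Dataset section (strip reasoning, extract the last fenced code block, map fields to prompt/completion) can be sketched as below. This is a minimal illustration, not the code in train_sft.py: the `<think>...</think>` tag format for reasoning traces, and the `strip_think` / `extract_last_code_block` / `to_prompt_completion` helper names, are assumptions.

```python
import re

# Fence delimiter built from chr(96) so this snippet contains no literal backticks.
FENCE = chr(96) * 3
CODE_BLOCK = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)
THINK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    """Drop reasoning wrapped in <think>...</think> (assumed trace format)."""
    return THINK.sub("", text).strip()

def extract_last_code_block(text: str) -> str:
    """Return the contents of the last fenced code block, else the text unchanged."""
    blocks = CODE_BLOCK.findall(text)
    return blocks[-1].strip() if blocks else text.strip()

def to_prompt_completion(example: dict) -> dict:
    """Map a dataset row's prompt/generation fields to prompt/completion."""
    completion = extract_last_code_block(strip_think(example["generation"]))
    return {"prompt": example["prompt"], "completion": completion}
```

With STRIP_THINK=0 and EXTRACT_CODE=0 the corresponding steps would simply be skipped, leaving the full reasoning trace in the completion.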