---
title: Codeforces CoTs SFT Experiment (7B)
date: 2025-12-19
status: complete
---

## Objective
Improve code-solving performance on openai_humaneval with supervised fine-tuning on Codeforces CoTs.

## Dataset
- name: open-r1/codeforces-cots
- license: cc-by-4.0
- description: ~10k Codeforces problems with up to five reasoning traces (per dataset card)
- fields used: prompt, generation (mapped to prompt/completion)
- preprocessing:
  - strip `<think>...</think>` spans
  - extract the last fenced code block when present
  - clean the prompt to remove step-by-step phrasing (optional)
  - train on the completion only via the prompt/completion format
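The preprocessing steps above can be sketched roughly as follows (the regexes and function name are illustrative, not the exact `train_sft.py` implementation):

```python
import re

# Remove <think>...</think> reasoning spans (DOTALL so spans may cross lines).
THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)
# Capture the body of a fenced code block, e.g. ```python ... ```.
FENCE_RE = re.compile(r"```[\w+-]*\n(.*?)```", flags=re.DOTALL)

def preprocess(prompt: str, generation: str) -> dict:
    """Map a raw (prompt, generation) pair to the prompt/completion format."""
    completion = THINK_RE.sub("", generation).strip()
    blocks = FENCE_RE.findall(completion)
    if blocks:  # keep only the last fenced code block, when one is present
        completion = blocks[-1].strip()
    return {"prompt": prompt, "completion": completion}
```

With a prompt/completion dataset in this shape, TRL's SFTTrainer can mask the prompt tokens so loss is computed on the completion only.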

## Model
- base: Qwen/Qwen2.5-Coder-7B-Instruct
- method: QLoRA (4-bit NF4 quantization)
- max_length: 1024 (configurable)
- adapters: LoRA r=16, alpha=32, dropout=0.05, target_modules q_proj/v_proj
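A minimal sketch of the corresponding quantization and adapter objects, assuming the standard transformers/peft APIs (the names mirror the bullets above; the actual script may differ):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for QLoRA (bitsandbytes via transformers).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute on the A30
)

# LoRA adapters matching the settings above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```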

## Training Config (initial)
- epochs: 1
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 8
- learning_rate: 1e-4
- eval_fraction: 0.02
- bf16: true (A30)
- gradient_checkpointing: true
- trackio: enabled (set USE_TRACKIO=1)
  - project: codeforces-sft
  - run_name: codeforces-sft-7b
  - space_id: minksypoooo/trackio
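The hyperparameters above map onto a TRL `SFTConfig` roughly as follows (a sketch assuming a recent trl version; `output_dir` is illustrative, and eval_fraction presumably applies at dataset-split time in the script, so it is not shown here):

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="codeforces-sft-7b",   # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    bf16=True,                        # A30 supports bfloat16
    gradient_checkpointing=True,      # trade compute for memory
    run_name="codeforces-sft-7b",
)
```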

## Runs
| run_id | date | model | epochs | max_length | lr | samples | notes | eval_pass_at_1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline | 2025-12-20 | Qwen/Qwen2.5-Coder-7B-Instruct | 0 | n/a | n/a | all | base model | 0.7500 |
| run-001 | 2025-12-19 | Qwen/Qwen2.5-Coder-7B-Instruct | 1 | 1024 | 1e-4 | all | initial setup | 0.8232 |
|
|
## Evaluation (openai_humaneval)
Results:
- baseline (Qwen/Qwen2.5-Coder-7B-Instruct):
  - pass@1 (accuracy): 0.7500
  - stderr: 0.0339
  - log: experiments/codeforces_sft/humaneval_logs_base/2025-12-20T05-49-34+00-00_humaneval_BuGBwDQwccCM8aBcEzB632.eval
- fine-tuned (run-001, merged model):
  - pass@1 (accuracy): 0.8232
  - stderr: 0.0299
  - log: experiments/codeforces_sft/humaneval_logs/2025-12-19T18-45-50+00-00_humaneval_7d4Y6JkgfyVDJFuAmWxaLH.eval
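As a rough significance check on the 0.7500 → 0.8232 gain, the two reported stderrs can be combined (this treats the runs as independent, which overstates the variance since both evaluate the same 164 HumanEval problems):

```python
import math

base, base_se = 0.7500, 0.0339    # baseline pass@1 and stderr
tuned, tuned_se = 0.8232, 0.0299  # fine-tuned pass@1 and stderr

diff = tuned - base
se_diff = math.sqrt(base_se**2 + tuned_se**2)  # independence assumption
z = diff / se_diff
print(f"diff={diff:.4f}  se={se_diff:.4f}  z={z:.2f}")
```

z ≈ 1.6, so the improvement is suggestive but not decisive at this sample size; a paired comparison over the same problems would be a tighter test.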
|
|
Command (from repo root):

```bash
python3 experiments/codeforces_sft/eval_humaneval.py \
  --model-path codeforces-sft-7b/merged \
  --log-dir experiments/codeforces_sft/humaneval_logs \
  --torch-dtype float16 \
  --max-connections 1 \
  --temperature 0.001 \
  --sandbox local
```
|
|
Auto-run helper:
- script: `experiments/codeforces_sft/run_humaneval_when_ready.sh`
- logs: `experiments/codeforces_sft/humaneval.log`
|
|
## Notes
- Training script: experiments/codeforces_sft/train_sft.py
- For code-only outputs, keep STRIP_THINK=1 and EXTRACT_CODE=1.
- To keep reasoning traces in the targets, set STRIP_THINK=0 and EXTRACT_CODE=0.
- HumanEval verification runs with the local sandbox due to Docker socket permissions on this host.
|
|