Chia-Mu-Lab
/

ot-q3_32b-clean

Text Generation

reasoning-trace-extraction

Model card Files Files and versions

ot-q3_32b-clean / README.md

0x-YuAN's picture

refresh README.md

5b4684d verified 12 days ago

|

history blame contribute delete

3.11 kB

	---
	license: apache-2.0
	library_name: transformers
	base_model: Qwen/Qwen2.5-7B-Instruct
	pipeline_tag: text-generation
	tags:
	- distillation
	- reasoning-trace-extraction
	- openthoughts
	- qwen3
	- victim-model
	datasets:
	- Chia-Mu-Lab/openthoughts_distill_victim_data_10k_clean_swag_qwen3_32b
	---

	# ot-q3_32b-clean

	Qwen2.5-7B-Instruct student model fine-tuned by full-parameter SFT (s1 recipe)
	on Qwen3-32B (OpenThoughts SWAG, V3-attack cleaned) reasoning traces.

	This repo is part of a 4-victim study comparing student distillation outcomes
	when the teacher's reasoning traces are extracted via the V3 attack (`-orig`)
	vs. when the V3 attack wrapper is stripped before training (`-clean`).

	## How to load a specific epoch

	Each `epoch_N/` subfolder is a self-contained, loadable HF checkpoint.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	REPO = "Chia-Mu-Lab/ot-q3_32b-clean"
	model = AutoModelForCausalLM.from_pretrained(REPO, subfolder="epoch_5", torch_dtype="bfloat16")
	tok = AutoTokenizer.from_pretrained(REPO, subfolder="epoch_5")
	```

	## Per-epoch evaluation

	All numbers are accuracies in percent on the canonical eval suite
	(GSM8K-MATH500, AIME24, AIME25, JEEBench Math subset strict/partial, LiveCodeBench
	v5 pass@1). The `base` row is the Qwen2.5-7B-Instruct starting point, evaluated
	identically. Bold values across this row indicate per-victim peaks.

	\| Epoch \| Ckpt \| MATH500 \| AIME24 \| AIME25 \| JEE Math (s/p) \| LCB pass@1 \|
	\|---:\|:---\|---:\|---:\|---:\|---:\|---:\|
	\| 0 \| base (Qwen2.5-7B-Instruct) \| 71.0 \| 8.9 \| 2.2 \| 32.2 / 35.9 \| 15.8 \|
	\| 1 \| step-00619 \| 54.2 \| 2.2 \| 8.9 \| 23.2 / 27.2 \| 18.3 \|
	\| 2 \| step-01239 \| 59.8 \| 8.9 \| 6.7 \| 25.2 / 29.9 \| 17.9 \|
	\| 3 \| step-01858 \| 69.5 \| 12.2 \| 15.6 \| 35.1 / 38.9 \| 16.1 \|
	\| 4 \| step-02478 \| 72.8 \| 14.4 \| 17.8 \| 36.4 / 41.1 \| 15.8 \|
	\| 5 \| step-03095 \| 74.9 \| 12.2 \| 15.6 \| 39.3 / 43.6 \| 14.7 \|

	## Training recipe

	* Base model: Qwen/Qwen2.5-7B-Instruct
	* Teacher traces: `Chia-Mu-Lab/openthoughts_distill_victim_data_10k_clean_swag_qwen3_32b`
	* Recipe: s1 paper exact full fine-tune (FSDP full-shard, no LoRA)
	* Block size: 32768 tokens · effective batch 16 (mb=1, ga=4, 4×H200)
	* Optimizer: AdamW, lr=1e-5 cosine, warmup_ratio=0.05, bf16
	* Epochs: 5, `save_strategy=epoch`

	## Files

	```
	ot-q3_32b-clean/
	README.md
	metrics.csv ← machine-readable per-epoch metric table
	epoch_1/ ← full HF checkpoint dir (config.json, model-*.safetensors,
	epoch_2/ tokenizer*, etc.)
	epoch_3/
	epoch_4/
	epoch_5/
	```

	## Caveats / known issues

	* All epochs here are from the canonical s1-distill 3-exp sweep (2026-05-20), evaluated with the unified math500+AIME+JEE+LCB scorer pipeline.
	* JEE Math here refers to the subject="math" subset (≈236 of 515 questions)
	scored per the official `dair-iitd` `compute_metrics.py`. The strict number
	is the headline accuracy; partial gives MCQ(multiple) partial credit.
	* These models are research artifacts for the steel-reasoning-trace project
	(reasoning-trace extraction attack study); do not use for production.