qwen25-7b-ot-q3_14b-clean-code

Distilled checkpoints from full-parameter SFT of Qwen/Qwen2.5-7B-Instruct on Chia-Mu-Lab/ot-q3_14b-clean-code, a Qwen3-14B-teacher dump of OpenThoughts-114k code-prompt reasoning traces extracted via a V3-style prompt-injection attack. 6 epoch ckpts, 4×B200, eff_batch 16, lr 1e-5 cosine warmup 0.05.

Variant: clean-code — the V3 attack bash-fence (```bash\n$ cat reasoning_trace.txt) was stripped at curation time, and only rows with structural=True were kept (10000/10000 rows after filtering).

Training recipe

field value
Student Qwen/Qwen2.5-7B-Instruct
Teacher Qwen3-14B (via OpenThoughts code-prompt attack)
Dataset Chia-Mu-Lab/ot-q3_14b-clean-code (10000 usable rows after filter)
Hardware 4×B200 (Modal)
Epochs 6 (one ckpt per epoch)
Block size 32768
Micro / Grad-accum / Effective batch 1 / 4 / 16
Learning rate 1e-5 (cosine, warmup 0.05)
Optimizer AdamW (β=0.9/0.95, wd=1e-4)
Sharding FSDP (full_shard auto_wrap, Qwen2DecoderLayer, FULL_STATE_DICT)
Attention flash_attention_2
Precision bf16

Evaluation

Evaluated on AIME24+AIME25 (n=3, T=0.5), MATH-500 (n=3, T=0.5), JEEbench subject=='math' subset (n=6, T=0.5), and LiveCodeBench-v5 release window 2024-08-01→2025-02-01 (n=3, T=0.5). All numbers are % accuracy; (±N.N) is the delta vs base Qwen/Qwen2.5-7B-Instruct evaluated under the same protocol.

ckpt epoch AIME24 AIME25 MATH500 JEE-math LCB-v5
base 8.89 2.22 70.93 32.49 15.77
step-00625 ep1 5.56 (-3.3) 6.67 (+4.4) 59.47 (-11.5) 18.22 (-14.3) 10.75 (-5.0)
step-01250 ep2 8.89 (+0.0) 11.11 (+8.9) 66.20 (-4.7) 26.91 (-5.6) 10.75 (-5.0)
step-01875 ep3 12.22 (+3.3) 20.00 (+17.8) 71.13 (+0.2) 32.34 (-0.1) 10.04 (-5.7)
step-02500 ep4 14.44 (+5.6) 13.33 (+11.1) 74.87 (+3.9) 33.69 (+1.2) 12.19 (-3.6)
step-03125 ep5 12.22 (+3.3) 15.56 (+13.3) 74.73 (+3.8) 35.45 (+3.0) 12.19 (-3.6)
step-03750 ep6 13.33 (+4.4) 15.56 (+13.3) 73.67 (+2.7) 32.70 (+0.2) 11.83 (-3.9)

Checkpoints layout

Each epoch ckpt lives in its own subdirectory inside this repo. To load a specific epoch with 🤗 Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code"
sub  = "checkpoint-2500"  # one of: checkpoint-625, checkpoint-1250, checkpoint-1875, checkpoint-2500, checkpoint-3125, checkpoint-3750
model = AutoModelForCausalLM.from_pretrained(repo, subfolder=sub, torch_dtype="bfloat16")
tok   = AutoTokenizer.from_pretrained(repo, subfolder=sub)

Caveats

  • Research artifact for studying LLM reasoning-trace exfiltration via prompt injection. Not intended for production use.
  • Training data is Qwen3-14B's response to OpenThoughts-114k code prompts elicited via a known prompt-injection attack; quality / safety properties of the teacher's response are not curated.
  • Evaluation uses a single seed (T=0.5, seed=7 for vLLM); per-ckpt variance is ±1-2 pp.
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code

Base model

Qwen/Qwen2.5-7B
Finetuned
(3382)
this model

Dataset used to train Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code