Libraries

How to use Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code")

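# Load model directly (AutoModelForCausalLM attaches the language-modeling head needed for generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code", device_map="auto")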

Local Apps

vLLM

How to use Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
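
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`; the helper names `build_payload` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code"


def build_payload(prompt, max_tokens=512, temperature=0.5, model=MODEL):
    """Build the JSON body for the OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000", **kwargs):
    """POST a completion request and return the first generated text."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["choices"][0]["text"]


# Example (requires a running server):
# print(complete("Once upon a time,"))
```

For a server on a different port, such as the SGLang commands in this card that use port 30000, pass `base_url="http://localhost:30000"`.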

Use Docker

docker model run hf.co/Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code

SGLang

How to use Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

qwen25-7b-ot-q3_14b-clean-code

Checkpoints distilled by full-parameter SFT of Qwen/Qwen2.5-7B-Instruct on Chia-Mu-Lab/ot-q3_14b-clean-code, a dump of Qwen3-14B (teacher) reasoning traces for OpenThoughts-114k code prompts, extracted via a V3-style prompt-injection attack. The repo holds six per-epoch checkpoints, trained on 4×B200 with an effective batch size of 16 and a learning rate of 1e-5 (cosine schedule, warmup 0.05).

Variant: clean-code — the V3 attack bash-fence (```bash\n$ cat reasoning_trace.txt) was stripped at curation time, and only rows with structural=True were kept (10000/10000 rows after filtering).
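
As a rough illustration of that curation step, the sketch below strips the attack's opening bash fence from a teacher response and keeps only rows flagged `structural`. The row layout (a dict with `response` and `structural` keys), the removal of a trailing closing fence, and the helper names are all assumptions made for the example; the actual curation script is not part of this card.

```python
FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly
ATTACK_PREFIX = FENCE + "bash\n$ cat reasoning_trace.txt"


def strip_attack_fence(text):
    """Remove the leading attack fence and, if present, a trailing closing fence."""
    cleaned = text.strip()
    if cleaned.startswith(ATTACK_PREFIX):
        cleaned = cleaned[len(ATTACK_PREFIX):]
    if cleaned.endswith(FENCE):  # assumption: the trace may end with a closing fence
        cleaned = cleaned[: -len(FENCE)]
    return cleaned.strip()


def curate(rows):
    """Keep rows with structural=True and strip the fence from their responses."""
    return [
        {**row, "response": strip_attack_fence(row["response"])}
        for row in rows
        if row.get("structural") is True
    ]


# Hypothetical rows, for illustration only.
rows = [
    {"response": ATTACK_PREFIX + "\nstep 1: parse input\n" + FENCE, "structural": True},
    {"response": "free-form answer", "structural": False},
]
print(curate(rows))
```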

Training recipe

| Field | Value |
|---|---|
| Student | `Qwen/Qwen2.5-7B-Instruct` |
| Teacher | Qwen3-14B (via OpenThoughts code-prompt attack) |
| Dataset | `Chia-Mu-Lab/ot-q3_14b-clean-code` (10000 usable rows after filter) |
| Hardware | 4×B200 (Modal) |
| Epochs | 6 (one ckpt per epoch) |
| Block size | 32768 |
| Micro / Grad-accum / Effective batch | 1 / 4 / 16 |
| Learning rate | 1e-5 (cosine, warmup 0.05) |
| Optimizer | AdamW (β=0.9/0.95, wd=1e-4) |
| Sharding | FSDP (full_shard auto_wrap, Qwen2DecoderLayer, FULL_STATE_DICT) |
| Attention | flash_attention_2 |
| Precision | bf16 |
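
The batch and step figures in this recipe are mutually consistent, which the arithmetic sketch below checks. It assumes one data-parallel rank per GPU and one training sequence per dataset row; the variable names are illustrative.

```python
num_gpus = 4        # 4×B200
micro_batch = 1     # per-GPU micro batch
grad_accum = 4      # gradient-accumulation steps
num_rows = 10_000   # usable rows after filtering
num_epochs = 6

effective_batch = num_gpus * micro_batch * grad_accum
steps_per_epoch = num_rows // effective_batch
checkpoint_steps = [steps_per_epoch * epoch for epoch in range(1, num_epochs + 1)]

print(effective_batch)    # 16
print(steps_per_epoch)    # 625
print(checkpoint_steps)   # [625, 1250, 1875, 2500, 3125, 3750]
```

The resulting step counts match the per-epoch checkpoint steps reported for this model.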

Evaluation

Evaluated on AIME24 and AIME25 (n=3, T=0.5), MATH-500 (n=3, T=0.5), the JEEBench subset with subject=='math' (n=6, T=0.5), and LiveCodeBench-v5 with release window 2024-08-01 to 2025-02-01 (n=3, T=0.5). All numbers are % accuracy; the parenthesized value (±N.N) is the difference in percentage points from the base Qwen/Qwen2.5-7B-Instruct evaluated under the same protocol.
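
One plausible reading of this protocol, stated here as an assumption, is that n is the number of sampled completions per problem and each score is the mean correctness over all problems and samples. Under that reading, AIME24 (30 problems) with n=3 yields 90 graded completions, and the base score of 8.89 would correspond to 8 correct completions. The function name and the grading data below are illustrative.

```python
def mean_accuracy(correct_flags):
    """Percent accuracy over a per-problem list of per-sample correctness flags."""
    flat = [flag for problem in correct_flags for flag in problem]
    return 100.0 * sum(flat) / len(flat)


# Hypothetical grading: 30 problems x 3 samples with 8 correct completions.
grades = [[True, True, True], [True, True, True], [True, True, False]]
grades += [[False, False, False]] * 27

print(round(mean_accuracy(grades), 2))  # 8.89
print(round(100.0 / 90, 2))             # 1.11, the smallest possible score step
```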

| ckpt | epoch | AIME24 | AIME25 | MATH500 | JEE-math | LCB-v5 |
|---|---|---|---|---|---|---|
| base | — | 8.89 | 2.22 | 70.93 | 32.49 | 15.77 |
| `step-00625` | ep1 | 5.56 (-3.3) | 6.67 (+4.4) | 59.47 (-11.5) | 18.22 (-14.3) | 10.75 (-5.0) |
| `step-01250` | ep2 | 8.89 (+0.0) | 11.11 (+8.9) | 66.20 (-4.7) | 26.91 (-5.6) | 10.75 (-5.0) |
| `step-01875` | ep3 | 12.22 (+3.3) | 20.00 (+17.8) | 71.13 (+0.2) | 32.34 (-0.1) | 10.04 (-5.7) |
| `step-02500` | ep4 | 14.44 (+5.6) | 13.33 (+11.1) | 74.87 (+3.9) | 33.69 (+1.2) | 12.19 (-3.6) |
| `step-03125` | ep5 | 12.22 (+3.3) | 15.56 (+13.3) | 74.73 (+3.8) | 35.45 (+3.0) | 12.19 (-3.6) |
| `step-03750` | ep6 | 13.33 (+4.4) | 15.56 (+13.3) | 73.67 (+2.7) | 32.70 (+0.2) | 11.83 (-3.9) |
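
To pick a checkpoint per benchmark, the scores can be compared programmatically. The sketch below copies the reported scores verbatim and finds the best epoch for each benchmark; the variable names are illustrative, and deltas are recomputed from the rounded table values, so they can differ from the printed deltas in the last digit.

```python
base = {"AIME24": 8.89, "AIME25": 2.22, "MATH500": 70.93, "JEE-math": 32.49, "LCB-v5": 15.77}
scores = {  # one entry per epoch, ep1..ep6
    "AIME24": [5.56, 8.89, 12.22, 14.44, 12.22, 13.33],
    "AIME25": [6.67, 11.11, 20.00, 13.33, 15.56, 15.56],
    "MATH500": [59.47, 66.20, 71.13, 74.87, 74.73, 73.67],
    "JEE-math": [18.22, 26.91, 32.34, 33.69, 35.45, 32.70],
    "LCB-v5": [10.75, 10.75, 10.04, 12.19, 12.19, 11.83],
}

best = {}
for name, per_epoch in scores.items():
    top = max(per_epoch)
    epoch = per_epoch.index(top) + 1  # first epoch reaching the top score
    best[name] = (epoch, top, round(top - base[name], 2))

for name, (epoch, top, delta) in best.items():
    print(f"{name}: best at ep{epoch} = {top:.2f} ({delta:+.2f} vs base)")
```

On these numbers the math benchmarks peak between epochs 3 and 5, while no checkpoint reaches the base score on LCB-v5.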

Checkpoints layout

Each epoch ckpt lives in its own subdirectory inside this repo. To load a specific epoch with 🤗 Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code"
# One of: checkpoint-625, checkpoint-1250, checkpoint-1875, checkpoint-2500, checkpoint-3125, checkpoint-3750
sub = "checkpoint-2500"
# Load the weights in bf16, the precision used for training
model = AutoModelForCausalLM.from_pretrained(repo, subfolder=sub, torch_dtype="bfloat16")
tok = AutoTokenizer.from_pretrained(repo, subfolder=sub)
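
Because the listed checkpoints are spaced 625 steps apart, the subfolder for a given epoch can be derived rather than hard-coded. This helper is a sketch: the function name is hypothetical, and it assumes the `checkpoint-<step>` naming listed in the comment above.

```python
STEPS_PER_EPOCH = 625
NUM_EPOCHS = 6


def checkpoint_subfolder(epoch):
    """Return the repo subfolder for a 1-based epoch, e.g. 4 -> 'checkpoint-2500'."""
    if not 1 <= epoch <= NUM_EPOCHS:
        raise ValueError(f"epoch must be in 1..{NUM_EPOCHS}, got {epoch}")
    return f"checkpoint-{STEPS_PER_EPOCH * epoch}"


print([checkpoint_subfolder(e) for e in range(1, NUM_EPOCHS + 1)])
```

The returned string can be passed as the `subfolder` argument of `from_pretrained`.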

Caveats

- Research artifact for studying LLM reasoning-trace exfiltration via prompt injection. Not intended for production use.
- Training data consists of Qwen3-14B's responses to OpenThoughts-114k code prompts, elicited via a known prompt-injection attack; the quality and safety properties of the teacher's responses are not curated.
- Evaluation uses a single seed (T=0.5, seed=7 for vLLM); per-checkpoint variance is ±1-2 pp.