qwen25-7b-ot-q3_14b-clean-code
Distilled checkpoints from full-parameter SFT of Qwen/Qwen2.5-7B-Instruct on Chia-Mu-Lab/ot-q3_14b-clean-code, a dump of Qwen3-14B (teacher) reasoning traces on OpenThoughts-114k code prompts, extracted via a V3-style prompt-injection attack. Six checkpoints (one per epoch), trained on 4×B200 with an effective batch size of 16 and a learning rate of 1e-5 (cosine schedule, warmup 0.05).
Variant: clean-code. The V3 attack's bash fence (`` ```bash\n$ cat reasoning_trace.txt ``) was stripped at curation time, and only rows with structural=True were kept (all 10000 of 10000 rows passed the filter).
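The curation step above can be sketched roughly as follows. This is a minimal illustration, not the actual curation script: the record fields `response` and `structural` and the helper names are hypothetical, and we assume the fence appears as a literal prefix of the response text, separated from the trace by a newline.

```python
# Build the fence string without a literal triple backtick so this block renders cleanly.
FENCE = "`" * 3
# Hypothetical prefix emitted by the V3 attack (assumed to open each response).
ATTACK_PREFIX = FENCE + "bash\n$ cat reasoning_trace.txt"


def strip_attack_fence(text: str) -> str:
    """Remove the attack prefix if present; otherwise return the text unchanged."""
    if text.startswith(ATTACK_PREFIX):
        # Drop the prefix and the newline(s) separating it from the trace body.
        return text[len(ATTACK_PREFIX):].lstrip("\n")
    return text


def curate(rows: list[dict]) -> list[dict]:
    """Keep only rows flagged structural=True and strip the fence from each response."""
    kept = []
    for row in rows:
        if not row.get("structural", False):
            continue
        kept.append({**row, "response": strip_attack_fence(row["response"])})
    return kept
```

For example, a row whose response is `ATTACK_PREFIX + "\nstep 1"` with `structural=True` comes back with the response `"step 1"`, while a row with `structural=False` is dropped.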
Training recipe
| field | value |
|---|---|
| Student | Qwen/Qwen2.5-7B-Instruct |
| Teacher | Qwen3-14B (traces elicited on OpenThoughts code prompts via prompt injection) |
| Dataset | Chia-Mu-Lab/ot-q3_14b-clean-code (10000 usable rows after filter) |
| Hardware | 4×B200 (Modal) |
| Epochs | 6 (one ckpt per epoch) |
| Block size (max sequence length, tokens) | 32768 |
| Micro / Grad-accum / Effective batch | 1 / 4 / 16 |
| Learning rate | 1e-5 (cosine, warmup 0.05) |
| Optimizer | AdamW (β1 = 0.9, β2 = 0.95, weight decay 1e-4) |
| Sharding | FSDP (full_shard auto_wrap, Qwen2DecoderLayer, FULL_STATE_DICT) |
| Attention | flash_attention_2 |
| Precision | bf16 |
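For readers reproducing the run, the recipe can be restated as a plain Python dict whose keys follow `transformers.TrainingArguments` keyword names as we understand them. This is an assumption, not the authors' launch config: mapping "warmup 0.05" to `warmup_ratio` is our reading, the `FULL_STATE_DICT` setting is not expressed here, and block size is not a `TrainingArguments` field. The helpers also show the batch arithmetic: 1 sample × 4 accumulation steps × 4 GPUs = 16 samples per optimizer step, so 10000 rows give 625 steps per epoch, which matches the checkpoint names.

```python
NUM_GPUS = 4          # 4×B200
NUM_ROWS = 10_000     # usable rows after the structural=True filter
BLOCK_SIZE = 32_768   # max sequence length in tokens; not a TrainingArguments field

# Keys mirror transformers.TrainingArguments keyword names (assumed; verify for your version).
training_args = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 6,
    "learning_rate": 1e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,          # assumed meaning of "warmup 0.05"
    "adam_beta1": 0.9,
    "adam_beta2": 0.95,
    "weight_decay": 1e-4,
    "bf16": True,
    "fsdp": "full_shard auto_wrap",
    "fsdp_config": {"transformer_layer_cls_to_wrap": ["Qwen2DecoderLayer"]},
    "save_strategy": "epoch",      # one checkpoint per epoch
}


def effective_batch(args: dict, num_gpus: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return args["per_device_train_batch_size"] * args["gradient_accumulation_steps"] * num_gpus


def steps_per_epoch(num_rows: int, batch: int) -> int:
    """Optimizer steps per epoch; 10000 divides evenly by 16, so no partial batch here."""
    return num_rows // batch
```

On recent 4.x releases something like `TrainingArguments(output_dir="out", **training_args)` should accept these keys, but argument names have shifted between versions, so treat the dict as a starting point rather than a verified config.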
Evaluation
Evaluated on AIME24+AIME25 (n=3, T=0.5), MATH-500 (n=3, T=0.5), JEEbench subject=='math' subset (n=6, T=0.5), and LiveCodeBench-v5 release window 2024-08-01→2025-02-01 (n=3, T=0.5). All numbers are % accuracy; (±N.N) is the delta vs base Qwen/Qwen2.5-7B-Instruct evaluated under the same protocol.
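A minimal sketch of how such scores can be aggregated, assuming `n` is the number of sampled generations per problem and each reported number is mean correctness over all problem-sample pairs. This is our reading of the protocol, not the evaluation harness used for this card; the function name and toy data are hypothetical.

```python
def accuracy_pct(results: list[list[bool]]) -> float:
    """results[i] holds the n per-sample correctness flags for problem i.

    Returns the percentage of correct samples over all problem-sample pairs.
    With the same n for every problem, this equals the mean of per-problem accuracies.
    """
    flat = [ok for samples in results for ok in samples]
    if not flat:
        raise ValueError("no samples to score")
    return 100.0 * sum(flat) / len(flat)


# Toy example: 2 problems, n=3 samples each, 3 of 6 samples correct.
toy = [[True, False, True], [False, False, True]]
score = accuracy_pct(toy)
```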
| ckpt | epoch | AIME24 | AIME25 | MATH500 | JEE-math | LCB-v5 |
|---|---|---|---|---|---|---|
| base | — | 8.89 | 2.22 | 70.93 | 32.49 | 15.77 |
| step-00625 | ep1 | 5.56 (-3.3) | 6.67 (+4.4) | 59.47 (-11.5) | 18.22 (-14.3) | 10.75 (-5.0) |
| step-01250 | ep2 | 8.89 (+0.0) | 11.11 (+8.9) | 66.20 (-4.7) | 26.91 (-5.6) | 10.75 (-5.0) |
| step-01875 | ep3 | 12.22 (+3.3) | 20.00 (+17.8) | 71.13 (+0.2) | 32.34 (-0.1) | 10.04 (-5.7) |
| step-02500 | ep4 | 14.44 (+5.6) | 13.33 (+11.1) | 74.87 (+3.9) | 33.69 (+1.2) | 12.19 (-3.6) |
| step-03125 | ep5 | 12.22 (+3.3) | 15.56 (+13.3) | 74.73 (+3.8) | 35.45 (+3.0) | 12.19 (-3.6) |
| step-03750 | ep6 | 13.33 (+4.4) | 15.56 (+13.3) | 73.67 (+2.7) | 32.70 (+0.2) | 11.83 (-3.9) |
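The parenthesized deltas can be re-derived from the table as checkpoint score minus base score, rounded to one decimal. A few cells sit on a rounding boundary (for example AIME24 at ep4: 14.44 − 8.89 = 5.55, shown as +5.6), which is consistent with the deltas being computed from unrounded scores; the check below therefore uses the ep5 row, where no such ties occur. The dictionaries are copied from the table; the helper name is hypothetical.

```python
# Scores copied from the evaluation table.
BASE = {"AIME24": 8.89, "AIME25": 2.22, "MATH500": 70.93, "JEE-math": 32.49, "LCB-v5": 15.77}
EP5 = {"AIME24": 12.22, "AIME25": 15.56, "MATH500": 74.73, "JEE-math": 35.45, "LCB-v5": 12.19}


def deltas(scores: dict, base: dict) -> dict:
    """Percentage-point difference vs the base model, rounded to one decimal."""
    return {k: round(scores[k] - base[k], 1) for k in base}


ep5_deltas = deltas(EP5, BASE)
```

The result matches the ep5 row: +3.3, +13.3, +3.8, +3.0, and -3.6.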
Checkpoints layout
Each per-epoch checkpoint lives in its own subdirectory of this repo. To load a specific epoch with 🤗 Transformers:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-clean-code"
# One of: checkpoint-625, checkpoint-1250, checkpoint-1875, checkpoint-2500, checkpoint-3125, checkpoint-3750
sub = "checkpoint-2500"  # epoch 4
# Load weights (in bf16) and tokenizer from the chosen epoch's subdirectory.
model = AutoModelForCausalLM.from_pretrained(repo, subfolder=sub, torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(repo, subfolder=sub)
```
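Subfolder names follow `checkpoint-<global step>` with 625 optimizer steps per epoch, while the evaluation table labels the same checkpoints `step-<zero-padded step>`. A small hypothetical helper (not shipped with this repo) that maps a 1-indexed epoch to both names:

```python
STEPS_PER_EPOCH = 625  # 10000 rows / effective batch 16
NUM_EPOCHS = 6


def _global_step(epoch: int) -> int:
    """Global optimizer step at the end of a 1-indexed epoch."""
    if not 1 <= epoch <= NUM_EPOCHS:
        raise ValueError(f"epoch must be in 1..{NUM_EPOCHS}, got {epoch}")
    return epoch * STEPS_PER_EPOCH


def subfolder_for_epoch(epoch: int) -> str:
    """Repo subfolder name, e.g. 4 -> 'checkpoint-2500'."""
    return f"checkpoint-{_global_step(epoch)}"


def table_label_for_epoch(epoch: int) -> str:
    """Evaluation-table label, e.g. 1 -> 'step-00625'."""
    return f"step-{_global_step(epoch):05d}"
```

For example, `sub = subfolder_for_epoch(4)` yields `"checkpoint-2500"`, the subfolder used in the loading snippet, and `table_label_for_epoch(4)` yields `"step-02500"`, its row label in the evaluation table.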
Caveats
- Research artifact for studying LLM reasoning-trace exfiltration via prompt injection. Not intended for production use.
- Training data consists of Qwen3-14B's responses to OpenThoughts-114k code prompts, elicited via a known prompt-injection attack; the quality and safety of the teacher's responses were not curated.
- Evaluation uses a single seed (T=0.5, seed=7 for vLLM); per-ckpt variance is ±1-2 pp.
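To put the ±1-2 pp figure in context for AIME: assuming each AIME year contributes 30 problems (AIME I and II, 15 each) and each problem is scored over n=3 samples, one extra correct sample moves an AIME score by 100/90 ≈ 1.11 pp. The sketch below converts reported AIME scores back into counts of correct samples; the problem count is our assumption about the eval set, not stated in this card.

```python
AIME_PROBLEMS = 30  # AIME I + AIME II, 15 problems each (assumed eval set size)
N_SAMPLES = 3       # n=3 from the evaluation protocol


def pp_per_sample(problems: int, n: int) -> float:
    """Percentage points contributed by a single correct sample."""
    return 100.0 / (problems * n)


def correct_samples(score_pct: float, problems: int, n: int) -> int:
    """Recover the integer number of correct samples from a rounded % score."""
    return round(score_pct * problems * n / 100.0)
```

Under these assumptions, the base model's AIME24 score of 8.89 corresponds to 8 of 90 correct samples and the best AIME25 score (20.00, ep3) to 18 of 90, so single-digit AIME deltas amount to only a handful of samples.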