---
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
tags:
- distillation
- reasoning-trace-extraction
- openthoughts
- qwen3
- victim-model
datasets:
- Chia-Mu-Lab/openthoughts_distill_victim_data_10k_clean_swag
---

# ot-q3_14b-clean

Qwen2.5-7B-Instruct **student** model, fine-tuned with full-parameter SFT (s1 recipe)
on **Qwen3-14B (OpenThoughts SWAG, V3-attack cleaned)** reasoning traces.

This repo is part of a four-victim study comparing student distillation outcomes
when the teacher's reasoning traces are extracted via the V3 attack (`-orig`)
vs. when the V3 attack wrapper is stripped before training (`-clean`).

## How to load a specific epoch

Each `epoch_N/` subfolder is a self-contained, loadable HF checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "Chia-Mu-Lab/ot-q3_14b-clean"

# Select the checkpoint by subfolder; epoch_5 is the final epoch.
model = AutoModelForCausalLM.from_pretrained(REPO, subfolder="epoch_5", torch_dtype="bfloat16")
tok = AutoTokenizer.from_pretrained(REPO, subfolder="epoch_5")
```
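
To switch between checkpoints without retyping the subfolder name, the loading call above can be wrapped in a small helper. This is a minimal sketch: `epoch_subfolder` and `load_epoch` are hypothetical names that do not ship with this repository, and the valid range of 1 to 5 is taken from the file listing in this card.

```python
REPO = "Chia-Mu-Lab/ot-q3_14b-clean"
NUM_EPOCHS = 5  # epoch_1/ ... epoch_5/, per the Files section of this card


def epoch_subfolder(epoch: int) -> str:
    """Return the checkpoint subfolder name for a 1-based epoch index."""
    if not 1 <= epoch <= NUM_EPOCHS:
        raise ValueError(f"epoch must be in 1..{NUM_EPOCHS}, got {epoch}")
    return f"epoch_{epoch}"


def load_epoch(epoch: int):
    """Load (model, tokenizer) for one epoch. Hypothetical helper."""
    # Imported lazily so epoch_subfolder() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    sub = epoch_subfolder(epoch)
    model = AutoModelForCausalLM.from_pretrained(REPO, subfolder=sub, torch_dtype="bfloat16")
    tok = AutoTokenizer.from_pretrained(REPO, subfolder=sub)
    return model, tok
```

Calling `load_epoch(4)` would then fetch the `epoch_4/` checkpoint; an out-of-range index fails fast with a `ValueError` before any download starts.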

## Per-epoch evaluation

All numbers are accuracies in percent on the canonical eval suite
(MATH500, AIME24, AIME25, JEEBench Math subset strict/partial, LiveCodeBench
v5 pass@1). The `base` row is the Qwen2.5-7B-Instruct starting point, evaluated
identically. **Bold** values mark the per-victim peak in each column; `-` means no value is reported.

| Epoch | Ckpt | MATH500 | AIME24 | AIME25 | JEE Math (s/p) | LCB pass@1 |
|---:|:---|---:|---:|---:|---:|---:|
| 0 | base (Qwen2.5-7B-Instruct) | 71.0 | 8.9 | 2.2 | 32.2 / 35.9 | 15.8 |
| 1 | step-00625 | 63.1 | 7.8 | 2.2 | - / - | - |
| 2 | step-01250 | 67.6 | 10.0 | 6.7 | - / - | - |
| 3 | step-01875 | 72.9 | **14.4** | 8.9 | **35.7** / 39.3 | 17.6 |
| 4 | step-02500 | 75.8 | **14.4** | **13.3** | 35.2 / 39.5 | **19.0** |
| 5 | step-03125 | **75.9** | 13.3 | **13.3** | 35.0 / **39.9** | 15.8 |
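
The per-column peaks can be recomputed directly from the table. The sketch below hard-codes the rows above, uses only the strict JEE Math number, and stores cells without a reported value as `None`; `best_epoch` is a hypothetical helper name, not something shipped with this repo.

```python
# Scores copied from the table above; None marks cells with no reported value.
# Column order: MATH500, AIME24, AIME25, JEE Math (strict), LCB pass@1
METRICS = ("MATH500", "AIME24", "AIME25", "JEE strict", "LCB")
SCORES = {
    0: (71.0, 8.9, 2.2, 32.2, 15.8),  # base (Qwen2.5-7B-Instruct)
    1: (63.1, 7.8, 2.2, None, None),
    2: (67.6, 10.0, 6.7, None, None),
    3: (72.9, 14.4, 8.9, 35.7, 17.6),
    4: (75.8, 14.4, 13.3, 35.2, 19.0),
    5: (75.9, 13.3, 13.3, 35.0, 15.8),
}


def best_epoch(metric):
    """Return (list of epochs that reach the peak, peak value) for one column."""
    col = METRICS.index(metric)
    reported = {ep: row[col] for ep, row in SCORES.items() if row[col] is not None}
    peak = max(reported.values())
    return [ep for ep, value in reported.items() if value == peak], peak


for name in METRICS:
    print(name, best_epoch(name))
```

On these numbers no single epoch wins every benchmark, but epoch 4 is either at the peak or within 0.5 points of it in every column.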

## Training recipe

* Base model: **Qwen/Qwen2.5-7B-Instruct**
* Teacher traces: `Chia-Mu-Lab/openthoughts_distill_victim_data_10k_clean_swag`
* Recipe: exact full fine-tune from the s1 paper (FSDP full-shard, no LoRA)
* Block size: **32768** tokens · effective batch **16** (mb=1, ga=4, 4×H200)
* Optimizer: AdamW, lr=1e-5 cosine, warmup_ratio=0.05, bf16
* Epochs: **5**, `save_strategy=epoch`
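
The batch and step figures in this recipe fit together as simple arithmetic. The sketch below assumes the training set holds exactly 10,000 examples; that count is inferred from the `10k` in the dataset name and is not stated in this card.

```python
micro_batch = 1        # mb
grad_accum = 4         # ga
num_gpus = 4           # 4 x H200
num_examples = 10_000  # ASSUMPTION: inferred from "10k" in the dataset name
num_epochs = 5

# Samples consumed per optimizer step across all GPUs.
effective_batch = micro_batch * grad_accum * num_gpus
steps_per_epoch = num_examples // effective_batch
# Global step at which each end-of-epoch checkpoint is saved.
checkpoint_steps = [steps_per_epoch * e for e in range(1, num_epochs + 1)]

print(effective_batch)   # 16
print(steps_per_epoch)   # 625
print(checkpoint_steps)  # [625, 1250, 1875, 2500, 3125]
```

Under that assumption the computed steps line up with the `step-00625` through `step-03125` checkpoint labels in the evaluation table.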

## Files

```
ot-q3_14b-clean/
  README.md
  metrics.csv    # machine-readable per-epoch metric table
  epoch_1/       # full HF checkpoint dir (config.json, model-*.safetensors,
  epoch_2/       #   tokenizer*, etc.)
  epoch_3/
  epoch_4/
  epoch_5/
```

## Caveats / known issues

* All epochs here are from the canonical s1-distill 3-exp sweep (2026-05-20), evaluated with the unified math500+AIME+JEE+LCB scorer pipeline.
* JEE Math here refers to the subject="math" subset (≈236 of 515 questions)
  scored per the official `dair-iitd` `compute_metrics.py`. The strict number
  is the headline accuracy; partial gives MCQ(multiple) partial credit.
* These models are research artifacts for the steal-reasoning-trace project
  (reasoning-trace extraction attack study); do not use for production.
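
To illustrate how the strict and partial JEE Math numbers can differ on a multiple-answer question, here is a minimal sketch. It is not the official `compute_metrics.py`: the function name is hypothetical, and both the exact-match definition of strict and the rule of 0.25 credit per chosen option (zero if any wrong option is chosen) are assumptions modeled on JEE Advanced-style marking. Check the official script for the exact scheme.

```python
def score_mcq_multiple(predicted: str, gold: str, partial: bool) -> float:
    """Score one MCQ(multiple) answer such as "AC" against the gold options."""
    pred, ref = set(predicted), set(gold)
    if pred == ref:
        return 1.0  # exact match earns full credit in both modes
    # ASSUMPTION: partial mode gives 0.25 per chosen option, but only when
    # every chosen option is correct; any wrong option scores zero.
    if partial and pred and pred <= ref:
        return 0.25 * len(pred)
    return 0.0


print(score_mcq_multiple("A", "AC", partial=False))  # 0.0
print(score_mcq_multiple("A", "AC", partial=True))   # 0.25
```

Under these assumptions the partial score is never below the strict score, which is consistent with the s/p pairs in the evaluation table.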