Instructions to use Self-CriTeach/SCT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Self-CriTeach/SCT with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="Self-CriTeach/SCT")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("Self-CriTeach/SCT", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use Self-CriTeach/SCT with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "Self-CriTeach/SCT" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Self-CriTeach/SCT", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/Self-CriTeach/SCT
- SGLang
How to use Self-CriTeach/SCT with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "Self-CriTeach/SCT" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Self-CriTeach/SCT", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "Self-CriTeach/SCT" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Self-CriTeach/SCT", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use Self-CriTeach/SCT with Docker Model Runner:
docker model run hf.co/Self-CriTeach/SCT
SCT β Self-CriTeach Checkpoints
Trained checkpoints from the paper Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation.
Self-CriTeach is a self-teaching framework: the base LLM auto-generates PDDL planning domains, uses them to produce CoT supervision and structured RL rewards, and is then post-trained on the result. This repository bundles both released model families (Qwen3-4B and Llama-3.1-8B) plus their intermediate training checkpoints in one place, organized by subfolder.
- Code: https://github.com/markli1hoshipu/Plan_LLM
- Dataset: Self-CriTeach/pddl-planning-data
- Project page: https://markli1hoshipu.github.io/Plan_LLM/
Quick start
from transformers import AutoModelForCausalLM, AutoTokenizer
# default models (latest available checkpoint per backbone):
tok = AutoTokenizer.from_pretrained("Self-CriTeach/SCT", subfolder="Qwen3-4B")
mdl = AutoModelForCausalLM.from_pretrained("Self-CriTeach/SCT", subfolder="Qwen3-4B")
# or the larger backbone:
tok = AutoTokenizer.from_pretrained("Self-CriTeach/SCT", subfolder="Llama-3.1-8B")
mdl = AutoModelForCausalLM.from_pretrained("Self-CriTeach/SCT", subfolder="Llama-3.1-8B")
The PDDL problem format and action vocabulary are documented in configs/prompts/eval_user_prompt_template.md of the code repository.
Subfolders
Qwen3-4B (base: Qwen/Qwen3-4B-Instruct-2507)
| Subfolder | Step | Size | Notes |
|---|---|---|---|
Qwen3-4B/ |
2000 | 8.0 GB | Latest available checkpoint (default). |
Qwen3-4B-step1600/ |
1600 | 8.0 GB | Earlier intermediate. |
Qwen3-4B-step1200/ |
1200 | 8.0 GB | Earlier intermediate. |
Disclosure on the Qwen3-4B checkpoint provenance. The training run reached step 2572, but the final checkpoint files on disk were corrupted during save (one shard missing, index empty). The released
Qwen3-4B/here is step 2000 β the latest fully-intact snapshot. Numbers obtained when running this checkpoint may differ slightly from the paper's headline numbers for SCT-4B.
Llama-3.1-8B (base: meta-llama/Llama-3.1-8B-Instruct)
| Subfolder | Step | Size | Notes |
|---|---|---|---|
Llama-3.1-8B/ |
3102 (final) | 16.1 GB | Final published checkpoint (default). |
Llama-3.1-8B-step2800/ |
2800 | 16.1 GB | Intermediate. |
Llama-3.1-8B-step2400/ |
2400 | 16.1 GB | Intermediate. |
Llama-3.1-8B-step2000/ |
2000 | 16.1 GB | Intermediate. |
Llama-3.1-8B-step1600/ |
1600 | 16.1 GB | Intermediate. |
The intermediates are provided for researchers studying training dynamics. Most users should use the default (Llama-3.1-8B/).
Reproducing evaluation
Use the Self-CriTeach/pddl-planning-data eval split:
git clone https://github.com/markli1hoshipu/Plan_LLM.git
cd Plan_LLM
pip install -r requirements.txt
python scripts/evaluation/eval.py \
--model Self-CriTeach/SCT \
--data_file <(huggingface-cli download Self-CriTeach/pddl-planning-data eval/align_data_eval.jsonl --repo-type dataset) \
--experiment_folder results/sct_4b \
--gpus 0,1
# (use --model_subfolder if you fork eval.py to forward `subfolder=` β see GitHub README)
Limitations
- Trained only on Blocksworld-family planning problems; performance on non-Blocksworld PDDL domains is untested.
- Inference is sensitive to the exact prompt template β use
configs/prompts/eval_user_prompt_template.mdfrom the code repository verbatim for reproducible results. - The full RL-post-trained variants (
SCT_CPO,SCT_DPO,SCT_LCCSin the paper's ablations) are not included β only the main checkpoints are released.
Citation
@article{huang2025selfcriteach,
title = {Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation},
author = {Huang, Jinbang and Li, Zhiyuan and Hu, Yuanzhao and Zhang, Zhanguang and Coates, Mark and Quan, Xingyue and Zhang, Yingxue},
journal = {arXiv preprint arXiv:2509.21543},
year = {2025},
url = {https://arxiv.org/abs/2509.21543}
}
License
Apache 2.0. Note that the base models (Qwen3-4B and Llama-3.1-8B) carry their own licenses; consult Qwen's terms and Meta's Llama 3.1 license before deploying.
Model tree for Self-CriTeach/SCT
Base model
Qwen/Qwen3-4B-Instruct-2507