SCT — Self-CriTeach Checkpoints

Trained checkpoints from the paper Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation.

Self-CriTeach is a self-teaching framework: the base LLM auto-generates PDDL planning domains, uses them to produce CoT supervision and structured RL rewards, and is then post-trained on the result. This repository bundles both released model families (Qwen3-4B and Llama-3.1-8B) plus their intermediate training checkpoints in one place, organized by subfolder.

Code: https://github.com/markli1hoshipu/Plan_LLM
Dataset: Self-CriTeach/pddl-planning-data
Project page: https://markli1hoshipu.github.io/Plan_LLM/

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

# default models (latest available checkpoint per backbone):
tok = AutoTokenizer.from_pretrained("Self-CriTeach/SCT", subfolder="Qwen3-4B")
mdl = AutoModelForCausalLM.from_pretrained("Self-CriTeach/SCT", subfolder="Qwen3-4B")

# or the larger backbone:
tok = AutoTokenizer.from_pretrained("Self-CriTeach/SCT", subfolder="Llama-3.1-8B")
mdl = AutoModelForCausalLM.from_pretrained("Self-CriTeach/SCT", subfolder="Llama-3.1-8B")

The PDDL problem format and action vocabulary are documented in configs/prompts/eval_user_prompt_template.md of the code repository.

Subfolders

Qwen3-4B (base: Qwen/Qwen3-4B-Instruct-2507)

Subfolder	Step	Size	Notes
`Qwen3-4B/`	2000	8.0 GB	Latest available checkpoint (default).
`Qwen3-4B-step1600/`	1600	8.0 GB	Earlier intermediate.
`Qwen3-4B-step1200/`	1200	8.0 GB	Earlier intermediate.

Disclosure on the Qwen3-4B checkpoint provenance. The training run reached step 2572, but the final checkpoint files on disk were corrupted during save (one shard missing, index empty). The released Qwen3-4B/ here is step 2000 — the latest fully-intact snapshot. Numbers obtained when running this checkpoint may differ slightly from the paper's headline numbers for SCT-4B.

Llama-3.1-8B (base: meta-llama/Llama-3.1-8B-Instruct)

Subfolder	Step	Size	Notes
`Llama-3.1-8B/`	3102 (final)	16.1 GB	Final published checkpoint (default).
`Llama-3.1-8B-step2800/`	2800	16.1 GB	Intermediate.
`Llama-3.1-8B-step2400/`	2400	16.1 GB	Intermediate.
`Llama-3.1-8B-step2000/`	2000	16.1 GB	Intermediate.
`Llama-3.1-8B-step1600/`	1600	16.1 GB	Intermediate.

The intermediates are provided for researchers studying training dynamics. Most users should use the default (Llama-3.1-8B/).

Reproducing evaluation

Use the Self-CriTeach/pddl-planning-data eval split:

git clone https://github.com/markli1hoshipu/Plan_LLM.git
cd Plan_LLM
pip install -r requirements.txt

python scripts/evaluation/eval.py \
    --model Self-CriTeach/SCT \
    --data_file <(huggingface-cli download Self-CriTeach/pddl-planning-data eval/align_data_eval.jsonl --repo-type dataset) \
    --experiment_folder results/sct_4b \
    --gpus 0,1
# (use --model_subfolder if you fork eval.py to forward `subfolder=` — see GitHub README)

Limitations

Trained only on Blocksworld-family planning problems; performance on non-Blocksworld PDDL domains is untested.
Inference is sensitive to the exact prompt template — use configs/prompts/eval_user_prompt_template.md from the code repository verbatim for reproducible results.
The full RL-post-trained variants (SCT_CPO, SCT_DPO, SCT_LCCS in the paper's ablations) are not included — only the main checkpoints are released.

Citation

@article{huang2025selfcriteach,
  title         = {Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation},
  author        = {Huang, Jinbang and Li, Zhiyuan and Hu, Yuanzhao and Zhang, Zhanguang and Coates, Mark and Quan, Xingyue and Zhang, Yingxue},
  journal       = {arXiv preprint arXiv:2509.21543},
  year          = {2025},
  url           = {https://arxiv.org/abs/2509.21543}
}

License

Apache 2.0. Note that the base models (Qwen3-4B and Llama-3.1-8B) carry their own licenses; consult Qwen's terms and Meta's Llama 3.1 license before deploying.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for Self-CriTeach/SCT

Base model

Qwen/Qwen3-4B-Instruct-2507

Finetuned

(1771)

this model

Dataset used to train Self-CriTeach/SCT

Paper for Self-CriTeach/SCT

Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation

Paper • 2509.21543 • Published 7 days ago