SCT β€” Self-CriTeach Checkpoints

Trained checkpoints from the paper Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation.

Self-CriTeach is a self-teaching framework: the base LLM auto-generates PDDL planning domains, uses them to produce CoT supervision and structured RL rewards, and is then post-trained on the result. This repository bundles both released model families (Qwen3-4B and Llama-3.1-8B) plus their intermediate training checkpoints in one place, organized by subfolder.

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

# default models (latest available checkpoint per backbone):
tok = AutoTokenizer.from_pretrained("Self-CriTeach/SCT", subfolder="Qwen3-4B")
mdl = AutoModelForCausalLM.from_pretrained("Self-CriTeach/SCT", subfolder="Qwen3-4B")

# or the larger backbone:
tok = AutoTokenizer.from_pretrained("Self-CriTeach/SCT", subfolder="Llama-3.1-8B")
mdl = AutoModelForCausalLM.from_pretrained("Self-CriTeach/SCT", subfolder="Llama-3.1-8B")

The PDDL problem format and action vocabulary are documented in configs/prompts/eval_user_prompt_template.md of the code repository.

Subfolders

Qwen3-4B (base: Qwen/Qwen3-4B-Instruct-2507)

Subfolder Step Size Notes
Qwen3-4B/ 2000 8.0 GB Latest available checkpoint (default).
Qwen3-4B-step1600/ 1600 8.0 GB Earlier intermediate.
Qwen3-4B-step1200/ 1200 8.0 GB Earlier intermediate.

Disclosure on the Qwen3-4B checkpoint provenance. The training run reached step 2572, but the final checkpoint files on disk were corrupted during save (one shard missing, index empty). The released Qwen3-4B/ here is step 2000 β€” the latest fully-intact snapshot. Numbers obtained when running this checkpoint may differ slightly from the paper's headline numbers for SCT-4B.

Llama-3.1-8B (base: meta-llama/Llama-3.1-8B-Instruct)

Subfolder Step Size Notes
Llama-3.1-8B/ 3102 (final) 16.1 GB Final published checkpoint (default).
Llama-3.1-8B-step2800/ 2800 16.1 GB Intermediate.
Llama-3.1-8B-step2400/ 2400 16.1 GB Intermediate.
Llama-3.1-8B-step2000/ 2000 16.1 GB Intermediate.
Llama-3.1-8B-step1600/ 1600 16.1 GB Intermediate.

The intermediates are provided for researchers studying training dynamics. Most users should use the default (Llama-3.1-8B/).

Reproducing evaluation

Use the Self-CriTeach/pddl-planning-data eval split:

git clone https://github.com/markli1hoshipu/Plan_LLM.git
cd Plan_LLM
pip install -r requirements.txt

python scripts/evaluation/eval.py \
    --model Self-CriTeach/SCT \
    --data_file <(huggingface-cli download Self-CriTeach/pddl-planning-data eval/align_data_eval.jsonl --repo-type dataset) \
    --experiment_folder results/sct_4b \
    --gpus 0,1
# (use --model_subfolder if you fork eval.py to forward `subfolder=` β€” see GitHub README)

Limitations

  • Trained only on Blocksworld-family planning problems; performance on non-Blocksworld PDDL domains is untested.
  • Inference is sensitive to the exact prompt template β€” use configs/prompts/eval_user_prompt_template.md from the code repository verbatim for reproducible results.
  • The full RL-post-trained variants (SCT_CPO, SCT_DPO, SCT_LCCS in the paper's ablations) are not included β€” only the main checkpoints are released.

Citation

@article{huang2025selfcriteach,
  title         = {Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation},
  author        = {Huang, Jinbang and Li, Zhiyuan and Hu, Yuanzhao and Zhang, Zhanguang and Coates, Mark and Quan, Xingyue and Zhang, Yingxue},
  journal       = {arXiv preprint arXiv:2509.21543},
  year          = {2025},
  url           = {https://arxiv.org/abs/2509.21543}
}

License

Apache 2.0. Note that the base models (Qwen3-4B and Llama-3.1-8B) carry their own licenses; consult Qwen's terms and Meta's Llama 3.1 license before deploying.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Self-CriTeach/SCT

Finetuned
(1771)
this model

Dataset used to train Self-CriTeach/SCT

Paper for Self-CriTeach/SCT