AIAA 4051 Final Project: PPNL Grid Path Planning (full project archive)

This Hugging Face repository is the full project tree for an AIAA 4051 final project on prompting and fine-tuning seq2seq language models for grid path planning, evaluated on the PPNL benchmark (Aghzal et al., ICLR 2024 Workshop).

The lighter-weight code repository lives on GitHub: 👉 https://github.com/EnjiXiong/AIAA4051-FinalProject-PPNL

This HF mirror exists to host artifacts that are too large for GitHub, primarily 13 fine-tuned T5/BART checkpoints (~13 GB) and the PPNL upstream reference code.

What's in here

P3/
├── grid-path-planning/                ← the project itself (code + data + results)
│   ├── *.py, *.sh                     ← all source, same as the GitHub repo
│   ├── data/                          ← PPNL benchmark JSONs + custom OOD set
│   ├── evaluate/                      ← upstream PPNL executor scripts (kept for traceability)
│   ├── models/<run>/.../best/         ← ⭐ the fine-tuned checkpoints (HF-only)
│   ├── results/                       ← per-config metric tables and per-sample predictions
│   ├── visualizations/                ← case-study plots used in the report
│   ├── README.md                      ← project documentation, results table, file walkthrough
│   ├── CLAUDE.md                      ← orientation for Claude Code agents
│   └── requirements.txt
└── llms-as-path-planners/             ← upstream PPNL reference code (Aghzal et al., for traceability)

Loading a checkpoint

pip install huggingface_hub
hf download EnjiXiong/AIAA4051-FinalProject-PPNL \
    --include "grid-path-planning/models/sft_multiscale_40ep/**/best/**" \
    --local-dir .
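
For scripted setups, the same selective download can be done from Python. The sketch below assumes `huggingface_hub.snapshot_download` with its `allow_patterns` argument, which filters repository files by glob much like the CLI's `--include`. The helper names `best_checkpoint_pattern` and `download_best` are hypothetical and not part of the project, and `config.json` in the sanity check is only an assumed example filename.

```python
from fnmatch import fnmatch

REPO_ID = "EnjiXiong/AIAA4051-FinalProject-PPNL"
MODELS_ROOT = "grid-path-planning/models"


def best_checkpoint_pattern(run: str) -> str:
    """Glob selecting only the best/ checkpoint files of one run (hypothetical helper)."""
    return f"{MODELS_ROOT}/{run}/**/best/**"


def download_best(run: str, local_dir: str = ".") -> str:
    """Download one run's best checkpoint and return the local folder path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[best_checkpoint_pattern(run)],
        local_dir=local_dir,
    )


if __name__ == "__main__":
    pattern = best_checkpoint_pattern("sft_multiscale_40ep")
    # Sanity check of the glob against a path shaped like the one used in this README.
    # The file name config.json is an assumption made for illustration.
    example = (
        f"{MODELS_ROOT}/sft_multiscale_40ep/"
        "t5-base_vanilla_ep40_lr0.0003/best/config.json"
    )
    print(pattern, fnmatch(example, pattern))
```

Calling `download_best("sft_multiscale_40ep")` from the GitHub repo's working directory should place the files under the same relative path that the `hf download` command produces.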

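Then, from the GitHub repo's working directory (the same directory the download command was run in, so that the relative --model_dir path resolves):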

python tree_search_eval.py \
    --model_dir grid-path-planning/models/sft_multiscale_40ep/t5-base_vanilla_ep40_lr0.0003/best \
    --input_format vanilla --beam_width 4

This reproduces the headline result: 100% success across all five canonical PPNL test sets and 94.3% on the custom 1500-sample novel-grid-size set (4×4 to 10×10). It does so by combining a multi-scale-trained T5-base with executor-guarded beam search at inference. See the GitHub README for the full results table and file-by-file documentation.
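
The idea of guarding beam search with an executor can be illustrated without a model: each candidate action sequence is simulated on the grid, and the highest-ranked candidate that actually reaches the goal is returned. The sketch below is a simplified, hypothetical illustration and is not the logic of tree_search_eval.py. It filters finished candidates rather than pruning during decoding, and the action names, the (row, col) coordinate convention, and the functions `execute` and `first_valid` are all assumptions made for the example.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]

# Assumed action convention: (row, col) deltas, with row 0 at the top of the grid.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def execute(actions: Sequence[str], start: Cell, goal: Cell,
            obstacles: Set[Cell], size: int) -> bool:
    """Simulate a plan; True only if every step is legal and it ends on the goal."""
    row, col = start
    for action in actions:
        if action not in MOVES:
            return False  # malformed token in the generated plan
        d_row, d_col = MOVES[action]
        row, col = row + d_row, col + d_col
        if not (0 <= row < size and 0 <= col < size):
            return False  # stepped off the grid
        if (row, col) in obstacles:
            return False  # stepped onto an obstacle
    return (row, col) == goal


def first_valid(candidates: Iterable[Sequence[str]], start: Cell, goal: Cell,
                obstacles: Set[Cell], size: int) -> Optional[Sequence[str]]:
    """Return the first candidate (assumed sorted best-first) that executes successfully."""
    for plan in candidates:
        if execute(plan, start, goal, obstacles, size):
            return plan
    return None


if __name__ == "__main__":
    beams = [
        ["right", "right", "down", "down"],  # ranked first, but crosses an obstacle
        ["down", "down", "right", "right"],  # ranked second, valid
    ]
    print(first_valid(beams, start=(0, 0), goal=(2, 2), obstacles={(0, 1)}, size=3))
```

With a beam width of 4, as in the command above, a guard of this kind would have up to four candidates to fall back on when the top-ranked plan fails execution.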

Available checkpoints

All under grid-path-planning/models/<run>/.../best/:

| Run | Description |
| --- | --- |
| t5-small_vanilla_ep20_lr0.0003 | T5-small SFT, 6×6 only |
| t5-base_vanilla_ep15_lr0.0003 | T5-base SFT, 6×6 only |
| t5-base_structured_ep15_lr0.0003 | T5-base SFT, structured input format |
| t5-base_cot_ep15_lr0.0003 | T5-base SFT, CoT (coordinate-tracking) target |
| bart-base_vanilla_ep15_lr0.0003, bart-base_vanilla_ep30_lr5e-05 | BART-base SFT |
| sft2k_vanilla | T5-base SFT warm-start on 2k-sample subset |
| sft2k | T5-base SFT warm-start, structured |
| sft_multiscale_warmstart | Multi-scale (5×5–7×7) warm-start, 5 ep |
| sft_multiscale | Multi-scale, 15 ep |
| sft_multiscale_40ep | ⭐ Best SFT: multi-scale 5×5–7×7, 40 ep; used with tree search for the headline number |
| rl/grpo_t5-base_* | GRPO RL on top of vanilla and structured SFTs |

final/ (last-epoch) snapshots are not mirrored; only the best-validation checkpoint of each run is uploaded. The README.md and CLAUDE.md inside grid-path-planning/ document everything.

Citation

Underlying benchmark:

@inproceedings{aghzal2024can,
  title={Can Large Language Models be Good Path Planners? A Benchmark and
         Investigation on Spatial-temporal Reasoning},
  author={Aghzal, Mohamed and Plaku, Erion and Yao, Ziyu},
  booktitle={ICLR 2024 Workshop on LLM Agents},
  year={2024}
}

Upstream PPNL code: https://github.com/MohamedAghzal/llms-as-path-planners
