LeWM + MoDA Planning Experiments

This repository packages the code used for LeWM + MoDA planning diagnostics on PushT, including MoDA candidate-pool calibration, baseline-safe integration, and MoDA-only residual proposal experiments.

What is included

code/: LeWM source tree with MoDA-related modules and configs.
experiments/: standalone analysis and experiment scripts used for PAC-MoDA / MoDA-only planning studies.
docs/: local technical path report and talk-track notes.
manifests/local_artifacts_manifest.txt: manifest of local artifacts available on the original workstation.
artifacts/: intentionally left small in this code release. Large candidate pools/checkpoints should be downloaded or copied separately.

Key scripts

The most recent MoDA-only and PAC-MoDA scripts are in experiments/:

moda_only_learned_residual_proposal.py: success-conditioned residual proposal correction.
moda_only_residual_confirm50_audit.py: paired-index audit and residual scale sensitivity.
moda_only_intra_episode_audit.py: diagnostic for global AUC vs intra-episode discriminability.
moda_only_planner_in_loop_calibrated_cem.py: planner-in-the-loop calibrated CEM attempt.
moda_only_action_sensitive_contrastive.py: contrastive diagnostic head.
risk_controlled_moda_integration.py: baseline-safe opportunity-aware MoDA integration.
pac_moda_native_calibration_report.py: MoDA-native calibration report generator.
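
As a rough illustration of what a paired-index comparison means, the sketch below compares two runs on the same episode indices and applies an exact two-sided sign test to the episodes where the outcomes differ. It is not the logic of moda_only_residual_confirm50_audit.py; the function name, the input format (one success flag per episode), and the choice of a sign test are all assumptions made for this example.

```python
from math import comb


def paired_top1_summary(baseline_success, variant_success):
    """Compare two runs episode by episode (hypothetical helper).

    Both inputs are sequences of 0/1 success flags, aligned so that
    position i refers to the same episode index in both runs.
    """
    if len(baseline_success) != len(variant_success):
        raise ValueError("paired comparison needs the same episode indices in both runs")

    pairs = list(zip(baseline_success, variant_success))
    wins = sum(1 for b, v in pairs if v and not b)    # variant succeeds, baseline fails
    losses = sum(1 for b, v in pairs if b and not v)  # baseline succeeds, variant fails
    discordant = wins + losses

    # Exact two-sided sign test on discordant pairs only (null: win prob = 0.5).
    if discordant == 0:
        p_value = 1.0
    else:
        k = min(wins, losses)
        tail = sum(comb(discordant, i) for i in range(k + 1)) / 2 ** discordant
        p_value = min(1.0, 2 * tail)

    return {
        "wins": wins,
        "losses": losses,
        "ties": len(pairs) - discordant,
        "p_value": p_value,
    }


# Made-up flags for eight episodes, for illustration only.
baseline = [1, 0, 0, 1, 0, 1, 0, 0]
variant = [1, 1, 0, 1, 1, 1, 1, 0]
summary = paired_top1_summary(baseline, variant)
```

Counting wins and losses per episode index, rather than comparing two aggregate top1 numbers, keeps episode difficulty from masking or inflating the difference between runs.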

Main technical conclusion

MoDA candidate coverage is useful, but raw MoDA cost is poorly aligned with planning success. Post-hoc candidate reranking improves global AUC but does not reliably improve MoDA-only top1 because intra-episode candidate discriminability is weak. The most promising MoDA-only direction is success-conditioned residual proposal correction, which modifies action proposal generation instead of only reranking final candidates.

Conservative current result summary:

Baseline-relative (bsl-relative) integration can improve system-level top1, but it depends on a strong baseline fallback and should not be presented as a MoDA-only result.
AUC-only calibration gains are not enough because they can reflect episode difficulty leakage.
Learned residual proposal gives consistent paired improvement and near-miss reduction, but the absolute top1 is not yet a stable standalone 65+ result.
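
The episode difficulty leakage mentioned above can be seen in a toy example. The sketch below computes a global AUC over all candidates and a mean intra-episode AUC over candidates within each episode. The function names and the synthetic data are assumptions for illustration and are not taken from the audit scripts. Scores are treated as "higher means more likely to succeed", so a cost would need to be negated first.

```python
from collections import defaultdict
from statistics import mean


def auc(scores, labels):
    """Pairwise AUC: fraction of (success, failure) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def global_and_intra_episode_auc(scores, labels, episode_ids):
    """Return (global AUC, mean per-episode AUC over episodes with both classes)."""
    by_episode = defaultdict(lambda: ([], []))
    for s, y, e in zip(scores, labels, episode_ids):
        by_episode[e][0].append(s)
        by_episode[e][1].append(y)
    per_episode = [auc(s, y) for s, y in by_episode.values()]
    per_episode = [a for a in per_episode if a is not None]
    intra = mean(per_episode) if per_episode else None
    return auc(scores, labels), intra


# Synthetic data: the score only encodes which episode a candidate belongs to.
# Episode 0 is easy (3 of 4 candidates succeed), episode 1 is hard (1 of 4).
scores = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
episodes = [0, 0, 0, 0, 1, 1, 1, 1]

global_auc, intra_auc = global_and_intra_episode_auc(scores, labels, episodes)
# global_auc == 0.75, intra_auc == 0.5
```

In this toy case the score cannot tell candidates apart within an episode (intra-episode AUC 0.5), yet the global AUC is 0.75 because the score tracks how easy each episode is. A reranker with this profile would not change which candidate is picked first in any episode.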

Environment

The original remote environment used Python 3.10 and CUDA GPUs. A frozen dependency snapshot is provided under code/requirements_frozen.txt and related requirements_frozen_v*.txt files.

A minimal setup pattern is:

conda create -n lewm-moda python=3.10 -y
conda activate lewm-moda
pip install -r code/requirements_frozen.txt

For MuJoCo / headless evaluation:

export MUJOCO_GL=egl
export PYTHONPATH=$PWD/code:$PWD/code/wm_experiment_scripts:$PWD/experiments

Some scripts expect Stable World Model / PushT assets. Set:

export STABLEWM_HOME=/path/to/.stable_worldmodel

Artifacts layout

Large artifacts are not copied into this small code release by default. The original local artifact root was:

/Users/wangyijing/lewm_migration_bundle/wm_runs

For runnable experiments, place or symlink artifacts as:

artifacts/wm_runs/
  stateroll_normalbudget_candidate_pool_s300_steps30_n100/
    proposal_data/
    raw_rollout_npz/
  bsl_normalbudget_candidate_pool_s300_steps30_n100/
    proposal_data/
    raw_rollout_npz/
  pac_moda_v2_full_n100_20260529/
  rpn_residual_proposal/
  ...

Then set:

export LEWM_WM_RUNS=$PWD/artifacts/wm_runs
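
Before launching an experiment, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_artifacts and the EXPECTED_SUBDIRS list are assumptions, and the list covers only the directories named explicitly above (the elided "..." entries are not checked).

```python
from pathlib import Path

# Directories named in the layout above, relative to the wm_runs root.
# Entries hidden behind "..." in the layout are intentionally not listed.
EXPECTED_SUBDIRS = [
    "stateroll_normalbudget_candidate_pool_s300_steps30_n100/proposal_data",
    "stateroll_normalbudget_candidate_pool_s300_steps30_n100/raw_rollout_npz",
    "bsl_normalbudget_candidate_pool_s300_steps30_n100/proposal_data",
    "bsl_normalbudget_candidate_pool_s300_steps30_n100/raw_rollout_npz",
    "pac_moda_v2_full_n100_20260529",
    "rpn_residual_proposal",
]


def missing_artifacts(root):
    """Return the expected subdirectories that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_SUBDIRS if not (root / rel).is_dir()]
```

Calling missing_artifacts("artifacts/wm_runs") from the repository root returns an empty list when every listed directory is present or symlinked. is_dir() follows symlinks, so a symlinked artifact directory counts as present.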

Note: several historical scripts contain absolute paths from the original remote workstation. If running on a new machine, either create a compatible symlink or patch the ROOT constants in scripts to use LEWM_WM_RUNS.
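
One way to patch a hard-coded ROOT constant is to read LEWM_WM_RUNS with a repository-relative fallback. This is a sketch under assumptions: the helper name resolve_wm_runs_root and the default path are illustrative, and the exact name and form of the constant vary between scripts.

```python
import os
from pathlib import Path


def resolve_wm_runs_root(default="artifacts/wm_runs"):
    """Return the artifact root from LEWM_WM_RUNS, else a repo-relative default (hypothetical helper)."""
    return Path(os.environ.get("LEWM_WM_RUNS", default)).expanduser().resolve()


# In a patched script, a hard-coded absolute path assigned to ROOT
# would be replaced by a call to the helper:
ROOT = resolve_wm_runs_root()
```

The fallback is resolved relative to the current working directory, so scripts patched this way should be run from the repository root when LEWM_WM_RUNS is unset.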

Smoke checks

A lightweight import / structure check:

bash scripts/run_smoke.sh

A full residual audit requires candidate pools and a working world model evaluation environment. See:

bash scripts/run_residual_audit_example.sh

Recommended Hugging Face split

For a clean public release, use two repositories:

Code repo: this directory.
Artifact repo: selected checkpoints, candidate pools, and result CSV/JSON files.

Do not upload the full 113 GB local migration bundle unless it is needed; it contains many intermediate and failed experiments.
