MolmoAct2-LIBERO + Grid Sampler (random-init sampler)
allenai/MolmoAct2-LIBERO-LeRobot with Grid Sampler (GridS, ICML 2026) visual token pruning integrated into the vision backbone.
⚠️ The ActiveTokenSampler weights in this checkpoint are randomly initialized. This artifact
is the starting point for fine-tuning; do not expect baseline task performance with pruning enabled.
For the fine-tuned version see
xpuenabler/molmoact2-libero_grid_sampler_fine_tuned.
What changed vs. the base checkpoint
- Each camera image's pooled visual feature grid (14×14 = 196 tokens for single-crop 256×256) is pruned to K = 16 tokens by an ActiveTokenSampler: a global-pooled feature predicts K normalized 2D coordinates, F.grid_sample bilinearly reads features at those locations, and a coordinate MLP injects geometry.
- The processor emits exactly K image placeholder tokens per image, so the LIBERO 2-camera prompt shrinks from 483 to 123 tokens.
- New config flags: use_grid_token_sampler=true, grid_token_sampler_num_tokens=16 (stored in config.json and in the saved processor pipeline).
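
The sampling step described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the code shipped in the checkpoint: the class name ToyTokenSampler, the layer sizes, the tanh used to keep coordinates in [-1, 1], and the additive way the coordinate MLP output is combined with the sampled features are all hypothetical choices made for the example. Only the overall flow (global pooling, K predicted coordinates, F.grid_sample, coordinate MLP) comes from the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyTokenSampler(nn.Module):
    """Illustrative stand-in for ActiveTokenSampler (hypothetical, not the checkpoint's code)."""

    def __init__(self, dim: int, num_tokens: int = 16):
        super().__init__()
        self.num_tokens = num_tokens
        # Maps the global-pooled feature to K (x, y) pairs.
        self.coord_head = nn.Linear(dim, num_tokens * 2)
        # Coordinate MLP: embeds each (x, y) so the token carries its location.
        self.coord_mlp = nn.Sequential(
            nn.Linear(2, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, H, W, C) pooled visual feature grid, e.g. H = W = 14.
        b, h, w, c = feats.shape
        pooled = feats.mean(dim=(1, 2))                    # (B, C)
        # Assumption: tanh is used here to normalize coordinates to [-1, 1],
        # the range F.grid_sample expects.
        coords = torch.tanh(self.coord_head(pooled))       # (B, 2K)
        coords = coords.view(b, self.num_tokens, 2)        # (B, K, 2)
        grid = coords.unsqueeze(2)                         # (B, K, 1, 2)
        sampled = F.grid_sample(
            feats.permute(0, 3, 1, 2),                     # (B, C, H, W)
            grid,
            mode="bilinear",
            align_corners=False,
        )                                                  # (B, C, K, 1)
        tokens = sampled.squeeze(-1).transpose(1, 2)       # (B, K, C)
        # Assumption: geometry is injected by simple addition.
        return tokens + self.coord_mlp(coords)


sampler = ToyTokenSampler(dim=32, num_tokens=16)
grid_feats = torch.randn(2, 14, 14, 32)   # two images, 14x14 grid, 32 channels
pruned = sampler(grid_feats)
print(pruned.shape)  # torch.Size([2, 16, 32])
```

Because bilinear sampling is differentiable with respect to the coordinates, the coordinate predictor can be trained end to end, which is why a randomly initialized sampler needs fine-tuning before the sampled locations become useful. The token counts above are also self-consistent: with two cameras, pruning removes 2 × (196 − 16) = 360 placeholder tokens, which matches the drop from 483 to 123.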
Usage
Requires the feat/grid-sampler-molmoact2 branch of
nota-github/xpu-lerobot:
import torch

from lerobot.configs.policies import PreTrainedConfig
from lerobot.policies.factory import make_pre_post_processors
from lerobot.policies.molmoact2.modeling_molmoact2 import MolmoAct2Policy

path = "xpuenabler/molmoact2-libero_grid_sampler_random_init"

# Load the policy config from the Hub and point it at this checkpoint.
cfg = PreTrainedConfig.from_pretrained(path)
cfg.pretrained_path = path
cfg.device = "cuda"
cfg.inference_action_mode = "continuous"

policy = MolmoAct2Policy.from_pretrained(path, config=cfg)

# Build the saved pre/post-processing pipelines, keeping tensors on the GPU.
preprocessor, postprocessor = make_pre_post_processors(
    policy_cfg=cfg,
    pretrained_path=path,
    preprocessor_overrides={"device_processor": {"device": "cuda"}},
)

# Dummy observation: two 256×256 camera images, an 8-dim state, and a task string.
batch = preprocessor({
    "observation.images.image": torch.rand(3, 256, 256),
    "observation.images.wrist_image": torch.rand(3, 256, 256),
    "observation.state": torch.zeros(8),
    "task": "pick up the black bowl",
})

action = postprocessor(policy.select_action(batch))
Citation
@inproceedings{feng2026gridsampler,
  title     = {See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model},
  author    = {Feng, Yixu and Zhao, Zinan and Ma, Yanxiang and Xia, Chenghao and Du, Chengbin and Wang, Yunke and Xu, Chang},
  booktitle = {Forty-Third International Conference on Machine Learning (ICML)},
  year      = {2026}
}