Genie Envisioner

A unified world foundation platform for robotic manipulation.

Repository Structure

checkpoints/
  color_object/
    step_30000/
      config.json
      diffusion_pytorch_model.safetensors   # action model checkpoint (step 30000)
configs/                  # YAML configs and JSON stats for all tasks
data/                     # dataset classes (LeRobot-format, LIBERO, AgiBotWorld)
experiments/              # eval scripts for Calvin and LIBERO
models/                   # LTX, Cosmos, pipeline, action patch modules
runner/                   # ge_trainer.py, ge_inferencer.py
scripts/                  # train.sh, infer.sh, get_statistics.py
utils/                    # misc utilities
web_infer_utils/          # web inference server and client
main.py                   # training entry point
requirements.txt

Loading the color_object Checkpoint and Running Inference

1. Prerequisites

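Clone this repo and install dependencies. The checkpoint is a large file, so make sure Git LFS is installed before cloning; otherwise the .safetensors file may arrive as a small pointer file instead of the actual weights: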

git clone https://huggingface.co/yqi19/genie_envisioner
cd genie_envisioner
pip install -r requirements.txt

You also need the LTX-Video base model (tokenizer, text encoder, VAE). Set its path as pretrained_model_name_or_path in the config (see step 3).

2. Download the checkpoint

If you cloned the repo in step 1, the action model checkpoint is already present at:

checkpoints/color_object/step_30000/diffusion_pytorch_model.safetensors
checkpoints/color_object/step_30000/config.json

Alternatively, download a snapshot of the repository (checkpoint included) programmatically:

from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="yqi19/genie_envisioner",
    local_dir="./genie_envisioner",
)
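
Before editing the config, it can help to confirm that the checkpoint directory really contains the two files listed above. The following is a minimal sketch using only the standard library; the helper name check_checkpoint_dir is hypothetical and not part of this repo. It also flags files that are Git LFS pointers (small text stubs left behind when a clone did not fetch the large files).

```python
from pathlib import Path

# Expected files, taken from the checkpoint paths listed in step 2.
EXPECTED_FILES = ("config.json", "diffusion_pytorch_model.safetensors")
# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_checkpoint_dir(ckpt_dir):
    """Return a list of human-readable problems; an empty list means OK."""
    ckpt_dir = Path(ckpt_dir)
    problems = []
    for name in EXPECTED_FILES:
        path = ckpt_dir / name
        if not path.is_file():
            problems.append(f"missing: {path}")
            continue
        # Read only the first few bytes; the real weights file can be large.
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            problems.append(f"Git LFS pointer, not real weights: {path}")
    return problems


if __name__ == "__main__":
    # Path relative to the repository root, as in step 2.
    for problem in check_checkpoint_dir("checkpoints/color_object/step_30000"):
        print(problem)
```

If the function returns an empty list, both files are present and neither is a pointer stub. This does not validate the contents of the weights themselves.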

3. Update the config

Edit configs/ltx_model/conflict/action_model_color_object.yaml and set:

# Path to LTX-Video base model (tokenizer, text encoder, VAE)
pretrained_model_name_or_path: /path/to/LTX-Video

# Point to the downloaded checkpoint
diffusion_model:
  model_path: checkpoints/color_object/step_30000

Also update the data.train.data_roots and data.val.data_roots fields to point to your local color_object dataset (LeRobot format).
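
If you prefer to apply these changes from a script rather than by hand, the sketch below shows one way to set nested fields by dotted key on a plain dictionary. The helper names are hypothetical, the paths are placeholders, and treating data_roots as a list is an assumption about the config schema. Loading and saving the YAML file is left out; with PyYAML installed, yaml.safe_load and yaml.safe_dump would be the usual choice.

```python
import copy


def set_by_dotted_key(cfg, dotted_key, value):
    """Set cfg["a"]["b"]["c"] = value for dotted_key "a.b.c", creating dicts as needed."""
    node = cfg
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def apply_overrides(cfg, overrides):
    """Return a deep copy of cfg with every dotted-key override applied."""
    new_cfg = copy.deepcopy(cfg)
    for dotted_key, value in overrides.items():
        set_by_dotted_key(new_cfg, dotted_key, value)
    return new_cfg


# Placeholder paths; replace them with your own locations.
overrides = {
    "pretrained_model_name_or_path": "/path/to/LTX-Video",
    "diffusion_model.model_path": "checkpoints/color_object/step_30000",
    "data.train.data_roots": ["/path/to/color_object"],  # list form is an assumption
    "data.val.data_roots": ["/path/to/color_object"],    # list form is an assumption
}

# A stand-in for the dictionary you would get from loading the YAML config.
base_cfg = {"data": {"train": {"action_chunk": 9}}}
cfg = apply_overrides(base_cfg, overrides)
print(cfg["diffusion_model"]["model_path"])
```

Working on a deep copy keeps the loaded config unchanged, so the original and modified versions can be compared before anything is written back to disk.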

4. Run inference

Run the following from the repository root, so that the runner package is importable:

import torch
from runner.ge_inferencer import Inferencer

inferencer = Inferencer(
    config_file="configs/ltx_model/conflict/action_model_color_object.yaml",
    output_dir="./inference_output",
    weight_dtype=torch.bfloat16,
    device="cuda:0",
)

inferencer.prepare_models()
inferencer.prepare_val_dataset()

inferencer.infer(
    n_chunk_action=10,   # number of sequential action chunks to predict
    n_validation=1,      # number of validation episodes
)

Results are saved to ./inference_output/<timestamp>/Inference/:

Validation_0_gt.mp4 — ground truth video
Validation_0.mp4 — generated video (if return_video: true)
openloop_evaluation_val0.png — open-loop action prediction plot
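
Because each run writes into a new timestamped directory, a small helper can locate the most recent one and report which of the files above exist. This is a sketch with hypothetical helper names; it picks the newest directory by modification time rather than by parsing the timestamp, since the timestamp format is not specified here.

```python
from pathlib import Path

# File names taken from the list above (first validation episode).
EXPECTED_OUTPUTS = (
    "Validation_0_gt.mp4",
    "Validation_0.mp4",
    "openloop_evaluation_val0.png",
)


def latest_inference_dir(output_dir):
    """Return the most recently modified <output_dir>/<run>/Inference directory, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.glob("*/Inference") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def list_outputs(inference_dir):
    """Map each expected file name to whether it exists in inference_dir."""
    return {name: (Path(inference_dir) / name).is_file() for name in EXPECTED_OUTPUTS}


if __name__ == "__main__":
    run_dir = latest_inference_dir("./inference_output")
    if run_dir is None:
        print("no inference runs found")
    else:
        for name, present in list_outputs(run_dir).items():
            print(f"{name}: {'found' if present else 'not found'}")
```

Validation_0.mp4 is expected to be absent when return_video is not enabled, so a "not found" result for that file is not necessarily an error.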

5. Key config fields

Field	Description
`pretrained_model_name_or_path`	Path to LTX-Video base model
`diffusion_model.model_path`	Path to the action model checkpoint directory
`return_action`	`true` to predict actions
`return_video`	`true` to generate future video frames
`num_inference_step`	Diffusion denoising steps (default: 5)
`data.train.action_chunk`	Number of actions predicted per inference step (default: 9)
`data.train.n_previous`	Number of conditioning frames (default: 4)
`data.train.stat_file`	Path to action normalization stats JSON
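
The fields in this table can be sanity-checked before starting a long run. The sketch below works on a config that has already been loaded into a dictionary. The helper names are hypothetical, and the key locations are taken literally from the table (for example, return_action at the top level), which is an assumption about the actual YAML layout.

```python
def get_dotted(cfg, dotted_key, default=None):
    """Look up "a.b.c" in nested dicts; return default if any level is missing."""
    node = cfg
    for key in dotted_key.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def check_config(cfg):
    """Return a list of problems found in the fields described in the table."""
    problems = []
    # Path-like fields that should be set to a non-empty value.
    for key in ("pretrained_model_name_or_path",
                "diffusion_model.model_path",
                "data.train.stat_file"):
        if not get_dotted(cfg, key):
            problems.append(f"{key} is not set")
    # Count-like fields; a missing value is allowed, since defaults may apply.
    for key in ("num_inference_step",
                "data.train.action_chunk",
                "data.train.n_previous"):
        value = get_dotted(cfg, key)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{key} should be a positive integer, got {value!r}")
    # Assumption: with both flags off, the run has nothing to predict.
    if not get_dotted(cfg, "return_action", False) and not get_dotted(cfg, "return_video", False):
        problems.append("both return_action and return_video are false")
    return problems
```

An empty list means none of these basic checks failed; it does not guarantee that the paths exist or that the values suit a particular checkpoint.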

Evaluation on Calvin and LIBERO

Calvin

# Edit checkpoint and config paths in experiments/eval_calvin.sh first
bash experiments/eval_calvin.sh

LIBERO

# Edit checkpoint and config paths in experiments/eval_libero.sh first
bash experiments/eval_libero.sh
