GR00T N1.6 — PickOrange (SO-101)

Fine-tuned NVIDIA GR00T N1.6 (3B) for the LeIsaac PickOrange task in Isaac Lab simulation.

Task

Pick oranges from a kitchen counter and place them on a plate using an SO-101 5-DOF robot arm + gripper.

  • Environment: LeIsaac-SO101-PickOrange-v0 (NVIDIA Isaac Lab)
  • Robot: SO-101 follower (5 arm joints + 1 gripper)
  • Cameras: Front (480x640) + Wrist (480x640)
  • Language instruction: "Pick the orange and place it on the plate"

Training

Parameter      | Value
---------------|----------------------------------------------
Base model     | nvidia/GR00T-N1.6-3B
Training steps | 10,000 (3 phases: 3K + 4K + 3K)
Learning rates | 1e-4 → 5e-5 → 2e-5
Final loss     | 0.017
Batch size     | 8
Action horizon | 16
Frozen         | Diffusion decoder (--no-tune-diffusion-model)
GPU            | RTX 4090 (24 GB)
Dataset        | 60 teleoperation demos, dual camera

Loss Curve

Step    | Loss
--------|-------
    250 | 0.854
  1,000 | 0.082
  3,000 | 0.050
  5,000 | 0.030
  7,000 | 0.023
 10,000 | 0.017

Results

The model reliably reaches toward the orange and grasps it. It was evaluated across 3 episodes of 900 sim steps each (15 seconds at 60 Hz).

Eval Videos

See leisaac-pick-orange-learnings for recorded eval episodes.

Comparison with Other Approaches

Approach                | Params | Grasp               | Place
------------------------|--------|---------------------|------------
BC-RNN-GMM (no vision)  | ~1M    | 0%                  | 0%
BC-RNN + ResNet18       | ~12M   | 0%                  | 0%
SmolVLA                 | 450M   | 60%                 | 0%
GR00T N1.6 (this model) | 3B     | Reaching + grasping | In progress

Usage

Inference Server (GR00T client-server architecture)

cd /path/to/Isaac-GR00T

python gr00t/eval/run_gr00t_server.py \
  --model_path /path/to/this/checkpoint \
  --embodiment_tag NEW_EMBODIMENT \
  --port 5555 \
  --use_sim_policy_wrapper

Eval Client (Isaac Lab)

from gr00t.policy.server_client import PolicyClient
import numpy as np

client = PolicyClient(host="localhost", port=5555, timeout_ms=15000, strict=False)

# Observation format — video is (B, T, H, W, C) uint8, state is (B, T, D) float32.
# Zeros are placeholders; fill with real camera frames and joint readings.
obs = {
    "video.front": np.zeros((1, 1, 480, 640, 3), dtype=np.uint8),
    "video.wrist": np.zeros((1, 1, 480, 640, 3), dtype=np.uint8),
    "state.single_arm": np.zeros((1, 1, 5), dtype=np.float32),
    "state.gripper": np.zeros((1, 1, 1), dtype=np.float32),
    "annotation.human.task_description": ["Pick the orange and place it on the plate"],
}

action_dict, info = client._get_action(obs)
# Returns: action.single_arm (1, 16, 5), action.gripper (1, 16, 1)
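One way to consume a returned chunk is open-loop: execute all 16 predicted steps, then query the server again. A minimal sketch (`execute_chunk` and `apply_action` are illustrative names, not GR00T API):

```python
import numpy as np

def execute_chunk(action_dict, apply_action):
    """Step through one predicted action chunk.

    apply_action is the caller's function that sends a single 6-D command
    (5 arm joints + 1 gripper value) to the simulator each control step.
    """
    arm = action_dict["action.single_arm"][0]   # (16, 5), batch index 0
    grip = action_dict["action.gripper"][0]     # (16, 1)
    for t in range(arm.shape[0]):
        # Concatenate arm joints and gripper into one flat command vector
        apply_action(np.concatenate([arm[t], grip[t]]))

# Dummy chunk matching the shapes returned by the server
chunk = {
    "action.single_arm": np.zeros((1, 16, 5), dtype=np.float32),
    "action.gripper": np.ones((1, 16, 1), dtype=np.float32),
}
commands = []
execute_chunk(chunk, commands.append)  # collects 16 commands of shape (6,)
```

Re-querying after every chunk (rather than every step) keeps the 15-second episodes within the server's inference budget.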

Modality Config

Requires a custom modality config for the SO-101 embodiment. See so101_pick_orange_config.py.
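The actual config lives in so101_pick_orange_config.py and uses GR00T's own config classes; as an illustration only, the mapping it has to express groups the observation and action keys by modality:

```python
# Illustrative mapping only — not the GR00T config API.
SO101_MODALITIES = {
    "video": ["video.front", "video.wrist"],            # two 480x640 RGB cameras
    "state": ["state.single_arm", "state.gripper"],     # 5 arm joints + 1 gripper
    "action": ["action.single_arm", "action.gripper"],  # 16-step action chunks
    "language": ["annotation.human.task_description"],
}
```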

Setup Notes

  • Install GR00T with pip install -e . --no-deps to avoid breaking Isaac Lab's torch
  • --use_sim_policy_wrapper flag is required on the server for flat observation format
  • State must be float32 (not float64) with temporal dim (B, T, D)
  • Video must include temporal dim (B, T, H, W, C)
  • Uses msgpack serialization — NOT compatible with LeIsaac's torch pickle client
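The dtype and temporal-dim requirements above can be handled in one place when wrapping raw sim readings (`to_groot_obs` is an illustrative helper name):

```python
import numpy as np

def to_groot_obs(front_rgb, wrist_rgb, arm_q, grip_q, instruction):
    """Wrap raw sim readings into the flat (B, T, ...) layout the server expects.

    front_rgb / wrist_rgb: (480, 640, 3) frames; arm_q: 5 joint angles;
    grip_q: scalar gripper position. [None, None] prepends batch and time dims.
    """
    return {
        "video.front": np.asarray(front_rgb, dtype=np.uint8)[None, None],    # (1,1,480,640,3)
        "video.wrist": np.asarray(wrist_rgb, dtype=np.uint8)[None, None],
        "state.single_arm": np.asarray(arm_q, dtype=np.float32)[None, None], # (1,1,5)
        "state.gripper": np.asarray([[[grip_q]]], dtype=np.float32),         # (1,1,1)
        "annotation.human.task_description": [instruction],
    }

# Isaac Lab hands back float64 joint state; the helper casts it down to float32.
obs = to_groot_obs(np.zeros((480, 640, 3)), np.zeros((480, 640, 3)),
                   np.zeros(5, dtype=np.float64), 0.5,
                   "Pick the orange and place it on the plate")
```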

Citation

@misc{groot-n1.6-pick-orange,
  title={GR00T N1.6 Fine-tuned for PickOrange},
  author={Rajesh Kumar},
  year={2026},
  url={https://huggingface.co/rajeshramana/groot-n1.6-pick-orange}
}

