GR00T N1.6 — PickOrange (SO-101)
Fine-tuned NVIDIA GR00T N1.6 (3B) for the LeIsaac PickOrange task in Isaac Lab simulation.
Task
Pick oranges from a kitchen counter and place them on a plate using an SO-101 5-DOF robot arm + gripper.
- Environment: LeIsaac-SO101-PickOrange-v0 (NVIDIA Isaac Lab)
- Robot: SO-101 follower (5 arm joints + 1 gripper)
- Cameras: Front (480x640) + Wrist (480x640)
- Language instruction: "Pick the orange and place it on the plate"
Training
| Parameter | Value |
|---|---|
| Base model | nvidia/GR00T-N1.6-3B |
| Training steps | 10,000 (3 phases: 3K + 4K + 3K) |
| Learning rates | 1e-4 → 5e-5 → 2e-5 |
| Final loss | 0.017 |
| Batch size | 8 |
| Action horizon | 16 |
| Frozen | Diffusion decoder (--no-tune-diffusion-model) |
| GPU | RTX 4090 (24GB) |
| Dataset | 60 teleoperation demos, dual camera |
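The three-phase schedule in the table can be read as a step-to-learning-rate lookup. The sketch below is only an illustration: it assumes the three rates map in order onto the 3K, 4K, and 3K phases and that each phase uses a constant rate. The table does not describe any warmup or decay inside a phase, and `PHASES` and `lr_at_step` are hypothetical names, not part of the GR00T training scripts.

```python
# Piecewise-constant learning-rate lookup for the three training phases.
# Assumption: each phase runs at one constant rate; warmup or decay within
# a phase is not described in the table and is ignored here.
PHASES = [(3_000, 1e-4), (4_000, 5e-5), (3_000, 2e-5)]  # (steps, learning rate)


def lr_at_step(step: int) -> float:
    """Return the learning rate in effect at a 0-indexed global step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for num_steps, lr in PHASES:
        boundary += num_steps
        if step < boundary:
            return lr
    raise ValueError(f"step {step} is past the end of training ({boundary} steps)")


total_steps = sum(num_steps for num_steps, _ in PHASES)
print(total_steps)        # 10000
print(lr_at_step(2_999))  # 0.0001
print(lr_at_step(3_000))  # 5e-05
print(lr_at_step(9_999))  # 2e-05
```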
Loss Curve
| Step | Loss |
|---|---|
| 250 | 0.854 |
| 1,000 | 0.082 |
| 3,000 | 0.050 |
| 5,000 | 0.030 |
| 7,000 | 0.023 |
| 10,000 | 0.017 |
Results
The model reaches toward oranges and grasps them; placing them on the plate is still in progress. Evaluated over 3 episodes of 900 sim steps each (15 seconds at 60 Hz).
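The episode length follows from the step count and the simulation rate. The sketch below also estimates how many policy queries an episode would need. It assumes one action is applied per sim step and that a fixed number of actions from each 16-step chunk is executed before re-querying. Neither detail is stated in this card, and `queries_per_episode` is a hypothetical helper.

```python
import math

SIM_HZ = 60              # simulation rate stated above
STEPS_PER_EPISODE = 900  # sim steps per eval episode
ACTION_HORIZON = 16      # actions per predicted chunk (training table)

episode_seconds = STEPS_PER_EPISODE / SIM_HZ


def queries_per_episode(steps_per_query: int) -> int:
    """Policy queries needed if `steps_per_query` actions are executed per chunk.

    Assumption: one action per sim step, with no decimation.
    """
    if not 1 <= steps_per_query <= ACTION_HORIZON:
        raise ValueError("steps_per_query must be between 1 and the action horizon")
    return math.ceil(STEPS_PER_EPISODE / steps_per_query)


print(episode_seconds)          # 15.0
print(queries_per_episode(16))  # 57
print(queries_per_episode(8))   # 113
```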
Eval Videos
See leisaac-pick-orange-learnings for recorded eval episodes.
Comparison with Other Approaches
| Approach | Params | Grasp | Place |
|---|---|---|---|
| BC-RNN-GMM (no vision) | ~1M | 0% | 0% |
| BC-RNN + ResNet18 | ~12M | 0% | 0% |
| SmolVLA | 450M | 60% | 0% |
| GR00T N1.6 (this model) | 3B | Reaching + grasping | In progress |
Usage
Inference Server (GR00T client-server architecture)
cd /path/to/Isaac-GR00T
python gr00t/eval/run_gr00t_server.py \
    --model_path /path/to/this/checkpoint \
    --embodiment_tag NEW_EMBODIMENT \
    --port 5555 \
    --use_sim_policy_wrapper
Eval Client (Isaac Lab)
import numpy as np
from gr00t.policy.server_client import PolicyClient

client = PolicyClient(host="localhost", port=5555, timeout_ms=15000, strict=False)

# Observation format (zero-filled placeholders; replace with real sim data)
obs = {
    "video.front": np.zeros((1, 1, 480, 640, 3), dtype=np.uint8),  # (B, T, H, W, C)
    "video.wrist": np.zeros((1, 1, 480, 640, 3), dtype=np.uint8),
    "state.single_arm": np.zeros((1, 1, 5), dtype=np.float32),  # (B, T, D)
    "state.gripper": np.zeros((1, 1, 1), dtype=np.float32),
    "annotation.human.task_description": ["Pick the orange and place it on the plate"],
}
action_dict, info = client._get_action(obs)
# Returns: action.single_arm (1, 16, 5), action.gripper (1, 16, 1)
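To drive the robot, the arm and gripper chunks usually need to be merged into one command per step. The sketch below is one way to do that, assuming the returned dictionary is keyed by `action.single_arm` and `action.gripper` as in the comment above and that the sim expects the 5 arm joints followed by the gripper. The helper name `chunk_to_joint_targets` and the joint ordering are assumptions, and a stand-in dictionary replaces the real server response.

```python
import numpy as np


def chunk_to_joint_targets(action_dict):
    """Merge arm and gripper chunks into a (B, T, 6) array of joint targets.

    Assumption: joint order is the 5 arm joints followed by the gripper.
    """
    arm = np.asarray(action_dict["action.single_arm"], dtype=np.float32)  # (B, T, 5)
    grip = np.asarray(action_dict["action.gripper"], dtype=np.float32)    # (B, T, 1)
    if arm.shape[:2] != grip.shape[:2]:
        raise ValueError(f"batch/time mismatch: {arm.shape} vs {grip.shape}")
    return np.concatenate([arm, grip], axis=-1)


# Stand-in for a server response with the shapes listed above.
fake_response = {
    "action.single_arm": np.zeros((1, 16, 5), dtype=np.float32),
    "action.gripper": np.ones((1, 16, 1), dtype=np.float32),
}
targets = chunk_to_joint_targets(fake_response)
print(targets.shape)  # (1, 16, 6)
```

Each `targets[0, t]` is then one 6-value command that can be applied at sim step `t` of the chunk.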
Modality Config
Requires a custom modality config for the SO-101 embodiment. See so101_pick_orange_config.py.
Setup Notes
- Install GR00T with pip install -e . --no-deps to avoid breaking Isaac Lab's torch
- The --use_sim_policy_wrapper flag is required on the server for the flat observation format
- State must be float32 (not float64) with a temporal dim: (B, T, D)
- Video must include a temporal dim: (B, T, H, W, C)
- Uses msgpack serialization, so it is NOT compatible with LeIsaac's torch pickle client
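The dtype and shape requirements in these notes are easy to get wrong, so a client-side check before sending can help. The sketch below is a hypothetical helper (`check_observation` is not part of GR00T or LeIsaac) that assumes keys are prefixed with `video.` and `state.` as in the observation format above. It checks only dtype and rank, not the exact image size or state dimension.

```python
import numpy as np


def check_observation(obs):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for key, value in obs.items():
        if key.startswith("video."):
            want_dtype, want_ndim, layout = np.uint8, 5, "(B, T, H, W, C)"
        elif key.startswith("state."):
            want_dtype, want_ndim, layout = np.float32, 3, "(B, T, D)"
        else:
            continue  # e.g. the language annotation; not checked here
        if not isinstance(value, np.ndarray):
            problems.append(f"{key}: expected a numpy array")
            continue
        if value.dtype != want_dtype:
            problems.append(f"{key}: dtype {value.dtype}, expected {np.dtype(want_dtype)}")
        if value.ndim != want_ndim:
            problems.append(f"{key}: shape {value.shape}, expected layout {layout}")
    return problems


good = {
    "video.front": np.zeros((1, 1, 480, 640, 3), dtype=np.uint8),
    "state.single_arm": np.zeros((1, 1, 5), dtype=np.float32),
}
bad = {"state.single_arm": np.zeros((1, 5))}  # float64 and missing the T dim

print(check_observation(good))      # []
print(len(check_observation(bad)))  # 2
```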
Citation
@misc{groot-n1.6-pick-orange,
title={GR00T N1.6 Fine-tuned for PickOrange},
author={Rajesh Kumar},
year={2026},
url={https://huggingface.co/rajeshramana/groot-n1.6-pick-orange}
}
Acknowledgments