---
license: apache-2.0
language:
- en
tags:
- robotics
- manipulation
- video-to-manipulation
- lora
- peft
- groot
- humanoid
- unitree-g1
- GENESIS
- under-development
library_name: peft
pipeline_tag: robotics
base_model: nvidia/GR00T-N1.6-3B
---

# DC-GR00T: Demo-Conditioned GR00T Adapter (GENESIS)

> **⚠️ Under Active Development**
> This checkpoint is a research preview. The DC-GR00T manipulation pipeline is still being actively developed and validated. Results and APIs may change without notice. Use with caution in production.

Part of the **GENESIS** research framework: video-conditioned robot learning.

**Paper**: [PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models](https://arxiv.org/abs/2509.13903)

**Code**: [github.com/jeffrinsam/GENESIS](https://github.com/jeffrinsam/GENESIS) → `part2_manipulation/`

## Model Description

DC-GR00T is a **Demo-Conditioned** extension of [GR00T N1.6](https://huggingface.co/nvidia/GR00T-N1.6-3B). Instead of language instructions, it accepts a **reference video** of a manipulation task and extracts a task embedding that conditions the DiT action head.

This repository contains a **LoRA fine-tuning adapter** (PEFT) trained on Unitree G1 teleop demonstrations. Load it on top of the base `nvidia/GR00T-N1.6-3B` model.
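
As a rough illustration of what a LoRA adapter adds on top of a frozen base model, the sketch below computes the standard LoRA scaling factor and the number of trainable parameters per adapted linear layer. The rank and alpha are the values listed under Adapter Details in this card; the layer width of 2048 and both helper functions are hypothetical, chosen only for illustration, and are not taken from the GENESIS code or the GR00T architecture.

```python
# Minimal LoRA bookkeeping sketch (illustrative only; not the GENESIS or PEFT code).

def lora_scaling(r: int, alpha: float) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer of shape (d_out, d_in).

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


# Rank and alpha as listed in this card.
R, ALPHA = 8, 16

# Hypothetical layer width, assumed here purely for the example.
D_IN = D_OUT = 2048

scale = lora_scaling(R, ALPHA)
added = lora_param_count(D_IN, D_OUT, R)
full = D_IN * D_OUT

print(f"scaling = {scale}")
print(f"LoRA params per layer = {added} ({added / full:.2%} of the frozen weight)")
```

Because the update is stored as two thin matrices per layer, the adapter stays small relative to the base checkpoint, which is why it is distributed separately and loaded on top of `nvidia/GR00T-N1.6-3B`.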

**Architecture additions over GR00T N1.6:**

- **Demo encoder**: SigLIP ViT-B/16 (224×224) per-frame → temporal transformer → perceiver resampler → task embedding `[B, 16, 768]`
- **Task cross-attention**: Injects the task embedding into the DiT action head at every block
- **LoRA**: r=8, α=16, applied to the `q/k/v/o/gate/up/down_proj` layers of the language model

**Target robot**: Unitree G1 (43-DOF action space: arms, torso, hands, legs)

## Current Status

| Component | Status |
|-----------|--------|
| Demo encoder | Stable |
| LoRA adapter (this repo) | Research preview (trained for ~5k steps) |
| Closed-loop real robot eval | In progress |
| Full training pipeline | Under development |

The checkpoint was trained for 4500–5000 steps on Unitree G1 teleop data. Full validation across manipulation tasks is ongoing.

## Usage

> Requires the `dc_groot` conda environment from the GENESIS repo. See `part2_manipulation/README.md`.

```python
from peft import PeftModel
from gr00t.model.demo_conditioned.dc_gr00t import DCGr00t

# Load base model
base_model = DCGr00t.from_pretrained("nvidia/GR00T-N1.6-3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "JeffrinSam/genesis-dc-groot-adapter")
model = model.merge_and_unload()  # optional: merge for faster inference
```

Or via the GENESIS inference script:

```bash
conda activate dc_groot
cd GENESIS
python part2_manipulation/inference.py \
  --adapter JeffrinSam/genesis-dc-groot-adapter \
  --demo_video reference.mp4 \
  --robot unitree_g1
```

## Adapter Details

| Parameter | Value |
|-----------|-------|
| Base model | `nvidia/GR00T-N1.6-3B` |
| PEFT type | LoRA |
| Rank (r) | 8 |
| Alpha (α) | 16 |
| Dropout | 0.05 |
| Target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Adapter size | ~29 MB |
| Training steps | 5,000 |
| Hardware | NVIDIA RTX 5090 32 GB |

## Citation

```bibtex
@article{lykov2025physicalagent,
  title   = {PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models},
  author  = {Lykov, Artem and Sam, Jeffrin and Nguyen, Hung Khang and others},
  journal = {arXiv preprint arXiv:2509.13903},
  year    = {2025}
}
```

Please also cite the base model:

```bibtex
@article{nvidia2025groot,
  title  = {GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
  author = {NVIDIA et al.},
  year   = {2025},
  url    = {https://huggingface.co/nvidia/GR00T-N1.6-3B}
}
```

## License

Apache 2.0. The base model (`nvidia/GR00T-N1.6-3B`) is subject to NVIDIA's license; check [its model card](https://huggingface.co/nvidia/GR00T-N1.6-3B) before use.