---
license: apache-2.0
language:
- en
tags:
- robotics
- manipulation
- video-to-manipulation
- lora
- peft
- groot
- humanoid
- unitree-g1
- GENESIS
- under-development
library_name: peft
pipeline_tag: robotics
base_model: nvidia/GR00T-N1.6-3B
---

# DC-GR00T — Demo-Conditioned GR00T Adapter (GENESIS)

> **⚠️ Under Active Development**
> This checkpoint is a research preview. The DC-GR00T manipulation pipeline is still being actively developed and validated. Results and APIs may change without notice. Use with caution in production.

Part of the **GENESIS** research framework for video-conditioned robot learning.

**Paper**: [PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models](https://arxiv.org/abs/2509.13903)

**Code**: [github.com/jeffrinsam/GENESIS](https://github.com/jeffrinsam/GENESIS) → `part2_manipulation/`

## Model Description

DC-GR00T is a **Demo-Conditioned** extension of [GR00T N1.6](https://huggingface.co/nvidia/GR00T-N1.6-3B). Instead of a language instruction, it takes a **reference video** of a manipulation task and extracts from it a task embedding that conditions the diffusion transformer (DiT) action head.

This repository contains a **LoRA fine-tuning adapter** (PEFT) trained on Unitree G1 teleop demonstrations. Load it on top of the base `nvidia/GR00T-N1.6-3B` model.

**Architecture additions over GR00T N1.6:**
- **Demo encoder**: per-frame SigLIP ViT-B/16 (224×224 input) → temporal transformer → perceiver resampler → task embedding `[B, 16, 768]`
- **Task cross-attention**: Injects the task embedding into every block of the DiT action head
- **LoRA**: r=8, α=16, applied to `q/k/v/o/gate/up/down_proj` layers of the language model
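The demo encoder's output shape does not depend on the length of the reference video. A rough sketch of the token arithmetic follows. It assumes (the card does not say) that all SigLIP patch tokens are kept for each of $T$ sampled frames, that the resampler works at width 768, and it simplifies the perceiver resampler to a single-head, single-layer cross-attention from 16 learned latent queries $Q$:

$$
\begin{aligned}
N &= T \cdot \left(\tfrac{224}{16}\right)^2 = 196\,T \quad \text{tokens per video} \\
Z &= \operatorname{softmax}\!\left(\frac{Q K^\top}{\sqrt{768}}\right) V \in \mathbb{R}^{16 \times 768},
\qquad Q \in \mathbb{R}^{16 \times 768},\ K, V \in \mathbb{R}^{N \times 768}
\end{aligned}
$$

Because the number of output rows equals the number of latent queries, the DiT action head receives 16 task tokens per sample (`[B, 16, 768]` over a batch) whether the reference video is short or long.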

**Target robot**: Unitree G1 (43-DOF action space: arms, torso, hands, legs)

## Current Status

| Component | Status |
|-----------|--------|
| Demo encoder | Stable |
| LoRA adapter (this repo) | Research preview (trained for ~5k steps) |
| Closed-loop real robot eval | In progress |
| Full training pipeline | Under development |

The checkpoint was trained for roughly 4,500–5,000 steps on Unitree G1 teleoperation data. Validation across the full range of manipulation tasks is ongoing.

## Usage

> Requires the `dc_groot` conda environment from the GENESIS repo. See `part2_manipulation/README.md`.

```python
from peft import PeftModel
from gr00t.model.demo_conditioned.dc_gr00t import DCGr00t

# Load base model
base_model = DCGr00t.from_pretrained("nvidia/GR00T-N1.6-3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "JeffrinSam/genesis-dc-groot-adapter")
model = model.merge_and_unload()  # optional: merge for faster inference
```

Or via the GENESIS inference script:
```bash
conda activate dc_groot
cd GENESIS
python part2_manipulation/inference.py \
  --adapter JeffrinSam/genesis-dc-groot-adapter \
  --demo_video reference.mp4 \
  --robot unitree_g1
```
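The optional `merge_and_unload()` call in the Python example relies on the standard LoRA identity: the adapter adds a scaled low-rank term, h = W x + (α/r) B A x, which can be folded into a single weight W' = W + (α/r) B A. With r = 8 and α = 16 the scale is 2, assuming PEFT's default (non-rsLoRA) scaling. The pure-Python sketch below checks that merged and unmerged paths agree; the 2×3 weight shape and random values are hypothetical and chosen only to keep it small.

```python
import random

# Pure-Python LoRA sketch (no PyTorch). Rank and alpha match this card; the
# weight shape and random values are HYPOTHETICAL, for illustration only.
R, ALPHA = 8, 16
SCALING = ALPHA / R  # 2.0, assuming standard (non-rsLoRA) scaling


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(W, A, B, x):
    """Unmerged path: h = W x + (alpha / r) * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [h + SCALING * u for h, u in zip(base, update)]


def merge(W, A, B):
    """Merged weight: W' = W + (alpha / r) * B A."""
    BA = matmul(B, A)
    return [[w + SCALING * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


def rand_matrix(rows, cols):
    """Random matrix with entries in [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_in, d_out = 3, 2
W = rand_matrix(d_out, d_in)  # frozen base weight
A = rand_matrix(R, d_in)      # LoRA down-projection, r x d_in
B = rand_matrix(d_out, R)     # LoRA up-projection, d_out x r
x = [random.uniform(-1, 1) for _ in range(d_in)]

unmerged = lora_forward(W, A, B, x)
merged = matvec(merge(W, A, B), x)
assert all(abs(u - m) < 1e-9 for u, m in zip(unmerged, merged))
```

Merging removes the extra A/B matmuls in every adapted layer, which is where the inference speedup comes from; the trade-off is that the merged model no longer has a separate adapter to disable or swap.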

## Adapter Details

| Parameter | Value |
|-----------|-------|
| Base model | `nvidia/GR00T-N1.6-3B` |
| PEFT type | LoRA |
| Rank (r) | 8 |
| Alpha (α) | 16 |
| Dropout | 0.05 |
| Target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Adapter size | ~29 MB |
| Training steps | ~5,000 |
| Hardware | NVIDIA RTX 5090 32 GB |
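The adapter size follows from the rank and the shapes of the targeted projections: each adapted linear layer gains A (r × d_in) and B (d_out × r), i.e. r · (d_in + d_out) parameters. The sketch below estimates this for a hypothetical decoder (hidden size 1024, MLP size 4096, 4 layers); these dimensions are placeholders, not the real GR00T N1.6 backbone config, so substitute the actual config values to compare against the ~29 MB above. Models with grouped-query attention have smaller `k_proj`/`v_proj` outputs, which this toy config ignores.

```python
# Back-of-the-envelope LoRA adapter size. All layer dimensions below are
# HYPOTHETICAL placeholders, not the real GR00T N1.6 language-model config.

TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) to one linear layer."""
    return r * (d_in + d_out)


def adapter_param_count(shapes: dict[str, tuple[int, int]], num_layers: int, r: int) -> int:
    """Sum LoRA params over every targeted projection in every transformer layer."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Hypothetical decoder block: hidden size 1024, MLP size 4096.
hidden, mlp = 1024, 4096
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, hidden),
    "v_proj": (hidden, hidden),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}
assert list(shapes) == TARGET_MODULES  # same target modules as this adapter

total = adapter_param_count(shapes, num_layers=4, r=8)
size_mb = total * 4 / 1e6  # fp32 storage: 4 bytes per parameter
print(f"{total:,} LoRA params, ~{size_mb:.1f} MB in fp32")
```

For this toy config the script reports 753,664 parameters (about 3.0 MB in fp32); storing the adapter in fp16 or bf16 would halve the file size.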

## Citation

```bibtex
@article{lykov2025physicalagent,
  title     = {PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models},
  author    = {Lykov, Artem and Sam, Jeffrin and Nguyen, Hung Khang and others},
  journal   = {arXiv preprint arXiv:2509.13903},
  year      = {2025}
}
```

Please also cite the base model:
```bibtex
@misc{nvidia2025groot,
  title   = {GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
  author  = {{NVIDIA} and others},
  year    = {2025},
  url     = {https://huggingface.co/nvidia/GR00T-N1.6-3B}
}
```

## License

Apache 2.0. The base model (`nvidia/GR00T-N1.6-3B`) is subject to NVIDIA's license — check [its model card](https://huggingface.co/nvidia/GR00T-N1.6-3B) before use.