# GR00T-N1.5-3B LoRA Fine-tuned Model

This is a LoRA fine-tuned checkpoint of [nvidia/GR00T-N1.5-3B](https://huggingface.co/nvidia/GR00T-N1.5-3B), trained on data from a single front-facing camera.
## Model Details

- **Base Model**: nvidia/GR00T-N1.5-3B
- **Training Method**: LoRA (Low-Rank Adaptation)
- **Training Steps**: 100,000
- **Final Training Loss**: 0.053
## Training Configuration

### LoRA Parameters

- **Rank (r)**: 8
- **Alpha**: 16
- **Dropout**: 0.1
- **Target Modules**: to_q, to_k, to_v (attention layers only)
- **Trainable Parameters**: 1,638,400 (0.06% of total)
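As a rough sanity check on these numbers: a rank-`r` LoRA adapter on a `d_in -> d_out` linear layer adds `r * (d_in + d_out)` parameters, and its output is scaled by `alpha / r`. The sketch below assumes 2048×2048 attention projections (matching the hidden size reported under Model Architecture); the actual per-layer shapes come from the checkpoint config.

```python
def lora_param_count(r, d_in, d_out):
    """Parameters added by a rank-r LoRA adapter on a d_in -> d_out linear:
    matrix A is (r, d_in) and matrix B is (d_out, r)."""
    return r * (d_in + d_out)

# Illustrative: one 2048 -> 2048 attention projection (to_q, to_k, or to_v)
# at the rank used here (r=8) adds 32,768 trainable parameters.
per_projection = lora_param_count(8, 2048, 2048)
print(per_projection)  # 32768

scaling = 16 / 8  # alpha / r: the LoRA update is multiplied by this factor
print(scaling)    # 2.0
```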
### Training Parameters

- **Batch Size**: 2 per GPU
- **Learning Rate**: 1e-4
- **Weight Decay**: 1e-5
- **Warmup Ratio**: 0.05
- **Optimizer**: AdamW
- **LR Scheduler**: Cosine
- **Training Duration**: ~1h 52m (6,719 seconds)
- **Training Speed**: 14.88 steps/second
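The schedule above (cosine decay with linear warmup over 5% of 100,000 steps) can be sketched in a few lines. This is an illustration of the shape of the schedule, not the training framework's exact implementation:

```python
import math

def lr_at(step, total_steps=100_000, base_lr=1e-4, warmup_ratio=0.05):
    """Linear warmup followed by cosine decay, per the configuration above."""
    warmup_steps = int(total_steps * warmup_ratio)  # 5,000 steps here
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear ramp from 0 to base_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay to 0

print(lr_at(0))        # 0.0
print(lr_at(5_000))    # peak learning rate, 1e-4, at the end of warmup
print(lr_at(100_000))  # decays to ~0 at the end of training
```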
### Model Components Tuned

- **LLM Backbone**: ❌ Frozen
- **Vision Tower**: ❌ Frozen
- **Action Head Projector**: ✅ Tuned
- **Diffusion Model**: ✅ Tuned
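In code, this split usually comes down to a predicate over parameter names: LoRA adapter weights always train, everything under the frozen backbone components does not, and the action-head components train in full. The module-name prefixes below are hypothetical; the real names come from the GR00T-N1.5 implementation.

```python
# Hypothetical prefixes for the frozen components listed above.
FROZEN_PREFIXES = ("backbone.language_model", "backbone.vision_tower")

def trains(param_name):
    """Return True if this parameter should receive gradients:
    LoRA adapter weights always train; frozen-prefix weights do not."""
    if "lora_" in param_name:
        return True
    return not param_name.startswith(FROZEN_PREFIXES)

print(trains("backbone.language_model.layers.0.mlp.weight"))   # False (frozen)
print(trains("backbone.language_model.layers.0.to_q.lora_A"))  # True (adapter)
print(trains("action_head.projector.weight"))                  # True (tuned)
```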
## Dataset

- **Embodiment**: SO-100 robot with single front camera
- **Camera Resolution**: 320x240
- **FPS**: 30
- **Action Dimensions**: 6 (5 DoF arm + 1 gripper)
- **Action Horizon**: 16 timesteps
- **Video Backend**: torchvision_av
## Usage

This is a LoRA adapter that must be loaded on top of the base model:

```python
from gr00t.model.gr00t_n1 import GR00T_N1_5
from peft import PeftModel

# Load base model
base_model = GR00T_N1_5.from_pretrained("nvidia/GR00T-N1.5-3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "path/to/this/checkpoint")

# Use for inference
model.eval()
```
## Model Architecture

- **Action Dimension**: 32 (max)
- **Action Horizon**: 16
- **Hidden Size**: 2048
- **Compute Dtype**: bfloat16
- **Diffusion Timesteps**: 4 (inference)
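Note the two action dimensions in this card: the model's action head operates in a 32-dimensional maximum action space, while this embodiment (SO-100) uses only 6 of those dimensions. A common convention is to zero-pad embodiment actions up to the maximum; the sketch below illustrates that convention, though the real mapping is defined by the GR00T data/embodiment configuration.

```python
MAX_ACTION_DIM = 32  # model's maximum action dimension
EMBODIMENT_DIM = 6   # SO-100: 5 DoF arm + 1 gripper
ACTION_HORIZON = 16  # timesteps predicted per chunk

def pad_action(action, max_dim=MAX_ACTION_DIM):
    """Zero-pad one embodiment action vector into the model's max action
    space (illustrative; the real mapping lives in the data config)."""
    assert len(action) <= max_dim
    return list(action) + [0.0] * (max_dim - len(action))

# One predicted action chunk: horizon x max action dim.
chunk = [pad_action([0.0] * EMBODIMENT_DIM) for _ in range(ACTION_HORIZON)]
print(len(chunk), len(chunk[0]))  # 16 32
```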
## Training Hardware

- **GPUs**: 1x NVIDIA GPU
- **Compute Dtype**: bfloat16
- **TF32**: Enabled
- **Gradient Checkpointing**: Disabled
## Citation

If you use this model, please cite the original GR00T paper and model:

```bibtex
@misc{gr00t2024,
  title={GR00T: Generalist Robot Policy},
  author={NVIDIA},
  year={2024},
  url={https://huggingface.co/nvidia/GR00T-N1.5-3B}
}
```
## License

This adapter inherits its license from the nvidia/GR00T-N1.5-3B base model.