# GR00T-N1.5-3B LoRA Fine-tuned Model

This is a LoRA fine-tuned checkpoint of [nvidia/GR00T-N1.5-3B](https://huggingface.co/nvidia/GR00T-N1.5-3B), trained on data from a single front-facing camera.

## Model Details

- **Base Model**: nvidia/GR00T-N1.5-3B
- **Training Method**: LoRA (Low-Rank Adaptation)
- **Training Steps**: 100,000
- **Final Training Loss**: 0.053

## Training Configuration

### LoRA Parameters
- **Rank (r)**: 8
- **Alpha**: 16
- **Dropout**: 0.1
- **Target Modules**: to_q, to_k, to_v (attention layers only)
- **Trainable Parameters**: 1,638,400 (0.06% of total)
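
For reference, a PEFT `LoraConfig` matching the settings above would look roughly like this (a sketch, assuming the attention projections in the GR00T codebase are literally named `to_q`/`to_k`/`to_v`):

```python
from peft import LoraConfig

# Approximate reconstruction of the adapter config used for this
# checkpoint; exact module paths depend on the GR00T-N1.5 implementation.
lora_config = LoraConfig(
    r=8,                                      # LoRA rank
    lora_alpha=16,                            # scaling factor
    lora_dropout=0.1,
    target_modules=["to_q", "to_k", "to_v"],  # attention projections only
    bias="none",
)
```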

### Training Parameters
- **Batch Size**: 2 per GPU
- **Learning Rate**: 1e-4
- **Weight Decay**: 1e-5
- **Warmup Ratio**: 0.05
- **Optimizer**: AdamW
- **LR Scheduler**: Cosine
- **Training Duration**: ~1h 52m (6,719 seconds)
- **Training Speed**: 14.88 steps/second
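
Expressed as Hugging Face `TrainingArguments`, these settings would correspond roughly to the following sketch (`output_dir` is a hypothetical placeholder, not taken from the original training script):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./gr00t_lora_checkpoint",  # hypothetical path
    max_steps=100_000,
    per_device_train_batch_size=2,
    learning_rate=1e-4,
    weight_decay=1e-5,
    warmup_ratio=0.05,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    bf16=True,
    tf32=True,
)
```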

### Model Components Tuned
- **LLM Backbone**: ❌ Frozen
- **Vision Tower**: ❌ Frozen
- **Action Head Projector**: ✅ Tuned
- **Diffusion Model**: ✅ Tuned
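
In PyTorch terms, this freezing pattern amounts to toggling `requires_grad` by parameter name; a minimal sketch, assuming the tuned parameters can be identified by an `action_head` substring (the real GR00T-N1.5 attribute names may differ):

```python
# Given a loaded GR00T model `model`, freeze everything and re-enable
# gradients only for the action-head/diffusion parameters.
# "action_head" is an assumed naming convention, not verified against
# the GR00T-N1.5 source.
for name, param in model.named_parameters():
    param.requires_grad = "action_head" in name
```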

## Dataset

- **Embodiment**: SO-100 robot with a single front camera
- **Camera Resolution**: 320x240
- **FPS**: 30
- **Action Dimensions**: 6 (5-DoF arm + 1 gripper)
- **Action Horizon**: 16 timesteps
- **Video Backend**: torchvision_av
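
For illustration only, a single observation matching this spec might look like the following dictionary (the modality key names here are hypothetical; the real keys come from the dataset's modality config):

```python
import numpy as np

# Hypothetical observation layout for one timestep.
obs = {
    "video.front": np.zeros((1, 240, 320, 3), dtype=np.uint8),  # 320x240 RGB
    "state.arm": np.zeros((1, 5), dtype=np.float32),            # 5-DoF arm
    "state.gripper": np.zeros((1, 1), dtype=np.float32),        # gripper
}
```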

## Usage

This is a LoRA adapter that must be loaded on top of the base model:

```python
from gr00t.model.gr00t_n1 import GR00T_N1_5
from peft import PeftModel

# Load base model
base_model = GR00T_N1_5.from_pretrained("nvidia/GR00T-N1.5-3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "path/to/this/checkpoint")

# Use for inference
model.eval()
```
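
For deployment you can optionally fold the adapter weights into the base model with the standard PEFT call (whether to merge or keep the adapter separate is a workflow choice, not something this checkpoint requires):

```python
# Merge LoRA weights into the base model and drop the PEFT wrapper.
merged_model = model.merge_and_unload()
```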

## Model Architecture

- **Action Dimension**: 32 (max)
- **Action Horizon**: 16
- **Hidden Size**: 2048
- **Compute Dtype**: bfloat16
- **Diffusion Timesteps**: 4 (inference)

## Training Hardware

- **GPUs**: 1x NVIDIA GPU
- **Compute Dtype**: bfloat16
- **TF32**: Enabled
- **Gradient Checkpointing**: Disabled
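
The TF32 setting corresponds to the standard PyTorch switches; a minimal sketch for reproducing it (these flags only take effect on Ampere-or-newer GPUs):

```python
import torch

# Allow TF32 math in matmuls and cuDNN convolutions, matching the
# training setup above.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```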

## Citation

If you use this model, please cite the original GR00T paper and model:

```bibtex
@misc{gr00t2024,
  title={GR00T: Generalist Robot Policy},
  author={NVIDIA},
  year={2024},
  url={https://huggingface.co/nvidia/GR00T-N1.5-3B}
}
```

## License

This adapter inherits its license from the nvidia/GR00T-N1.5-3B base model.