# GR00T-N1.5-3B LoRA Fine-tuned Model

This is a LoRA fine-tuned checkpoint of [nvidia/GR00T-N1.5-3B](https://huggingface.co/nvidia/GR00T-N1.5-3B), trained on data from a single front-facing camera.
## Model Details

- **Base Model**: nvidia/GR00T-N1.5-3B
- **Training Method**: LoRA (Low-Rank Adaptation)
- **Training Steps**: 100,000
- **Final Training Loss**: 0.053
## Training Configuration

### LoRA Parameters

- **Rank (r)**: 8
- **Alpha**: 16
- **Dropout**: 0.1
- **Target Modules**: to_q, to_k, to_v (attention layers only)
- **Trainable Parameters**: 1,638,400 (0.06% of total)
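As a rough sanity check on these numbers: a rank-`r` LoRA adapter on a `d_in -> d_out` linear layer adds `r * (d_in + d_out)` parameters, and its output is scaled by `alpha / r`. The sketch below assumes 2048×2048 attention projections (matching the hidden size reported under Model Architecture); the actual per-layer shapes come from the checkpoint config.

```python
def lora_param_count(r, d_in, d_out):
    """Parameters added by a rank-r LoRA adapter on a d_in -> d_out linear:
    matrix A is (r, d_in) and matrix B is (d_out, r)."""
    return r * (d_in + d_out)

# Illustrative: one 2048 -> 2048 attention projection (to_q, to_k, or to_v)
# at the rank used here (r=8) adds 32,768 trainable parameters.
per_projection = lora_param_count(8, 2048, 2048)
print(per_projection)  # 32768

scaling = 16 / 8  # alpha / r: the LoRA update is multiplied by this factor
print(scaling)    # 2.0
```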
### Training Parameters

- **Batch Size**: 2 per GPU
- **Learning Rate**: 1e-4
- **Weight Decay**: 1e-5
- **Warmup Ratio**: 0.05
- **Optimizer**: AdamW
- **LR Scheduler**: Cosine
- **Training Duration**: ~1h 52m (6,719 seconds)
- **Training Speed**: 14.88 steps/second
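The schedule above (cosine decay with linear warmup over 5% of 100,000 steps) can be sketched in a few lines. This is an illustration of the shape of the schedule, not the training framework's exact implementation:

```python
import math

def lr_at(step, total_steps=100_000, base_lr=1e-4, warmup_ratio=0.05):
    """Linear warmup followed by cosine decay, per the configuration above."""
    warmup_steps = int(total_steps * warmup_ratio)  # 5,000 steps here
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear ramp from 0 to base_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay to 0

print(lr_at(0))        # 0.0
print(lr_at(5_000))    # peak learning rate, 1e-4, at the end of warmup
print(lr_at(100_000))  # decays to ~0 at the end of training
```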
### Model Components Tuned

- **LLM Backbone**: ❌ Frozen
- **Vision Tower**: ❌ Frozen
- **Action Head Projector**: ✅ Tuned
- **Diffusion Model**: ✅ Tuned
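In code, this split usually comes down to a predicate over parameter names: LoRA adapter weights always train, everything under the frozen backbone components does not, and the action-head components train in full. The module-name prefixes below are hypothetical; the real names come from the GR00T-N1.5 implementation.

```python
# Hypothetical prefixes for the frozen components listed above.
FROZEN_PREFIXES = ("backbone.language_model", "backbone.vision_tower")

def trains(param_name):
    """Return True if this parameter should receive gradients:
    LoRA adapter weights always train; frozen-prefix weights do not."""
    if "lora_" in param_name:
        return True
    return not param_name.startswith(FROZEN_PREFIXES)

print(trains("backbone.language_model.layers.0.mlp.weight"))   # False (frozen)
print(trains("backbone.language_model.layers.0.to_q.lora_A"))  # True (adapter)
print(trains("action_head.projector.weight"))                  # True (tuned)
```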
## Dataset

- **Embodiment**: SO-100 robot with single front camera
- **Camera Resolution**: 320x240
- **FPS**: 30
- **Action Dimensions**: 6 (5 DoF arm + 1 gripper)
- **Action Horizon**: 16 timesteps
- **Video Backend**: torchvision_av
## Usage

This is a LoRA adapter that must be loaded on top of the base model:

```python
from gr00t.model.gr00t_n1 import GR00T_N1_5
from peft import PeftModel

# Load base model
base_model = GR00T_N1_5.from_pretrained("nvidia/GR00T-N1.5-3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "path/to/this/checkpoint")

# Use for inference
model.eval()
```
## Model Architecture

- **Action Dimension**: 32 (max)
- **Action Horizon**: 16
- **Hidden Size**: 2048
- **Compute Dtype**: bfloat16
- **Diffusion Timesteps**: 4 (inference)
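Note the two action dimensions in this card: the model's action head operates in a 32-dimensional maximum action space, while this embodiment (SO-100) uses only 6 of those dimensions. A common convention is to zero-pad embodiment actions up to the maximum; the sketch below illustrates that convention, though the real mapping is defined by the GR00T data/embodiment configuration.

```python
MAX_ACTION_DIM = 32  # model's maximum action dimension
EMBODIMENT_DIM = 6   # SO-100: 5 DoF arm + 1 gripper
ACTION_HORIZON = 16  # timesteps predicted per chunk

def pad_action(action, max_dim=MAX_ACTION_DIM):
    """Zero-pad one embodiment action vector into the model's max action
    space (illustrative; the real mapping lives in the data config)."""
    assert len(action) <= max_dim
    return list(action) + [0.0] * (max_dim - len(action))

# One predicted action chunk: horizon x max action dim.
chunk = [pad_action([0.0] * EMBODIMENT_DIM) for _ in range(ACTION_HORIZON)]
print(len(chunk), len(chunk[0]))  # 16 32
```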
## Training Hardware

- **GPUs**: 1x NVIDIA GPU
- **Compute Dtype**: bfloat16
- **TF32**: Enabled
- **Gradient Checkpointing**: Disabled
## Citation

If you use this model, please cite the original GR00T paper and model:

```bibtex
@misc{gr00t2024,
  title={GR00T: Generalist Robot Policy},
  author={NVIDIA},
  year={2024},
  url={https://huggingface.co/nvidia/GR00T-N1.5-3B}
}
```
## License

This adapter inherits its license from the nvidia/GR00T-N1.5-3B base model.