---
library_name: transformers
pipeline_tag: robotics
tags:
- robotics
- foundation-model
- gr00t
- dual-camera
- robot-learning
- manipulation
- embodied-ai
model_type: gr00t
datasets:
- so101_wave_300k_dualcam
language:
- en
base_model_relation: finetune
widget:
- example_title: "Robot Manipulation"
  text: "Dual camera robotics control for manipulation tasks"
---

# GR00T Wave: Dual Camera Robotics Foundation Model

## Model Overview

GR00T Wave is a robotics foundation model fine-tuned on dual-camera manipulation demonstrations from the SO101 Wave dataset. By fusing two synchronized camera views, the model gains a richer spatial understanding of the workspace, enabling more reliable manipulation than single-camera input.

## Key Features

- **Dual Camera Input**: Processes synchronized dual-camera feeds for enhanced spatial understanding
- **Foundation Model Architecture**: Built on the GR00T framework for robust robotics applications
- **300K Training Steps**: Extensive training on high-quality manipulation demonstrations
- **Manipulation Focused**: Optimized for robotic manipulation and control tasks

## Model Details

- **Model Type**: GR00T Robotics Foundation Model
- **Training Data**: SO101 Wave 300K Dual Camera Dataset
- **Architecture**: Transformer-based with dual camera encoders
- **Training Steps**: 300,000 steps with checkpoints at 150K and 300K
- **Input Modalities**: Dual RGB cameras, robot state
- **Output**: Robot actions and control commands

## Usage

```python
from transformers import AutoModel

# Load the model (the custom GR00T architecture requires trust_remote_code)
model = AutoModel.from_pretrained("cagataydev/gr00t-wave", trust_remote_code=True)

# Note: this only loads the weights; running the policy on a robot
# requires a specialized robotics inference pipeline.
```

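The exact observation schema depends on the GR00T inference pipeline and is not documented here; the field names below (`video.front`, `video.wrist`, `state.joints`) and the 224×224 frame size are illustrative assumptions, not the model's confirmed API. As a sketch, a single dual-camera step observation could be assembled like this:

```python
import numpy as np

def build_observation(front_frame, wrist_frame, joint_positions):
    """Assemble one step of a dual-camera observation.

    All key names and shapes here are hypothetical placeholders;
    consult the model's modality configuration for the real schema.
    """
    assert front_frame.shape == wrist_frame.shape == (224, 224, 3)
    return {
        "video.front": front_frame[None],       # (1, H, W, C) uint8 frame
        "video.wrist": wrist_frame[None],       # (1, H, W, C) uint8 frame
        "state.joints": joint_positions[None],  # (1, D) robot joint state
    }

# Dummy arrays standing in for real camera frames and robot state
front = np.zeros((224, 224, 3), dtype=np.uint8)
wrist = np.zeros((224, 224, 3), dtype=np.uint8)
joints = np.zeros(6, dtype=np.float32)

obs = build_observation(front, wrist, joints)
print({k: v.shape for k, v in obs.items()})
```

Batching a leading axis onto each array mirrors the common convention of transformer policies that expect a time or batch dimension even for single-step inference.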
## Training Configuration

- **Base Model**: GR00T N1.5-3B
- **Dataset**: SO101 Wave 300K Dual Camera
- **Training Framework**: Custom robotics training pipeline
- **Batch Size**: Optimized for dual camera inputs
- **Optimization**: AdamW with custom learning rate scheduling

## Model Files

The repository contains:

- **SafeTensors Model Files**:
  - `model-00001-of-00002.safetensors` (4.7GB)
  - `model-00002-of-00002.safetensors` (2.4GB)
- **Configuration Files**:
  - `config.json`
  - `model.safetensors.index.json`
- **Training Checkpoints**:
  - `checkpoint-150000/` (16GB)
  - `checkpoint-300000/` (16GB)
- **Training Metadata**:
  - `trainer_state.json`
  - `training_args.bin`

## Evaluation

The model has been evaluated on standard robotics manipulation benchmarks with the following approach:

- **Evaluation Steps**: 150 per checkpoint
- **Trajectory Count**: 5 trajectories per evaluation
- **Data Configuration**: SO100 dual camera setup
- **Metrics**: Success rate, manipulation accuracy, and task completion

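With 5 trajectories per evaluation, the success rate is simply the fraction of successful rollouts. A minimal sanity-check computation (the outcome values below are made up for illustration):

```python
# Hypothetical rollout outcomes for one 5-trajectory evaluation (True = success)
outcomes = [True, True, False, True, True]

success_rate = sum(outcomes) / len(outcomes)
print(f"success rate: {success_rate:.0%}")  # 4/5 successes -> 80%
```

Note that with only 5 trajectories per evaluation, success rates are quantized to 20% increments, so small differences between checkpoints may not be statistically meaningful.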
## Applications

This model is suitable for:

- **Robotic Manipulation**: Pick and place operations
- **Dual Camera Systems**: Tasks requiring stereo vision
- **Manufacturing Automation**: Assembly and quality control
- **Research**: Foundation for robotics research and development

## Technical Specifications

- **Model Size**: ~7.1GB (SafeTensors format)
- **Total Repository Size**: ~40GB (including checkpoints)
- **Inference Requirements**: GPU with sufficient VRAM for transformer inference
- **Framework Compatibility**: Transformers, PyTorch

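The ~7.1GB SafeTensors size is broadly consistent with a ~3B-parameter base model (GR00T N1.5-3B) stored mostly in 16-bit precision; the gap above the raw weight estimate would come from tensors kept at other precisions and metadata. A back-of-envelope check, treating 3B parameters as an approximation:

```python
# Rough weight-only memory estimate for a ~3B-parameter model in 16-bit precision.
# This ignores activations, KV caches, and any tensors stored at other precisions,
# so actual VRAM needs at inference time will be higher.
params = 3e9           # approximate parameter count (GR00T N1.5-3B)
bytes_per_param = 2    # fp16 / bf16

weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.1f} GB of weights")  # ~6.0 GB, in line with the ~7.1GB shards
```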
## Installation

```bash
# Install required dependencies
pip install transformers torch torchvision
pip install huggingface_hub

# Log in to Hugging Face (required if the model repository is private)
huggingface-cli login
```

## Limitations

- Requires a specialized robotics inference pipeline
- Optimized for specific dual camera configurations
- Performance may vary with different robot platforms
- Requires adequate computational resources for real-time inference

## Model Card

This model card summarizes the capabilities, limitations, and intended use cases of GR00T Wave, a dual-camera robotics foundation model fine-tuned from GR00T N1.5-3B and released with full training checkpoints and metadata.

## Ethical Considerations

This model is designed for robotics research and industrial applications. Users should ensure:

- Safe deployment in robotics systems
- Appropriate safety measures for physical robot control
- Compliance with relevant safety standards
- Responsible use in manufacturing and research environments

## Version History

- **v1.0**: Initial release with 300K step training
- **Checkpoints**: Available at 150K and 300K training steps

## Support

For technical questions and implementation support, please refer to the model documentation and community resources.