| license: apache-2.0 | |
| language: | |
| - en | |
| pipeline_tag: robotics | |
| library_name: transformers | |
| tags: | |
| - moe | |
| - rio2 | |
| - diffusion-jepa | |
| - safetensors | |
| datasets: | |
| - allenai/MolmoAct2-SO100_101-Dataset | |
| - allenai/MolmoAct2-DROID-Dataset | |
| - allenai/MolmoAct2-LIBERO-Dataset | |
|  | |
| **RIO-2** | |
| RIO-2 is a two-rate World Action Model (WAM) built for robotics. It combines a low-frequency visual-language S2 backbone with a high-frequency JEPA-diffusion S1 action policy. The model is designed to separate slow scene understanding from fast robot control: | |
| • S2 refreshes visual-language context at low frequency. | |
| • Bridge/compressor modules convert S2 context into compact action-conditioning tokens. | |
| • S1 runs high-frequency action generation from cached S2 tokens and robot state. | |
| • JEPA latent prediction provides an auxiliary future-action representation. | |
| • A 10-expert S1 MoE residual path expands action capacity, while top-1 routing keeps expert activation sparse and efficient. | |
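The top-1 MoE residual path in the last bullet can be illustrated with a small sketch. This is not the RIO-2 implementation: the router, the toy "experts", and the function name `top1_moe_residual` are hypothetical, and the width is shrunk to 8 for readability. It only shows the general technique, in which a router scores all experts, a single expert is evaluated, and its gated output is added back to the input.

```python
import math
import random

def top1_moe_residual(x, router_weights, experts):
    """Return (x + gate * expert_k(x), k), where k is the highest-scoring expert.

    x: hidden vector (list of floats)
    router_weights: one weight vector per expert, used to score x
    experts: list of callables mapping a vector to a vector
    """
    # Router logits: one dot product per expert.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Softmax over experts, shifted by the max for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-1 routing: only the selected expert is evaluated.
    k = max(range(len(probs)), key=probs.__getitem__)
    expert_out = experts[k](x)
    # Residual path: the gated expert output is added to the input.
    y = [x_i + probs[k] * o_i for x_i, o_i in zip(x, expert_out)]
    return y, k

def make_scaling_expert(scale):
    # Toy "expert" that scales its input; stands in for a real feed-forward block.
    return lambda v: [scale * v_i for v_i in v]

random.seed(0)
NUM_EXPERTS = 10  # matches s1_moe_num_experts
WIDTH = 8         # toy width (the card lists s1_width: 384)
router = [[random.gauss(0.0, 1.0) for _ in range(WIDTH)] for _ in range(NUM_EXPERTS)]
experts = [make_scaling_expert(0.1 * (i + 1)) for i in range(NUM_EXPERTS)]
x = [random.gauss(0.0, 1.0) for _ in range(WIDTH)]
y, chosen = top1_moe_residual(x, router, experts)
print("selected expert:", chosen)
```

Because only one of the ten experts runs per routing decision, the compute spent in the expert path stays close to that of a single expert even though total capacity is ten times larger.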
|  | |
| RIO-2 uses a JEPA-diffusion S1 action policy for general, flexible robot control at high frequency. S1 is an MoE policy with 10 experts of 100M parameters each. | |
| RIO-2's task memory maintains a small EMA latent memory over recent S2 context for longer-horizon task continuity. | |
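As a rough illustration of an EMA latent memory, the sketch below applies the standard exponential-moving-average update, `memory = decay * memory + (1 - decay) * latent`, to a stream of latent vectors. The function name `ema_update`, the decay value of 0.9, and the initialization from the first latent are assumptions made for this example; the card does not state how RIO-2 parameterizes its memory.

```python
def ema_update(memory, latent, decay=0.9):
    """One EMA step: memory <- decay * memory + (1 - decay) * latent."""
    if memory is None:
        # The first observation initializes the memory (an assumption of this sketch).
        return list(latent)
    return [decay * m + (1.0 - decay) * z for m, z in zip(memory, latent)]

# Feed three toy 2-D "S2 latents" through the memory.
memory = None
for latent in ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]):
    memory = ema_update(memory, latent, decay=0.9)

print(memory)  # approximately [0.9, 0.1]
```

A decay close to 1 makes the memory change slowly, so a single differing latent only shifts it slightly. That is the property that lets such a memory carry task context across several S2 refreshes.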
| The S2 policy is inspired by allenai/MolmoAct2. | |
| **This repo uses Hub custom code. Pass trust_remote_code=True until RIO-2 is merged into Transformers.** | |
| **RIO-2 is trained on allenai's open datasets.** | |
| **Key Configuration** | |
| ``` | |
| state_dim: 6 | |
| action_dim: 6 | |
| action_horizon: 30 | |
| s2_token_count: 16 | |
| s2_width: 1024 | |
| s1_width: 384 | |
| s1_layers: 6 | |
| s1_heads: 8 | |
| s1_policy_mode: jepa_diffusion | |
| s1_moe_num_experts: 10 | |
| s1_moe_top_k: 1 | |
| dtype: bfloat16 | |
| ``` | |
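A few quantities follow from these values by simple arithmetic. The sketch below collects the configuration in a hypothetical dataclass (`RIO2ConfigSketch` is not part of the repo) and derives them. It assumes S1 uses standard multi-head attention with the width split evenly across heads, and that an action chunk is laid out as `action_horizon` rows of `action_dim` values; neither is confirmed by the card.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RIO2ConfigSketch:
    # Values copied from the Key Configuration block; the class itself is hypothetical.
    state_dim: int = 6
    action_dim: int = 6
    action_horizon: int = 30
    s2_token_count: int = 16
    s2_width: int = 1024
    s1_width: int = 384
    s1_layers: int = 6
    s1_heads: int = 8
    s1_moe_num_experts: int = 10
    s1_moe_top_k: int = 1

    @property
    def s1_head_dim(self) -> int:
        # Assumes the S1 width is split evenly across attention heads.
        if self.s1_width % self.s1_heads != 0:
            raise ValueError("s1_width must be divisible by s1_heads")
        return self.s1_width // self.s1_heads

    @property
    def action_chunk_shape(self) -> tuple:
        # Assumed layout: one row per future step, one column per action dimension.
        return (self.action_horizon, self.action_dim)

    @property
    def active_expert_fraction(self) -> float:
        # Share of S1 experts evaluated per routing decision.
        return self.s1_moe_top_k / self.s1_moe_num_experts

cfg = RIO2ConfigSketch()
print(cfg.s1_head_dim)             # 48
print(cfg.action_chunk_shape)      # (30, 6)
print(cfg.active_expert_fraction)  # 0.1
```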
| **How To Load RIO-2 In Python** | |
| ```python | |
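| import torch | |
| from transformers import AutoModel, AutoProcessor | |
|  | |
| # Load the policy; the repo's custom code requires trust_remote_code=True. | |
| model = AutoModel.from_pretrained( | |
|     "hoguai/RIO-2", | |
|     trust_remote_code=True, | |
|     torch_dtype=torch.bfloat16, | |
|     device_map="auto", | |
| ) | |
| processor = AutoProcessor.from_pretrained( | |
|     "hoguai/RIO-2", | |
|     trust_remote_code=True, | |
| ) | |
| # Load the S2 base onto the GPU. | |
| model.load_s2_base(device="cuda") | |
| # `image` (current camera frame) and `state` (robot state, state_dim: 6) must be supplied by the caller. | |
| # Slow path: refresh the cached S2 context for this frame and instruction. | |
| model.refresh_s2(image, "pick up the red cube", force=True) | |
| # Fast path: generate actions from the cached S2 tokens and the latest state. | |
| actions = model.act_fast(state, steps=2) | |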
| ``` | |
| **Runtime Pattern** | |
| RIO-2 is intended to run as a two-rate policy: | |
| 1. Refresh S2 when the scene or instruction changes, or at a low fixed rate. | |
| 2. Reuse cached S2 tokens inside the high-frequency control loop. | |
| 3. Call act_fast() repeatedly with the latest robot state. | |
| 4. Execute only the safe portion of the returned action chunk through an external safety controller. | |
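The four steps above can be sketched as a loop. `StubPolicy` is a hypothetical stand-in that only borrows the method names `refresh_s2` and `act_fast` from this card; its behavior, the refresh interval `s2_every`, and the choice to forward one action per call are assumptions for illustration. A real deployment would time the loop against the robot's control rate and pass the forwarded actions to a safety controller.

```python
class StubPolicy:
    """Hypothetical stand-in exposing the two method names used in this card."""

    def __init__(self, action_horizon=30, action_dim=6):
        self.action_horizon = action_horizon
        self.action_dim = action_dim
        self.s2_refreshes = 0
        self._cached_instruction = None

    def refresh_s2(self, image, instruction, force=False):
        # Slow path: recompute and cache the context (the stub only counts refreshes).
        if force or instruction != self._cached_instruction:
            self._cached_instruction = instruction
            self.s2_refreshes += 1

    def act_fast(self, state, steps=2):
        # Fast path: return an action chunk. The stub ignores `steps` and
        # simply repeats the current state for every step of the horizon.
        return [list(state) for _ in range(self.action_horizon)]


def run_two_rate(policy, get_image, get_state, instruction,
                 control_steps=100, s2_every=25, execute_per_call=1):
    forwarded = []
    for step in range(control_steps):
        # Step 1: low-rate S2 refresh on a fixed schedule.
        if step % s2_every == 0:
            policy.refresh_s2(get_image(), instruction, force=True)
        # Steps 2-3: high-rate call that reuses the cached S2 context.
        chunk = policy.act_fast(get_state(), steps=2)
        # Step 4: forward only the leading part of the chunk.
        forwarded.extend(chunk[:execute_per_call])
    return forwarded

policy = StubPolicy()
actions = run_two_rate(policy,
                       get_image=lambda: None,
                       get_state=lambda: [0.0] * 6,
                       instruction="pick up the red cube")
print(policy.s2_refreshes, len(actions))  # 4 100
```

With these toy settings the slow path runs 4 times while the fast path runs 100 times, which is the separation of rates the pattern is meant to achieve.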
| **Safety** | |
| RIO-2 outputs continuous robot actions and must not be connected directly to real hardware. Always place the policy | |
| behind a robot safety layer with joint limits, velocity/acceleration/jerk limits, workspace constraints, watchdog, | |
| E-stop, and a fallback controller. |
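Two of the listed checks, joint limits and a velocity-style limit, can be sketched as a simple clamp. The function `clamp_action` and all numeric limits are hypothetical, and the sketch assumes actions are absolute joint-position targets, which the card does not specify. It covers none of the other items (acceleration and jerk limits, workspace constraints, watchdog, E-stop, fallback controller), which still need to be provided by the robot's safety layer.

```python
def clamp_action(target, previous, lower, upper, max_step):
    """Clamp one joint-space target to position limits and a per-step change limit."""
    safe = []
    for q, q_prev, lo, hi, dq_max in zip(target, previous, lower, upper, max_step):
        # Velocity-style limit: bound the change relative to the last command.
        q = min(max(q, q_prev - dq_max), q_prev + dq_max)
        # Joint limit: keep the command inside the allowed range.
        q = min(max(q, lo), hi)
        safe.append(q)
    return safe

# Toy 3-joint example with made-up limits.
safe = clamp_action(
    target=[2.0, -0.5, 1.5],
    previous=[0.0, 0.0, 0.95],
    lower=[-1.0, -1.0, -1.0],
    upper=[1.0, 1.0, 1.0],
    max_step=[0.1, 0.1, 0.1],
)
print(safe)  # [0.1, -0.1, 1.0]
```

The first two joints are limited by the per-step bound. The third is limited by the upper joint limit, because the joint limit is applied last and therefore always holds.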