	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: robotics
	library_name: transformers
	tags:
	- moe
	- rio2
	- diffusion-jepa
	- safetensors
	datasets:
	- allenai/MolmoAct2-SO100_101-Dataset
	- allenai/MolmoAct2-DROID-Dataset
	- allenai/MolmoAct2-LIBERO-Dataset
	---
	![](https://huggingface.co/hoguai/RIO-2/resolve/main/rio2.png)

	RIO-2

	RIO-2 is a two-rate World Action Model (WAM) built for robotics. It combines a low-frequency visual-language S2 backbone with a high-frequency JEPA-diffusion S1 action policy. The design separates slow scene understanding from fast robot control:

	• S2 refreshes visual-language context at low frequency.

	• Bridge/compressor modules convert S2 context into compact action-conditioning tokens.

	• S1 runs high-frequency action generation from cached S2 tokens and robot state.

	• JEPA latent prediction provides an auxiliary future-action representation.

	• A 10-expert S1 MoE residual path expands action capacity, while top-1 routing activates only one expert at a time and keeps inference efficient.
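The top-1 MoE residual path in the last bullet can be illustrated with a toy sketch. This is not the RIO-2 implementation: the linear router, the scaling "experts", and the names `moe_residual` and `make_expert` are assumptions made for illustration. It only shows the general pattern, where a router scores all experts, a single expert runs, and its gated output is added back to the input.

```python
import math
import random

NUM_EXPERTS = 10  # matches s1_moe_num_experts; top-1 routing matches s1_moe_top_k


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical stand-in for an expert network: an elementwise scaling.
    return lambda h: [scale * x for x in h]


def moe_residual(hidden, router_weights, experts):
    """Top-1 MoE residual: hidden + gate * expert_i(hidden); only expert i runs."""
    # Router: one logit per expert from a linear projection of the hidden vector.
    logits = [sum(w * x for w, x in zip(row, hidden)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(experts)), key=lambda i: probs[i])
    expert_out = experts[best](hidden)
    out = [h + probs[best] * e for h, e in zip(hidden, expert_out)]
    return out, best


random.seed(0)
width = 8  # toy width; the card lists s1_width: 384
router_weights = [[random.uniform(-1, 1) for _ in range(width)]
                  for _ in range(NUM_EXPERTS)]
experts = [make_expert(0.1 * (i + 1)) for i in range(NUM_EXPERTS)]
hidden = [random.uniform(-1, 1) for _ in range(width)]

out, chosen = moe_residual(hidden, router_weights, experts)
print(f"expert {chosen} of {NUM_EXPERTS} was active; output width {len(out)}")
```

Because only the selected expert is evaluated, the compute per call stays close to that of a single expert even though the total parameter count grows with the number of experts.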

	![image](https://huggingface.co/hoguai/RIO-2/resolve/main/rio2diagram.png)

	RIO-2 uses a JEPA-diffusion S1 action policy for general, flexible robot control at high frequency. S1 is an MoE policy with 10 experts, each with 100M parameters.
	RIO-2's task memory maintains a small exponential-moving-average (EMA) latent memory over recent S2 context for longer-horizon task continuity.
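As a rough illustration of an EMA latent memory, the sketch below blends each new S2 latent into a running vector. The function name `ema_update`, the `decay` parameter, and the two-element latents are hypothetical; the card does not state the decay value or memory size RIO-2 actually uses.

```python
def ema_update(memory, s2_latent, decay=0.9):
    """Blend a new S2 latent into the running memory: m <- decay * m + (1 - decay) * z."""
    if memory is None:
        # First observation initializes the memory.
        return list(s2_latent)
    return [decay * m + (1.0 - decay) * z for m, z in zip(memory, s2_latent)]


memory = None
for latent in ([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]):
    memory = ema_update(memory, latent, decay=0.5)

print(memory)  # → [0.25, 0.75]
```

A higher `decay` makes the memory change more slowly, so older S2 context influences the policy for longer.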

	The S2 policy is inspired by allenai/MolmoAct2.

	This repo uses Hub custom code. Pass trust_remote_code=True until RIO-2 is merged into Transformers.

	RIO-2 is trained on allenai's open MolmoAct2 datasets (SO100_101, DROID, and LIBERO).

	Key Configuration

	```yaml
	state_dim: 6
	action_dim: 6
	action_horizon: 30
	s2_token_count: 16
	s2_width: 1024
	s1_width: 384
	s1_layers: 6
	s1_heads: 8
	s1_policy_mode: jepa_diffusion
	s1_moe_num_experts: 10
	s1_moe_top_k: 1
	dtype: bfloat16
	```
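A few derived quantities follow from these values. The sketch below mirrors the listed fields in a small dataclass; `RIO2KeyConfig` is a hypothetical name rather than the config class shipped in the repo, and both derived properties rest on assumptions stated in the comments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RIO2KeyConfig:  # hypothetical name, not the class shipped in the repo
    state_dim: int = 6
    action_dim: int = 6
    action_horizon: int = 30
    s2_token_count: int = 16
    s2_width: int = 1024
    s1_width: int = 384
    s1_layers: int = 6
    s1_heads: int = 8
    s1_moe_num_experts: int = 10
    s1_moe_top_k: int = 1

    @property
    def head_dim(self) -> int:
        # Assumes standard multi-head attention, where the width splits evenly across heads.
        assert self.s1_width % self.s1_heads == 0
        return self.s1_width // self.s1_heads

    @property
    def chunk_values(self) -> int:
        # Assumes one action chunk has shape (action_horizon, action_dim).
        return self.action_horizon * self.action_dim


cfg = RIO2KeyConfig()
print(cfg.head_dim, cfg.chunk_values)  # → 48 180
```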

	How To Load RIO-2 In Python

	```python
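	import torch
	from PIL import Image
	from transformers import AutoModel, AutoProcessor

	# trust_remote_code=True is required because RIO-2 ships as Hub custom code.
	model = AutoModel.from_pretrained(
	    "hoguai/RIO-2",
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	processor = AutoProcessor.from_pretrained(
	    "hoguai/RIO-2",
	    trust_remote_code=True,
	)

	# Placeholder inputs (assumed types; check the repo's custom code for the exact formats).
	image = Image.open("frame.png")  # hypothetical path to the current camera frame
	state = torch.zeros(6)           # robot state, state_dim = 6

	# Load the S2 backbone, refresh the slow context once, then query the fast policy.
	model.load_s2_base(device="cuda")
	model.refresh_s2(image, "pick up the red cube", force=True)
	actions = model.act_fast(state, steps=2)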
	```

	Runtime Pattern

	RIO-2 is intended to run as a two-rate policy:

	1. Refresh S2 when the scene or instruction changes, or at a low fixed rate.
	2. Reuse cached S2 tokens inside the high-frequency control loop.
	3. Call act_fast() repeatedly with the latest robot state.
	4. Execute only the safe portion of the returned action chunk through an external safety controller.
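The four steps above can be sketched as a single loop. This is a minimal illustration under stated assumptions: `StubPolicy` is a hypothetical stand-in that only borrows the `refresh_s2` and `act_fast` method names from the loading example, `safety_filter` is a placeholder for the external safety controller, and the tick counts and refresh interval are arbitrary.

```python
class StubPolicy:
    """Hypothetical stand-in that mimics the refresh_s2 / act_fast interface."""

    def __init__(self):
        self.s2_refreshes = 0
        self.cached_instruction = None

    def refresh_s2(self, image, instruction, force=False):
        # A real model would recompute and cache S2 tokens here.
        self.s2_refreshes += 1
        self.cached_instruction = instruction

    def act_fast(self, state, steps=2):
        # Dummy action chunk: action_horizon = 30 steps of action_dim = 6 zeros.
        return [[0.0] * 6 for _ in range(30)]


def safety_filter(chunk, max_steps):
    # Placeholder for the external safety controller: keep only the first few steps.
    return chunk[:max_steps]


def run_two_rate(policy, num_ticks, s2_every, instruction):
    executed = 0
    for tick in range(num_ticks):
        image, state = None, [0.0] * 6  # placeholders for camera frame and robot state
        if tick % s2_every == 0:
            # Step 1: low-rate S2 refresh.
            policy.refresh_s2(image, instruction, force=True)
        # Steps 2-3: reuse the cached S2 tokens and query the fast policy.
        chunk = policy.act_fast(state, steps=2)
        # Step 4: execute only the portion the safety layer allows.
        executed += len(safety_filter(chunk, max_steps=5))
    return executed


policy = StubPolicy()
executed = run_two_rate(policy, num_ticks=100, s2_every=25,
                        instruction="pick up the red cube")
print(policy.s2_refreshes, executed)  # → 4 500
```

In a real deployment the refresh condition would also trigger when the scene or instruction changes, not only on a fixed interval.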


	Safety

	RIO-2 outputs continuous robot actions and must not be connected directly to real hardware. Always place the policy
	behind a robot safety layer with joint limits, velocity/acceleration/jerk limits, workspace constraints, watchdog,
	E-stop, and a fallback controller.
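
As one small piece of such a safety layer, the sketch below clamps a commanded target to joint limits and to a maximum change per control tick. It assumes the actions are joint-position targets, which the card does not state, and the limit values and the `limit_action` helper are hypothetical. It does not replace a watchdog, E-stop, or fallback controller.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


def limit_action(target, current, joint_low, joint_high, max_step):
    """Clamp a joint-position target to joint limits and a per-tick step limit."""
    safe = []
    for t, c, lo, hi in zip(target, current, joint_low, joint_high):
        t = clamp(t, lo, hi)                      # joint limits
        t = clamp(t, c - max_step, c + max_step)  # bounded change per tick (velocity limit)
        safe.append(t)
    return safe


current = [0.0, 0.0, 0.0]
target = [2.0, -0.01, 0.3]
safe = limit_action(target, current,
                    joint_low=[-1.0] * 3, joint_high=[1.0] * 3, max_step=0.05)
print(safe)  # → [0.05, -0.01, 0.05]
```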