Improve model card for MM-ACT: Add metadata, links, and setup instructions

#1, opened by nielsr (HF Staff)
Files changed (1): README.md (+74 -3)

The previous README.md contained only the front matter `license: mit`; this pull request replaces it with the following:
---
license: mit
pipeline_tag: robotics
library_name: transformers
---

# MM-ACT: Learn from Multimodal Parallel Generation to Act

[![arXiv](https://img.shields.io/badge/arXiv-Paper-red.svg)](https://arxiv.org/abs/2512.00975)
[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97-Model-yellow)](https://huggingface.co/hhyhrhy/MM-ACT-Model)
[![Hugging Face Datasets](https://img.shields.io/badge/%F0%9F%A4%97-Dataset-blue)](https://huggingface.co/datasets/hhyhrhy/MM-ACT-data)

<br>

<div align="center">
<img src="https://github.com/HHYHRHY/MM-ACT/raw/main/assets/MM-ACT.png" width="80%" alt="MM-ACT Arch"/>
</div>

<br>

This repository contains **MM-ACT**, a unified Vision-Language-Action (VLA) model that integrates text, image, and action in a shared token space and performs generation across all three modalities. MM-ACT adopts a re-mask parallel decoding strategy for text and image generation and a one-step parallel decoding strategy for action generation to improve efficiency.
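
The difference between the two decoding strategies can be illustrated with a toy sketch. This is not MM-ACT's code: `predict` is a hypothetical stand-in for one model forward pass that returns a `(token, confidence)` guess for every position, and the schedule in `remask_decode` (commit the most confident guesses, spread the remaining masks evenly over the remaining steps, leave everything else masked) is an assumption modeled on common masked-diffusion decoders. What it illustrates is the efficiency argument: re-mask decoding spends one forward pass per step, while one-step decoding fills every masked position with a single pass.

```python
from functools import partial
from typing import Callable, List, Optional, Tuple

MASK = None  # marker for a still-masked position
Seq = List[Optional[int]]
# Hypothetical model interface: one forward pass -> (token, confidence) per position.
Predictor = Callable[[Seq], List[Tuple[int, float]]]


def one_step_decode(seq: Seq, predict: Predictor) -> Seq:
    """Fill every masked position from a single forward pass."""
    preds = predict(seq)
    return [pred[0] if tok is MASK else tok for tok, pred in zip(seq, preds)]


def remask_decode(seq: Seq, predict: Predictor, steps: int) -> Seq:
    """Each step, commit the most confident guesses and keep the rest masked."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    seq = list(seq)
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        preds = predict(seq)
        # Spread the remaining masks over the remaining steps; the last step commits all.
        n_commit = max(1, len(masked) // (steps - step))
        ranked = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[:n_commit]:
            seq[i] = preds[i][0]
        # Uncommitted positions stay MASK, i.e. they are re-masked for the next pass.
    return seq


def make_toy_predictor(target: List[int], calls: List[int]) -> Predictor:
    """Toy model: always guesses `target`; confidence grows as more tokens are known."""
    def predict(seq: Seq) -> List[Tuple[int, float]]:
        calls.append(1)  # count forward passes
        known = sum(tok is not MASK for tok in seq)
        return [(target[i], (known + 1) / (len(seq) + 1) - 0.01 * i) for i in range(len(seq))]
    return predict


target = [7, 3, 9, 1, 4, 8]
for name, decode in [("one-step", one_step_decode),
                     ("re-mask, 3 steps", partial(remask_decode, steps=3))]:
    calls = []
    out = decode([MASK] * len(target), make_toy_predictor(target, calls))
    print(f"{name}: {out}, forward passes = {len(calls)}")
# one-step: [7, 3, 9, 1, 4, 8], forward passes = 1
# re-mask, 3 steps: [7, 3, 9, 1, 4, 8], forward passes = 3
```

Both decoders recover the toy target here, but the re-mask decoder needs three forward passes where the one-step decoder needs one.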

The model was presented in the paper [MM-ACT: Learn from Multimodal Parallel Generation to Act](https://huggingface.co/papers/2512.00975).

Code: https://github.com/HHYHRHY/MM-ACT

## Usage

For detailed usage, including training and deployment scripts, please refer to the official [GitHub repository](https://github.com/HHYHRHY/MM-ACT).
31
+ ### 1. Clone Repo and Environment Setup
32
+
33
+ ```bash
34
+ git clone https://github.com/HHYHRHY/MM-ACT.git
35
+ cd MM-ACT
36
+
37
+ # Create environment
38
+ conda create -n mmact python=3.13
39
+ conda activate mmact
40
+
41
+ # Install requirements
42
+ pip install -r requirement.txt
43
+ ```

### 2. Dataset Preparation

- **LIBERO**

  We use the LIBERO datasets from [Hugging Face LeRobot](https://huggingface.co/lerobot) and load the robot data with LeRobot datasets.
  Please download [LIBERO-Object](https://huggingface.co/datasets/lerobot/libero_object_image), [LIBERO-Spatial](https://huggingface.co/datasets/lerobot/libero_spatial_image), [LIBERO-Goal](https://huggingface.co/datasets/lerobot/libero_goal_image), and [LIBERO-10](https://huggingface.co/datasets/lerobot/libero_10_image). For LIBERO-10, we also provide our task planning datasets in [LIBERO-10-task](https://huggingface.co/datasets/hhyhrhy/MM-ACT-data/tree/main/LIBERO).

- **RoboTwin**

  For RoboTwin, we use a dataset sampling pipeline that includes task planning generation. You can download our [datasets](https://huggingface.co/datasets/hhyhrhy/MM-ACT-data/tree/main/RoboTwin) or collect your own with our pipeline on the [Robotwin_subtask](https://github.com/RoboTwin-Platform/RoboTwin/tree/Subtask_info) branch. This branch updates the original RoboTwin data collection pipeline to support our subtask text annotations; otherwise, collection usage is identical to the main branch. Please report any bugs or questions about the text annotations in the MM-ACT GitHub issues.
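
The downloads above can be scripted. The sketch below is a convenience under stated assumptions, not a script from the MM-ACT repository: it uses `huggingface_hub.snapshot_download` (with its `repo_type`, `local_dir`, and `allow_patterns` parameters), takes the repo IDs and the `LIBERO`/`RoboTwin` subfolders from the links above, and uses a hypothetical local layout (`data/<repo name>`). Check the GitHub repository for the paths the training scripts actually expect. `plan_downloads` only builds the job list, so it can be inspected without network access.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# LeRobot-format LIBERO suites linked above.
LIBERO_REPOS = [
    "lerobot/libero_object_image",
    "lerobot/libero_spatial_image",
    "lerobot/libero_goal_image",
    "lerobot/libero_10_image",
]
# MM-ACT data repository and the subfolders linked above
# (LIBERO-10 task planning data and RoboTwin data).
MMACT_REPO = "hhyhrhy/MM-ACT-data"
MMACT_SUBFOLDERS = ["LIBERO", "RoboTwin"]

Job = Tuple[str, Path, Optional[List[str]]]


def plan_downloads(root: str) -> List[Job]:
    """List (repo_id, local_dir, allow_patterns) jobs; no network access."""
    base = Path(root)
    jobs: List[Job] = [(repo, base / repo.split("/")[-1], None) for repo in LIBERO_REPOS]
    for sub in MMACT_SUBFOLDERS:
        # Hypothetical layout: both subfolders land under <root>/MM-ACT-data/.
        jobs.append((MMACT_REPO, base / "MM-ACT-data", [f"{sub}/*"]))
    return jobs


def download_all(root: str = "data") -> None:
    """Fetch every planned dataset snapshot (requires `pip install huggingface_hub`)."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    for repo_id, local_dir, patterns in plan_downloads(root):
        snapshot_download(
            repo_id=repo_id,
            repo_type="dataset",
            local_dir=str(local_dir),
            allow_patterns=patterns,  # None downloads the whole repository
        )


if __name__ == "__main__":
    for job in plan_downloads("data"):
        print(job)
    # download_all("data")  # uncomment to actually download (large)
```

Running the file prints the six planned jobs; calling `download_all("data")` performs the actual downloads.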

### 3. Model Weight Preparation

Download the base model weights from MMaDA ([MMaDA-8B-Base](https://huggingface.co/Gen-Verse/MMaDA-8B-Base)) and expand the original model's vocabulary with the action codebook (we use a codebook size of 2048):

```bash
# origin_model_path: downloaded MMaDA-8B-Base weights; output_model_path: where to save the resized model
# action_codebook_size: number of action tokens to add (2048 in our setup)
python model_utils/resize_model_vocab.py --model ${origin_model_path} --out ${output_model_path} --num_new ${action_codebook_size}
```
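
For intuition about what the resize step adds, here is a small sketch. It assumes (not verified against `resize_model_vocab.py`) that the `num_new` action tokens are appended after the existing vocabulary, which is how `resize_token_embeddings` in transformers grows an embedding matrix; under that assumption, action code `k` maps to token ID `base_vocab_size + k`. `ActionVocab` and the toy base size of 1000 are hypothetical; the real base size comes from the MMaDA-8B-Base tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionVocab:
    """Hypothetical mapping between action-codebook indices and appended token IDs."""

    base_vocab_size: int  # vocabulary size before resizing (model-specific)
    codebook_size: int = 2048  # action codebook size used in this README

    @property
    def total_size(self) -> int:
        return self.base_vocab_size + self.codebook_size

    def is_action_token(self, token_id: int) -> bool:
        return self.base_vocab_size <= token_id < self.total_size

    def code_to_token(self, code: int) -> int:
        if not 0 <= code < self.codebook_size:
            raise ValueError(f"action code {code} outside [0, {self.codebook_size})")
        return self.base_vocab_size + code

    def token_to_code(self, token_id: int) -> int:
        if not self.is_action_token(token_id):
            raise ValueError(f"token ID {token_id} is not an action token")
        return token_id - self.base_vocab_size


vocab = ActionVocab(base_vocab_size=1000)  # toy base size, not MMaDA's real one
print(vocab.total_size)            # 3048
print(vocab.code_to_token(0))      # 1000
print(vocab.token_to_code(3047))   # 2047
print(vocab.is_action_token(999))  # False
```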

## 🎥 Real-world Experiments (Video Demo)

https://private-user-images.githubusercontent.com/91517920/520774696-02a3bf40-f1ae-4f52-9562-a3fc2e9a1477.mp4

## Acknowledgments

This work builds on [MMaDA](https://github.com/Gen-Verse/MMaDA), [RoboTwin](https://github.com/robotwin-Platform/RoboTwin), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), [LeRobot](https://github.com/huggingface/lerobot), and [OpenVLA](https://github.com/openvla/openvla.git). We thank the authors for their great work.