Commit da12103 by hjx1995 (verified) Β· Parent(s): eccb199

Update README.md

Files changed (1): README.md (+0 βˆ’149)

README.md CHANGED
@@ -41,155 +41,6 @@ tags:
 />
</div>

## πŸ”₯ News
* Jan 29, 2026: πŸš€ We released the training and inference code, along with the model weights, for [**UnifoLM-VLA-0**](https://huggingface.co/collections/unitreerobotics/unifolm-wma-0-68ca23027310c0ca0f34959c).

## πŸ“‘ Open-Source Plan
- [x] Training
- [x] Inference
- [x] Checkpoints

- ## βš™οΈ Installation
53
- This project is built on **CUDA 12.4**, and using the same version is strongly recommended to ensure compatibility.
54
- ```
55
- conda create -n unifolm-vla python==3.10.18
56
- conda activate unifolm-vla
57
-
58
- git clone https://github.com/unitreerobotics/unifolm-vla.git
59
-
60
- # If you already downloaded the repo:
61
- cd unifolm-vla
62
- pip install --no-deps "lerobot @ git+https://github.com/huggingface/lerobot.git@0878c68"
63
- pip install -e .
64
-
65
- # Install FlashAttention2
66
- pip install "flash-attn==2.5.6" --no-build-isolation
67
-
68
- ```
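After installation, a quick sanity check helps confirm that the toolkit matches the recommended CUDA 12.4. A minimal sketch that parses the output of `nvcc --version` (the sample string below is illustrative, not captured from a real machine):

```python
import re

def cuda_release(nvcc_output: str) -> str:
    """Extract the toolkit release (e.g. '12.4') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release string found in nvcc output")
    return match.group(1)

# On a real machine, capture the output directly, e.g.:
#   out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
sample = "Cuda compilation tools, release 12.4, V12.4.131"
if cuda_release(sample) != "12.4":
    print("Warning: CUDA toolkit differs from the recommended 12.4")
```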
## 🧰 Model Checkpoints
| Model | Description | Link |
|---------|-------|------|
| `UnifoLM-VLM-Base` | Fine-tuned on general-purpose image–text VQA data and open-source robot datasets. | [HuggingFace](https://huggingface.co/unitreerobotics/Unifolm-VLM-Base) |
| `UnifoLM-VLA-Base` | Fine-tuned on the [Unitree open-source](https://huggingface.co/collections/unitreerobotics/g1-dex1-datasets-68bae98bf0a26d617f9983ab) datasets. | [HuggingFace](https://huggingface.co/unitreerobotics/Unifolm-VLA-Base) |
| `UnifoLM-VLA-LIBERO` | Fine-tuned on the [LIBERO](https://huggingface.co/collections/unitreerobotics/g1-dex1-datasets-68bae98bf0a26d617f9983ab) dataset. | [HuggingFace](https://huggingface.co/unitreerobotics/Unifolm-VLA-Libero) |

- ## πŸ›’οΈ Dataset
77
- In our experiments, we consider the following twelve open-source dataset:
78
- | Dataset | Robot | Link |
79
- |---------|-------|------|
80
- |G1_Stack_Block| [Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Stack_Block)|
81
- |G1_Bag_Insert|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Bag_Insert)|
82
- |G1_Erase_Board|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Erase_Board)|
83
- |G1_Clean_Table|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Clean_Table)|
84
- |G1_Pack_PencilBox|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Pack_PencilBox)|
85
- |G1_Pour_Medicine|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Pour_Medicine)|
86
- |G1_Pack_PingPong|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Pack_PingPong)|
87
- |G1_Prepare_Fruit|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Prepare_Fruit)|
88
- |G1_Organize_Tools|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Organize_Tools)|
89
- |G1_Fold_Towel|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Fold_Towel)|
90
- |G1_Wipe_Table|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_Wipe_Table)|
91
- |G1_DualRobot_Clean_Table|[Unitree G1](https://www.unitree.com/g1)|[Huggingface](https://huggingface.co/datasets/unitreerobotics/G1_DualRobot_Clean_Table)|
92
-
93
To train on your own dataset, ensure the data follows the [Huggingface LeRobot V2.1](https://github.com/huggingface/lerobot) dataset format. Assume the source directory structure of the datasets is as follows:
```
source_dir/
β”œβ”€β”€ dataset1_name
β”œβ”€β”€ dataset2_name
β”œβ”€β”€ dataset3_name
└── ...
```
Then, run the following command to convert each dataset from the LeRobot format to HDF5:
```
cd prepare_data
python convert_lerobot_to_hdf5.py \
    --data_path /path/to/your/source_dir/dataset1_name \
    --target_path /path/to/save/the/converted/data/directory
```
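If `source_dir` holds several datasets, the conversion can be looped over each subdirectory. A minimal sketch that only builds the command lines (the script name and flags come from the command above; actually executing them, e.g. via `subprocess.run`, is left to the reader):

```python
from pathlib import Path

def conversion_commands(source_dir: str, target_dir: str) -> list:
    """Build one convert_lerobot_to_hdf5.py invocation per dataset subdirectory."""
    commands = []
    for dataset in sorted(Path(source_dir).iterdir()):
        if dataset.is_dir():
            commands.append([
                "python", "convert_lerobot_to_hdf5.py",
                "--data_path", str(dataset),
                "--target_path", target_dir,
            ])
    return commands
```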
Finally, run the following command to convert the HDF5 data into the RLDS format required for training. Be sure to update the path ([here](prepare_data/hdf5_to_rlds/rlds_dataset/rlds_dataset.py#L232)) to the correct location of the HDF5 data.
```
cd prepare_data/hdf5_to_rlds/rlds_dataset
tfds build --data_dir /path/to/save/the/converted/data/directory
```
The directory structure of the converted RLDS dataset is as follows:
```
source_dir/
β”œβ”€β”€ downloads
β”œβ”€β”€ rlds_dataset
└── 1.0.0
```
The `1.0.0` directory is the final RLDS dataset version that can be used for training. The final directory should be kept as `source_dir/1.0.0` (e.g., `g1_stack_block/1.0.0`).
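Before training, it can be worth verifying that each converted dataset actually exposes the expected `1.0.0` directory. A small helper sketch (the function name is ours for illustration, not part of the repository):

```python
from pathlib import Path

def find_rlds_version_dir(dataset_root: str) -> Path:
    """Return the `1.0.0` directory of a converted RLDS dataset, or raise."""
    version_dir = Path(dataset_root) / "1.0.0"
    if not version_dir.is_dir():
        raise FileNotFoundError(
            f"expected an RLDS version directory at {version_dir} "
            "(e.g. g1_stack_block/1.0.0)"
        )
    return version_dir
```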

## πŸš΄β€β™‚οΈ Training
To train on a single dataset or multiple datasets, follow the steps below:
- **Step 1**: Assuming you have already prepared the RLDS dataset, register the dataset (e.g., the Unitree open-source dataset `G1_StackBox`) with our dataloader by adding an entry for it in `configs.py` ([here](src/unifolm_vla/rlds_dataloader/datasets/rlds/oxe/configs.py#L58)), `transforms.py` ([here](src/unifolm_vla/rlds_dataloader/datasets/rlds/oxe/transforms.py#L948)), `mixtures.py` ([here](src/unifolm_vla/rlds_dataloader/datasets/rlds/oxe/mixtures.py#L366)), and `datasets.py` ([here](src/unifolm_vla/rlds_dataloader/datasets/datasets.py#L106)). For reference, each of these files contains sample entries for the G1 datasets that we used in our experiments.
- **Step 2**: Before starting fine-tuning, configure the size of the action chunks predicted by the model, the action and state degrees of freedom in the dataset, and the data normalization scheme in `constants.py` ([here](src/unifolm_vla/rlds_dataloader/constants.py#L70)). Refer to `NUM_ACTIONS_CHUNK`, `ACTION_DIM`, `PROPRIO_DIM`, and `ACTION_PROPRIO_NORMALIZATION_TYPE` in `G1_CONSTANTS`.
- **Step 3**: Complete the configuration in the following order (see [here](scripts/run_scripts/run_unifolm_vla_train.sh)):
    1. **Model Initialization**: Set `base_vlm` to the local path or the corresponding model-weight URL of **UnifoLM-VLM-Base**, which will be used to initialize the vision–language backbone model.
    2. **Dataset Path Configuration**: After configuring the model path, set `oxe_data_root` to the root directory of the dataset to ensure that the training script can correctly load the RLDS data.
    3. **Dataset Mixture Specification**: Based on the configured data root, set `data_mix` to the name of the dataset(s) to be used for training, or to the desired dataset mixture.
    4. **Model Checkpoint Saving**: Specify the paths for saving model checkpoints and logs, which store the model weights and training states generated during fine-tuning for later recovery, evaluation, and inference.
    5. **Parallelism Configuration**: Finally, adjust `num_processes` according to the number of available GPUs to match the scale of distributed training.
- **Step 4**: You can now start fine-tuning by running the script [`run_unifolm_vla_train.sh`](scripts/run_scripts/run_unifolm_vla_train.sh).

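The quantities configured in Step 2 can be pictured as a single constants entry. The field names below match those cited above; the values and the normalization-scheme string are illustrative placeholders, not the repository's actual G1 settings:

```python
# Illustrative placeholders -- the real values live in constants.py (G1_CONSTANTS).
EXAMPLE_CONSTANTS = {
    "NUM_ACTIONS_CHUNK": 16,   # number of future actions predicted per inference step
    "ACTION_DIM": 14,          # degrees of freedom of each action vector
    "PROPRIO_DIM": 14,         # degrees of freedom of the proprioceptive state
    "ACTION_PROPRIO_NORMALIZATION_TYPE": "quantile",  # placeholder scheme name
}

def action_chunk_shape(constants: dict) -> tuple:
    """Shape of one predicted action chunk under a given constants entry."""
    return (constants["NUM_ACTIONS_CHUNK"], constants["ACTION_DIM"])
```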
## 🌏 Simulation Inference Evaluation
To evaluate the **UnifoLM-VLA-Libero** model in the `LIBERO` simulation environment ([here](https://huggingface.co/datasets/openvla/modified_libero_rlds)), follow the steps below:
- **Step 1**: Install the LIBERO simulation environment and its dependencies:
```
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
pip install -e LIBERO
pip install -r experiments/LIBERO/libero_requirements.txt  # Run from the UnifoLM-VLA project root directory
```
- **Step 2**: In `run_eval_libero.sh` ([here](scripts/eval_scripts/run_eval_libero.sh)), modify the following fields: `your_ckpt`, `task_suite_name`, `unnorm_key`, `LIBERO_HOME`, and `vlm_pretrained_path`.
- **Step 3**: Launch the evaluation:
```
conda activate unifolm-vla
cd unifolm-vla
bash scripts/eval_scripts/run_eval_libero.sh
```

## πŸ€– Real-World Inference Evaluation

In our system, inference is executed on the server side. The robot client collects observations from the real robot and sends them to the server for action inference. The full pipeline can be completed by following the steps below.
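The exchange can be pictured as one request/response round trip: the client serializes an observation, the server replies with an action chunk. A minimal sketch of the serialization layer only; the field names and the JSON wire format here are our assumptions for illustration, not the repository's actual protocol:

```python
import json

def pack_observation(images: dict, state: list, instruction: str) -> bytes:
    """Serialize one client observation for the inference server (assumed schema)."""
    payload = {
        "images": images,            # e.g. camera name -> base64-encoded frame
        "state": state,              # proprioceptive state vector
        "instruction": instruction,  # natural-language task description
    }
    return json.dumps(payload).encode("utf-8")

def unpack_actions(reply: bytes) -> list:
    """Decode the server's reply into a list of action vectors (assumed schema)."""
    return json.loads(reply.decode("utf-8"))["actions"]
```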

### Server Setup
- **Step 1**: In `run_real_eval_server.sh` ([here](scripts/eval_scripts/run_real_eval_server.sh)), modify the following fields: `ckpt_path`, `port`, `unnorm_key`, and `vlm_pretrained_path`.
- **Step 2**: Launch the server:
```
conda activate unifolm-vla
cd unifolm-vla
bash scripts/eval_scripts/run_real_eval_server.sh
```

### Client Setup

- **Step 1**: Refer to [unitree_deploy/README.md](https://github.com/unitreerobotics/unifolm-world-model-action/blob/main/unitree_deploy/README.md) to create the `unitree_deploy` conda environment, install the required dependencies, and start the controller or service on the real robot.

- **Step 2**: Open a new terminal and establish a tunnel connection from the client to the server:
```
ssh user_name@remote_server_IP -CNg -L port:127.0.0.1:port
```
- **Step 3**: Using the script `unitree_deploy/robot_client.py` as a reference, modify it for your setup and run it.

## πŸ“ Codebase Architecture
Here's a high-level overview of the project's code structure and core components:
```
unifolm-vla/
β”œβ”€β”€ assets              # Media assets such as GIFs
β”œβ”€β”€ experiments         # LIBERO datasets for running inference
β”œβ”€β”€ deployment          # Deployment server code
β”œβ”€β”€ prepare_data        # Scripts for dataset preprocessing and format conversion
β”œβ”€β”€ scripts             # Main scripts for training, evaluation, and deployment
β”œβ”€β”€ src
β”‚   β”œβ”€β”€ unifolm_vla         # Core Python package for UnifoLM-VLA
β”‚   β”‚   β”œβ”€β”€ config          # Configuration files for training
β”‚   β”‚   β”œβ”€β”€ model           # Model architectures and backbone definitions
β”‚   β”‚   β”œβ”€β”€ rlds_dataloader # Dataset loading, transformations, and dataloaders
β”‚   β”‚   └── training        # Model training
```

## πŸ™ Acknowledgement
Much of the code is adapted from [Qwen2.5-VL](https://arxiv.org/abs/2502.13923), [Isaac-GR00T](https://github.com/NVIDIA/Isaac-GR00T), [Open-X](https://robotics-transformer-x.github.io/), [openvla-oft](https://github.com/moojink/openvla-oft), and [InternVLA-M1](https://github.com/InternRobotics/InternVLA-M1).

## πŸ“ Citation
```
@misc{unifolm-vla-0,