/>
</div>

## 🔥 News

* Jan 29, 2026: 🎉 We released the training and inference code, along with the model weights, for [**UnifoLM-VLA-0**](https://huggingface.co/collections/unitreerobotics/unifolm-wma-0-68ca23027310c0ca0f34959c).

## 📋 Open-Source Plan

- [x] Training
- [x] Inference
- [x] Checkpoints

## ⚙️ Installation

This project is built on **CUDA 12.4**; using the same version is strongly recommended to ensure compatibility.

```
conda create -n unifolm-vla python=3.10.18
conda activate unifolm-vla

git clone https://github.com/unitreerobotics/unifolm-vla.git

# If you already downloaded the repo:
cd unifolm-vla
pip install --no-deps "lerobot @ git+https://github.com/huggingface/lerobot.git@0878c68"
pip install -e .

# Install FlashAttention2
pip install "flash-attn==2.5.6" --no-build-isolation
```
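As an optional sanity check (not part of the original instructions), you can verify that the key packages installed above are importable before starting a long training run. Note that import names can differ from pip package names (e.g. `flash-attn` imports as `flash_attn`):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Import names (not pip names) that the install steps above should provide.
required = ["torch", "lerobot", "flash_attn"]
print("missing:", missing_packages(required))
```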

## 🧰 Model Checkpoints

| Model | Description | Link |
|-------|-------------|------|
| `UnifoLM-VLM-Base` | Fine-tuned on general-purpose image–text VQA data and open-source robot datasets. | [HuggingFace](https://huggingface.co/unitreerobotics/Unifolm-VLM-Base) |
| `UnifoLM-VLA-Base` | Fine-tuned on the [Unitree open-source](https://huggingface.co/collections/unitreerobotics/g1-dex1-datasets-68bae98bf0a26d617f9983ab) datasets. | [HuggingFace](https://huggingface.co/unitreerobotics/Unifolm-VLA-Base) |
| `UnifoLM-VLA-LIBERO` | Fine-tuned on the [LIBERO](https://huggingface.co/datasets/openvla/modified_libero_rlds) dataset. | [HuggingFace](https://huggingface.co/unitreerobotics/Unifolm-VLA-Libero) |

## 🛢️ Dataset

In our experiments, we consider the following twelve open-source datasets:

| Dataset | Robot | Link |
|---------|-------|------|
| G1_Stack_Block | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Stack_Block) |
| G1_Bag_Insert | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Bag_Insert) |
| G1_Erase_Board | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Erase_Board) |
| G1_Clean_Table | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Clean_Table) |
| G1_Pack_PencilBox | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Pack_PencilBox) |
| G1_Pour_Medicine | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Pour_Medicine) |
| G1_Pack_PingPong | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Pack_PingPong) |
| G1_Prepare_Fruit | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Prepare_Fruit) |
| G1_Organize_Tools | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Organize_Tools) |
| G1_Fold_Towel | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Fold_Towel) |
| G1_Wipe_Table | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_Wipe_Table) |
| G1_DualRobot_Clean_Table | [Unitree G1](https://www.unitree.com/g1) | [HuggingFace](https://huggingface.co/datasets/unitreerobotics/G1_DualRobot_Clean_Table) |

To train on your own dataset, ensure the data follows the [HuggingFace LeRobot V2.1](https://github.com/huggingface/lerobot) dataset format. Assume the source directory structure of the dataset is as follows:

```
source_dir/
├── dataset1_name
├── dataset2_name
├── dataset3_name
└── ...
```

Then, run the following command to convert the dataset from LeRobot format to HDF5 format:

```
cd prepare_data
python convert_lerobot_to_hdf5.py \
    --data_path /path/to/your/source_dir/dataset1_name \
    --target_path /path/to/save/the/converted/data/directory
```
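When `source_dir` holds several datasets, the conversion above can be scripted. The sketch below only *builds* the command lines rather than executing them; the paths are placeholders, and it assumes `convert_lerobot_to_hdf5.py` accepts exactly the flags shown above:

```python
from pathlib import Path

def conversion_commands(source_dir: Path, target_dir: Path) -> list[str]:
    """Build one convert_lerobot_to_hdf5.py invocation per dataset folder."""
    if not source_dir.is_dir():
        return []
    return [
        f"python convert_lerobot_to_hdf5.py "
        f"--data_path {d} --target_path {target_dir}"
        for d in sorted(p for p in source_dir.iterdir() if p.is_dir())
    ]

# Placeholder paths -- substitute your actual directories.
for cmd in conversion_commands(Path("/path/to/your/source_dir"),
                               Path("/path/to/converted/data")):
    print(cmd)
```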

Finally, run the following command to convert the HDF5 data into the RLDS dataset format required for training. Be sure to update the path ([here](prepare_data/hdf5_to_rlds/rlds_dataset/rlds_dataset.py#L232)) to the correct location of the HDF5 data.

```
cd prepare_data/hdf5_to_rlds/rlds_dataset
tfds build --data_dir /path/to/save/the/converted/data/directory
```

The directory structure of the converted RLDS dataset is as follows:

```
source_dir/
├── downloads
└── rlds_dataset
    └── 1.0.0
```

The `1.0.0` directory is the final RLDS dataset version that can be used for training. The final directory should be kept as `source_dir/1.0.0` (e.g., `g1_stack_block/1.0.0`).

## 🚴‍♂️ Training

To train on a single dataset or multiple datasets, follow the steps below:

- **Step 1**: Assuming you have already prepared the RLDS dataset, register it (e.g., the Unitree open-source dataset `G1_StackBox`) with our dataloader by adding an entry for it in `configs.py` ([here](src/unifolm_vla/rlds_dataloader/datasets/rlds/oxe/configs.py#L58)), `transforms.py` ([here](src/unifolm_vla/rlds_dataloader/datasets/rlds/oxe/transforms.py#L948)), `mixtures.py` ([here](src/unifolm_vla/rlds_dataloader/datasets/rlds/oxe/mixtures.py#L366)), and `datasets.py` ([here](src/unifolm_vla/rlds_dataloader/datasets/datasets.py#L106)). For reference, each of these files contains sample entries for the G1 datasets used in our experiments.

- **Step 2**: Before starting fine-tuning, configure the size of the action chunks predicted by the model, the action and state degrees of freedom in the dataset, and the data normalization scheme in `constants.py` ([here](src/unifolm_vla/rlds_dataloader/constants.py#L70)). Refer to `NUM_ACTIONS_CHUNK`, `ACTION_DIM`, `PROPRIO_DIM`, and `ACTION_PROPRIO_NORMALIZATION_TYPE` in `G1_CONSTANTS`.

- **Step 3**: Complete the configuration in the following order (see [here](scripts/run_scripts/run_unifolm_vla_train.sh)):

  1. **Model Initialization**: Set `base_vlm` to the local path or model-weight URL of **UnifoLM-VLM-Base**, which is used to initialize the vision–language backbone.
  2. **Dataset Path Configuration**: Set `oxe_data_root` to the root directory of the dataset so that the training script can correctly load the RLDS data.
  3. **Dataset Mixture Specification**: Set `data_mix` to the name of the dataset(s) to be used for training, or to the desired dataset mixture.
  4. **Model Checkpoint Saving**: Specify the paths for saving model checkpoints and logs; these store the model weights and training states generated during fine-tuning for later recovery, evaluation, and inference.
  5. **Parallelism Configuration**: Finally, adjust `num_processes` according to the number of available GPUs to match the scale of distributed training.

- **Step 4**: You can now start fine-tuning by running the script [`run_unifolm_vla_train.sh`](scripts/run_scripts/run_unifolm_vla_train.sh).
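To make Step 2 concrete, here is a hypothetical sketch of a `G1_CONSTANTS`-style entry together with a quantile-bounds normalizer of the kind commonly used for robot actions. The numeric values are illustrative assumptions, not the real configuration; the authoritative definitions live in `src/unifolm_vla/rlds_dataloader/constants.py`:

```python
# Hypothetical mirror of a G1_CONSTANTS entry -- values are assumptions.
G1_CONSTANTS = {
    "NUM_ACTIONS_CHUNK": 16,   # actions predicted per forward pass (assumed)
    "ACTION_DIM": 14,          # action degrees of freedom (assumed)
    "PROPRIO_DIM": 14,         # proprioceptive state dimension (assumed)
    "ACTION_PROPRIO_NORMALIZATION_TYPE": "bounds_q99",
}

def normalize_bounds_q99(x: float, q01: float, q99: float) -> float:
    """Scale x into [-1, 1] using 1st/99th-percentile bounds computed over
    the training data; clipping keeps outliers from dominating the range."""
    span = max(q99 - q01, 1e-8)
    scaled = 2.0 * (x - q01) / span - 1.0
    return max(-1.0, min(1.0, scaled))
```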

## 🚀 Simulation Inference Evaluation

To evaluate the **UnifoLM-VLA-Libero** model in the `LIBERO` simulation environment ([here](https://huggingface.co/datasets/openvla/modified_libero_rlds)), follow the steps below:

- **Step 1**: Install the LIBERO simulation environment and its dependencies:

```
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
pip install -e LIBERO
pip install -r experiments/LIBERO/libero_requirements.txt  # Run from the UnifoLM-VLA project root directory
```

- **Step 2**: In `run_eval_libero.sh` ([here](scripts/eval_scripts/run_eval_libero.sh)), modify the following fields: `your_ckpt`, `task_suite_name`, `unnorm_key`, `LIBERO_HOME`, and `vlm_pretrained_path`.

- **Step 3**: Launch the evaluation:

```
conda activate unifolm-vla
cd unifolm-vla
bash scripts/eval_scripts/run_eval_libero.sh
```

## 🤖 Real-World Inference Evaluation

In our system, inference is executed on the server side. The robot client collects observations from the real robot and sends them to the server for action inference. The full pipeline can be completed by following the steps below.

### Server Setup

- **Step 1**: In `run_real_eval_server.sh` ([here](scripts/eval_scripts/run_real_eval_server.sh)), modify the following fields: `ckpt_path`, `port`, `unnorm_key`, and `vlm_pretrained_path`.

- **Step 2**: Launch the server:

```
conda activate unifolm-vla
cd unifolm-vla
bash scripts/eval_scripts/run_real_eval_server.sh
```

### Client Setup

- **Step 1**: Refer to [unitree_deploy/README.md](https://github.com/unitreerobotics/unifolm-world-model-action/blob/main/unitree_deploy/README.md) to create the `unitree_deploy` conda environment, install the required dependencies, and start the controller or service on the real robot.

- **Step 2**: Open a new terminal and establish a tunnel connection from the client to the server:

```
ssh user_name@remote_server_IP -CNg -L port:127.0.0.1:port
```

- **Step 3**: Using `unitree_deploy/robot_client.py` as a reference, modify and run the client script.
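As an illustration of the client side, the sketch below assembles the kind of observation payload a robot client might send through the SSH tunnel to the inference server. The field names and the `build_inference_request` helper are hypothetical, so treat `unitree_deploy/robot_client.py` as the authoritative protocol:

```python
import json

def build_inference_request(images, proprio, instruction):
    """Serialize an observation into a JSON payload for the action server.
    Field names here are illustrative, not the actual wire protocol."""
    return json.dumps({
        "images": images,           # e.g. base64-encoded camera frames
        "state": proprio,           # proprioceptive state vector
        "instruction": instruction, # natural-language task command
    })

# Example: one placeholder frame plus a zeroed 14-DoF state vector.
payload = build_inference_request(["<frame0>"], [0.0] * 14, "stack the blocks")
print(payload)
```

The server would respond with an action chunk, which the client executes before sending the next observation.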

## 📁 Codebase Architecture

Here is a high-level overview of the project's code structure and core components:

```
unifolm-vla/
├── assets                  # Media assets such as GIFs
├── experiments             # LIBERO datasets for running inference
├── deployment              # Deployment server code
├── prepare_data            # Scripts for dataset preprocessing and format conversion
├── scripts                 # Main scripts for training, evaluation, and deployment
└── src
    └── unifolm_vla         # Core Python package for the Unitree world model
        ├── config          # Configuration files for training
        ├── model           # Model architectures and backbone definitions
        ├── rlds_dataloader # Dataset loading, transformations, and dataloaders
        └── training        # Model training
```

## 🙏 Acknowledgement

Much of the code is adapted from [Qwen2.5-VL](https://arxiv.org/abs/2502.13923), [Isaac-GR00T](https://github.com/NVIDIA/Isaac-GR00T), [Open-X](https://robotics-transformer-x.github.io/), [openvla-oft](https://github.com/moojink/openvla-oft), and [InternVLA-M1](https://github.com/InternRobotics/InternVLA-M1).

## 📝 Citation

```
@misc{unifolm-vla-0,