Initialize model card (#5)
Co-authored-by: Yangfan(Charles) Gao <CharlesGao@users.noreply.huggingface.co>
README.md
---
license: llama3.1
datasets:
- OpenMOSS-Team/FRoM-W1-Datasets
base_model:
- meta-llama/Llama-3.1-8B
tags:
- whole-body-control
- humanoid-robots
- motion-generation
- foundational-models
- text-to-motion
---

<div align="center">
<h1>FRoM-W1 (机智-W1): Towards General Humanoid Whole-Body Control with Language Instructions</h1>
</div>

<div align="center">
The Humanoid Intelligence (Hi) Team at FudanNLP and OpenMOSS Group
</div>

<div align="center">
<a href="https://github.com/OpenMOSS/FRoM-W1">💻Github</a>
</div>

## Introduction
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6208b57eace0f815845c6dbf/cASCQ7yqKP3LJMNFuBZAM.png" alt="FRoM-W1" width="50%">
</div>

Humanoid robots are capable of performing various actions such as greeting, dancing, and even backflipping. However, these motions are often hard-coded or specifically trained, which limits their versatility.
In this work, we present **FRoM-W1**, an open-source framework designed to achieve general humanoid whole-body motion control using natural language.
To understand natural language universally and generate the corresponding motions, and to enable various humanoid robots to execute these motions stably in the physical world under gravity, **FRoM-W1** operates in two stages:
(a) **H-GPT**: a large-scale language-driven human whole-body motion generation model, trained on massive human data, generates diverse natural behaviors.
We further leverage the Chain-of-Thought technique to improve the model's generalization in instruction understanding.
(b) **H-ACT**: after retargeting the generated human whole-body motions into robot-specific actions, a motion controller, pretrained and then fine-tuned through reinforcement learning in physical simulation, enables humanoid robots to perform the corresponding actions accurately and stably.
The controller is then deployed on real robots via a modular simulation-to-reality pipeline.
We extensively evaluate our framework on the Unitree H1 and G1 robots, demonstrating successful language-to-motion generation and stable execution in both simulation and real-world settings.
We fully open-source the entire **FRoM-W1** framework and hope it will advance the development of humanoid intelligence.

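Conceptually, the two stages compose into a single text-to-robot-motion pipeline. The stub below only illustrates the interfaces; every name in it is a hypothetical placeholder, and the actual entry points live in the GitHub repository:

```python
# Conceptual composition of the two stages. All names here are
# hypothetical placeholders for illustration, not the real FRoM-W1 API.

def h_gpt(instruction: str) -> list[int]:
    """Stage (a): language instruction -> human whole-body motion tokens."""
    return []  # stub

def h_act(motion_tokens: list[int], robot: str) -> list[list[float]]:
    """Stage (b): motion tokens -> SMPL-X sequence -> retargeted robot
    joint targets, tracked by an RL-trained whole-body controller."""
    return []  # stub

# Text in, executable robot motion out.
joint_targets = h_act(h_gpt("wave hello with the right hand"), robot="unitree_g1")
```
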
## Usage

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6208b57eace0f815845c6dbf/PDWD8sgkNFCi0movdMkOU.png" alt="overview" width="80%">
</div>

The complete **FRoM-W1** workflow is illustrated above:

- **H-GPT**
Deploy **H-GPT** via command-line tools or a web interface to convert natural-language commands into human motion representations (see the first sketch after this list).
This module provides full training, inference, and evaluation code, and pretrained models are available on Hugging Face.

- **H-ACT**
**H-ACT** converts the motion representations from H-GPT into SMPL-X motion sequences and further retargets them to various humanoid robots (see the second sketch after this list).
The resulting motions can be used both for training control policies and for executing actions on real robots through our deployment pipeline.

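First, a minimal inference sketch for **H-GPT**, assuming the checkpoint loads as a standard causal LM through `transformers` (it is fine-tuned from Llama-3.1-8B). The repo id, prompt, and decoding settings below are illustrative assumptions; the released command-line and web tools remain the supported entry points:

```python
# Hedged sketch: load H-GPT as a causal LM and generate a motion
# representation from a language instruction. The checkpoint id and
# prompt format are assumptions, not the documented interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenMOSS-Team/FRoM-W1"  # placeholder: substitute the actual checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

instruction = "Wave your right hand, then take two steps forward."
inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)

# The newly generated tokens encode the human motion representation
# that H-ACT consumes downstream.
motion_repr = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(motion_repr)
```
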
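Second, a structural sketch of the **H-ACT** data flow, with every class and function a hypothetical placeholder: the motion representation is decoded into an SMPL-X sequence, then retargeted frame by frame to robot joint targets for the controller to track. A real retargeter solves per-frame inverse kinematics against the robot's kinematic model and enforces joint limits; this stub only fixes the interfaces:

```python
# Hedged sketch of the H-ACT data flow; shapes and names are
# illustrative assumptions, not the real implementation.
from dataclasses import dataclass
import numpy as np

@dataclass
class SMPLXSequence:
    body_pose: np.ndarray   # (T, 21, 3) axis-angle body joint rotations
    root_trans: np.ndarray  # (T, 3) global root translation
    fps: float

@dataclass
class RobotMotion:
    joint_targets: np.ndarray  # (T, num_dof) robot joint-angle targets
    fps: float

def retarget(seq: SMPLXSequence, num_dof: int = 29) -> RobotMotion:
    """Map each SMPL-X frame to robot joint targets (interface only).
    num_dof=29 is an example value, e.g., a Unitree G1 configuration."""
    T = seq.body_pose.shape[0]
    targets = np.zeros((T, num_dof))  # stand-in for the per-frame IK solution
    return RobotMotion(joint_targets=targets, fps=seq.fps)

# Example: push a 1-second, 30 fps all-zeros sequence through the interface.
seq = SMPLXSequence(np.zeros((30, 21, 3)), np.zeros((30, 3)), fps=30.0)
motion = retarget(seq)
print(motion.joint_targets.shape)  # (30, 29)
```
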
## Citation
If you find our work useful, please cite it as follows for now:
```bibtex
@misc{FRoM-W1,
  author = {Peng Li and Zihan Zhuang and Yangfan Gao and Yi Dong and Sixian Li and Changhao Jiang and Shihan Dou and Zhiheng Xi and Enyu Zhou and Jixuan Huang and Hui Li and Jingjing Gong and Xingjun Ma and Tao Gui and Zuxuan Wu and Qi Zhang and Xuanjing Huang and Yu-Gang Jiang and Xipeng Qiu},
  title = {FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions},
  url = {https://github.com/OpenMOSS/FRoM-W1},
  year = {2025}
}
```
You are welcome to star ⭐ our GitHub repo, raise issues, and submit PRs!