---
license: mit
pipeline_tag: any-to-any
library_name: transformers
tags:
- human-centric
- multimodal
- 2d-vision
- 3d-vision
- skeleton-based
- vision-language
- pose-estimation
- object-detection
- image-segmentation
- action-recognition
- image-captioning
- attribute-recognition
---
|
|
|
|
|
# Hulk: A Universal Knowledge Translator for Human-Centric Tasks |
|
|
|
|
|
This model was presented in the paper [Hulk: A Universal Knowledge Translator for Human-Centric Tasks](https://huggingface.co/papers/2312.01697). |
|
|
|
|
|
* **Project Page**: [https://humancentricmodels.github.io/Hulk/](https://humancentricmodels.github.io/Hulk/) |
|
|
* **GitHub Repository**: [https://github.com/OpenGVLab/Hulk](https://github.com/OpenGVLab/Hulk) |
|
|
* **ArXiv Paper**: [https://arxiv.org/abs/2312.01697](https://arxiv.org/abs/2312.01697) |
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://huggingface.co/OpenGVLab/Hulk/resolve/main/assets/teaser.png" width="1000" /> |
|
|
</p> |
|
|
|
|
|
## Abstract |
|
|
Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as the metaverse and sports analysis. There has been a recent surge in developing human-centric foundation models that can benefit a broad range of human-centric perception tasks. While many human-centric foundation models have achieved success, they did not explore 3D and vision-language tasks for human-centric perception and required task-specific finetuning. These limitations restrict their application to more downstream tasks and situations. To tackle these problems, we present Hulk, the first multimodal human-centric generalist model, capable of addressing 2D vision, 3D vision, skeleton-based, and vision-language tasks without task-specific finetuning. The key to achieving this is condensing various task-specific heads into two general heads, one for discrete representations, e.g., languages, and the other for continuous representations, e.g., location coordinates. The outputs of the two heads can be further stacked into four distinct input and output modalities. This uniform representation enables Hulk to treat diverse human-centric tasks as modality translation, integrating knowledge across a wide range of tasks. Comprehensive evaluations of Hulk on 12 benchmarks covering 8 human-centric tasks demonstrate the superiority of our proposed method, achieving state-of-the-art performance on 11 benchmarks.
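
As a purely illustrative sketch of this formulation (not the actual Hulk code), the two-general-heads idea can be read as a routing rule: language-like outputs are decoded by the discrete head and location-like outputs by the continuous head, so each task reduces to translating one modality into another. The modality names below are simplified placeholders.

```python
# Illustrative pseudocode only -- NOT the Hulk implementation. It sketches the
# two-general-heads idea from the abstract: discrete (language-like) outputs go
# through one head, continuous (coordinate-like) outputs through the other.
def route_output_head(output_modality: str) -> str:
    """Select which of the two general heads decodes a task's output."""
    if output_modality == "text":           # e.g. captions, attribute labels
        return "discrete_head"
    if output_modality == "coordinates":    # e.g. 2D/3D keypoints, boxes
        return "continuous_head"
    raise ValueError(f"unhandled output modality: {output_modality!r}")

# Pose estimation translates an image into keypoint coordinates:
assert route_output_head("coordinates") == "continuous_head"
# Image captioning translates an image into text:
assert route_output_head("text") == "discrete_head"
```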
|
|
|
|
|
## Model Framework |
|
|
<p align="center"> |
|
|
<img src="https://huggingface.co/OpenGVLab/Hulk/resolve/main/assets/framework.png" width="1000" /> |
|
|
</p> |
|
|
|
|
|
## Usage |
|
|
For detailed installation instructions, dataset preparation, training procedures, evaluation scripts, and comprehensive inference examples across various human-centric tasks, please refer to the official [Hulk GitHub repository](https://github.com/OpenGVLab/Hulk). |
|
|
|
|
|
The codebase is built on top of the 🤗 [Diffusers](https://github.com/huggingface/diffusers) and 🤗 [Transformers](https://github.com/huggingface/transformers) libraries, and users should consult the repository for specific usage patterns. |
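
If you only need the released weights from this repository, a minimal sketch along the following lines should work with `huggingface_hub` and PyTorch. The checkpoint filename below is a placeholder, not a confirmed artifact name; check the repository's file listing and the GitHub README for the actual checkpoint names and the code that builds the Hulk model around them.

```python
# Minimal sketch (not the official usage pattern): download this repository's
# files and inspect a checkpoint with plain PyTorch. The filename below is a
# placeholder -- consult the repo file listing / GitHub README for real names.
from huggingface_hub import snapshot_download
import torch

local_dir = snapshot_download(repo_id="OpenGVLab/Hulk")   # fetch all repo files
state_dict = torch.load(f"{local_dir}/hulk_vit_base.pth",  # hypothetical filename
                        map_location="cpu")
print(type(state_dict))
```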
|
|
|
|
|
## Model Performance |
|
|
Hulk has achieved state-of-the-art results on various human-centric benchmarks, demonstrating its superiority in both direct evaluation and fine-tuning scenarios. For detailed performance metrics across different tasks and datasets, please consult the tables in the [GitHub README](https://github.com/OpenGVLab/Hulk#model-performance) and the [original paper](https://huggingface.co/papers/2312.01697). |
|
|
|
|
|
## Citation |
|
|
If you find this work useful, please consider citing: |
|
|
```bibtex
@article{wang2023hulk,
  title={Hulk: A Universal Knowledge Translator for Human-Centric Tasks},
  author={Wang, Yizhou and Wu, Yixuan and Tang, Shixiang and He, Weizhen and Guo, Xun and Zhu, Feng and Bai, Lei and Zhao, Rui and Wu, Jian and He, Tong and others},
  journal={arXiv preprint arXiv:2312.01697},
  year={2023}
}
```