Improve model card with pipeline tag and library name

#1, opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +33 -8
README.md CHANGED
@@ -1,17 +1,23 @@
 ---
 license: apache-2.0
+pipeline_tag: any-to-any
+library_name: transformers
 ---
+
 ![Image](assets/logo.jpeg)
 
 <div align="center">
 
 # HaploVL - A Single-Transformer Baseline for Multi-Modal Understanding
 
+[![arXiv paper](https://img.shields.io/badge/arXiv_paper-red)](http://arxiv.org/abs/2503.14694)&nbsp;
 [![Project page](https://img.shields.io/badge/Project_page-green)](https://haplo-vl.github.io/)&nbsp;
+[![Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/collections/stevengrove/haplo-67d2582ac79d96983fa99697)&nbsp;
+![Tencent ARC Lab](https://img.shields.io/badge/Developed_by-Tencent_ARC_Lab-blue)&nbsp;
 
 </div>
 
-HaploVL is a multimodal understanding foundation model that delivers comprehensive cross-modal understanding capabilities for text, images, and video inputs through a single transformer architecture.
+HaploVL is a multimodal understanding foundation model that delivers comprehensive cross-modal understanding capabilities for text, images, and video inputs through a single transformer architecture. The model was presented in the paper [HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation](https://huggingface.co/papers/2506.02975).
 
 ## Highlights
 This repository contains the PyTorch implementation, model weights, and training code for **Haplo**.
@@ -42,9 +48,9 @@ Basic usage example:
 ```python
 from haplo import HaploProcessor, HaploForConditionalGeneration
 
-processor = HaploProcessor.from_pretrained('stevengrove/Haplo-7B-Pro-Video')
+processor = HaploProcessor.from_pretrained('stevengrove/Haplo-7B-Pro')
 model = HaploForConditionalGeneration.from_pretrained(
-    'stevengrove/Haplo-7B-Pro-Video',
+    'stevengrove/Haplo-7B-Pro',
     torch_dtype=torch.bfloat16
 ).to('cuda')
 
@@ -65,13 +71,32 @@ outputs = model.generate(inputs)
 print(processor.decode(outputs[0]))
 ```
 
+### Gradio Demo
+Launch an interactive demo:
+```bash
+python demo/demo.py \
+    -m "stevengrove/Haplo-7B-Pro-Video" \
+    --server-port 8080 \
+    --device cuda \
+    --dtype bfloat16
+```
+
+**Multi-Modal Capabilities**
+
+| Category | Example |
+|----------------------------|-----------------------------|
+| Single Image Understanding | ![Demo1](assets/demo_1.png) |
+| Multi-Image Understanding  | ![Demo3](assets/demo_2.png) |
+| Video Understanding        | ![Demo2](assets/demo_3.png) |
+
+
 ## Acknowledgement
 
 ```bibtex
-@article{yang2024haplo,
-  title={HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding},
-  author={Yang, Rui and Song, Lin and Xiao, Yicheng and Huang, Runhui and Ge, Yixiao and Shan, Ying and Zhao, Hengshuang},
-  journal={arXiv preprint arXiv:xxxx.xxxxx},
-  year={2025}
+@article{HaploVL,
+  title={HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding},
+  author={Yang, Rui and Song, Lin and Xiao, Yicheng and Huang, Runhui and Ge, Yixiao and Shan, Ying and Zhao, Hengshuang},
+  journal={arXiv preprint arXiv:2503.14694},
+  year={2025}
 }
 ```
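
For context on what the metadata change does: the two keys this PR adds live in the README's YAML front matter, which the Hub reads to populate the model's pipeline tag and library badge. A minimal stdlib sketch of reading such flat front-matter keys (not part of the PR; `read_front_matter` is a hypothetical helper, and it handles only simple `key: value` lines):

```python
import re

# Front matter as it appears in README.md after this PR is merged.
README = """\
---
license: apache-2.0
pipeline_tag: any-to-any
library_name: transformers
---

![Image](assets/logo.jpeg)
"""

def read_front_matter(text):
    """Parse flat `key: value` pairs from a model card's YAML front matter.

    Sketch only: real tooling should use a full YAML parser.
    """
    match = re.match(r"^---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if not match:
        return {}
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

meta = read_front_matter(README)
print(meta["pipeline_tag"], meta["library_name"])  # any-to-any transformers
```

In practice the `huggingface_hub` library exposes structured access to this metadata, so a hand-rolled parser like this is only for illustration.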