Link model to paper and update metadata
#1 by nielsr (HF Staff) · opened

README.md (CHANGED)
````diff
@@ -1,14 +1,19 @@
 ---
+language:
+- en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
+base_model: Dream-org/Dream-v0-Instruct-7B
+datasets:
+- MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
+arxiv: 2512.22615
 tags:
 - vlm
 - image-text-to-text
 - multimodal
 - pretraining
-
-language:
-- en
-pipeline_tag: image-text-to-text
+- diffusion
 ---
 
 # Dream-VL 7B
@@ -18,7 +23,7 @@ The model takes language instructions and images as input and generates language
 
 All Dream-VL checkpoints, as well as our [training codebase](https://github.com/DreamLM/Dream-VLX) are released under an Apache 2.0 License.
 
-For full details, please read [our blog](https://hkunlp.github.io/blog/2025/dream-vlx/) and paper (
+For full details, please read [our blog](https://hkunlp.github.io/blog/2025/dream-vlx/) and the paper: [Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone](https://huggingface.co/papers/2512.22615).
 
 ## Model Summary
 
@@ -27,7 +32,7 @@ For full details, please read [our blog](https://hkunlp.github.io/blog/2025/drea
 - **License:** apache-2.0
 - **Finetuned from:** [`Dream-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B), with Qwen2ViT Vision Backbone.
 - **Pretraining Dataset:** [MAmmoTH-VL-Instruct-12M](https://huggingface.co/datasets/MAmmoTH-VL/MAmmoTH-VL-Instruct-12M).
-- **Repository:** [https://github.com/DreamLM/
+- **Repository:** [https://github.com/DreamLM/Dream-VLX](https://github.com/DreamLM/Dream-VLX)
 - **Project Page & Videos:** [https://hkunlp.github.io/blog/2025/dream-vlx](https://hkunlp.github.io/blog/2025/dream-vlx/)
 
 ## Getting Started
@@ -133,8 +138,8 @@ for j in range(len(messages)):
 ```bibtex
 @article{ye2025dreamvla,
 title={Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone},
 author={Ye, Jiacheng and Gong, Shansan and Gao, Jiahui and Fan, Junming and Wu, Shuang and Bi, Wei and Bai, Haoli and Shang, Lifeng and Kong, Lingpeng},
-journal={arXiv preprint},
+journal={arXiv preprint arXiv:2512.22615},
 year={2025}
 }
 ```
````
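The added front-matter keys are what the Hub reads to surface the paper link (`arxiv`), the inference widget (`pipeline_tag`), and the lineage to the base model (`base_model`). As a quick local sanity check before merging, the edited card can be parsed with `huggingface_hub`; the sketch below assumes the updated `README.md` from this PR is in the working directory, and it reads the non-standard `arxiv` key from the raw metadata dict since it is not a typed field.

```python
# Minimal sketch: parse the updated model card and print the metadata this PR touches.
# Assumes the edited README.md from this change is in the current directory.
from huggingface_hub import ModelCard

card = ModelCard.load("README.md")  # ModelCard.load also accepts a repo id
meta = card.data.to_dict()

# Keys added or reordered by this PR
for key in ("license", "language", "pipeline_tag", "base_model", "datasets", "arxiv", "tags"):
    print(f"{key}: {meta.get(key)}")
```

If the `arxiv` value comes back as a float rather than a string, quoting it in the YAML (`arxiv: "2512.22615"`) keeps the identifier intact.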