Link model to paper and update metadata
#1 by nielsr (HF Staff) · opened

README.md (CHANGED)
````diff
@@ -1,14 +1,19 @@
 ---
+language:
+- en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
+base_model: Dream-org/Dream-v0-Instruct-7B
+datasets:
+- MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
+arxiv: 2512.22615
 tags:
 - vlm
 - image-text-to-text
 - multimodal
 - pretraining
-
-language:
-- en
-pipeline_tag: image-text-to-text
+- diffusion
 ---
 
 # Dream-VL 7B
@@ -18,7 +23,7 @@ The model takes language instructions and images as input and generates language
 
 All Dream-VL checkpoints, as well as our [training codebase](https://github.com/DreamLM/Dream-VLX) are released under an Apache 2.0 License.
 
-For full details, please read [our blog](https://hkunlp.github.io/blog/2025/dream-vlx/) and paper (
+For full details, please read [our blog](https://hkunlp.github.io/blog/2025/dream-vlx/) and the paper: [Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone](https://huggingface.co/papers/2512.22615).
 
 ## Model Summary
 
@@ -27,7 +32,7 @@ For full details, please read [our blog](https://hkunlp.github.io/blog/2025/drea
 - **License:** apache-2.0
 - **Finetuned from:** [`Dream-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B), with Qwen2ViT Vision Backbone.
 - **Pretraining Dataset:** [MAmmoTH-VL-Instruct-12M](https://huggingface.co/datasets/MAmmoTH-VL/MAmmoTH-VL-Instruct-12M).
-- **Repository:** [https://github.com/DreamLM/
+- **Repository:** [https://github.com/DreamLM/Dream-VLX](https://github.com/DreamLM/Dream-VLX)
 - **Project Page & Videos:** [https://hkunlp.github.io/blog/2025/dream-vlx](https://hkunlp.github.io/blog/2025/dream-vlx/)
 
 ## Getting Started
@@ -133,8 +138,8 @@ for j in range(len(messages)):
 ```bibtex
 @article{ye2025dreamvla,
 title={Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone},
 author={Ye, Jiacheng and Gong, Shansan and Gao, Jiahui and Fan, Junming and Wu, Shuang and Bi, Wei and Bai, Haoli and Shang, Lifeng and Kong, Lingpeng},
-journal={arXiv preprint},
+journal={arXiv preprint arXiv:2512.22615},
 year={2025}
 }
 ```
````
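The added front-matter keys are what the Hub reads to surface the paper link (`arxiv`), the inference widget (`pipeline_tag`), and the lineage to the base model (`base_model`). As a quick local sanity check before merging, the edited card can be parsed with `huggingface_hub`; the sketch below assumes the updated `README.md` from this PR is in the working directory, and it reads the non-standard `arxiv` key from the raw metadata dict since it is not a typed field.

```python
# Minimal sketch: parse the updated model card and print the metadata this PR touches.
# Assumes the edited README.md from this change is in the current directory.
from huggingface_hub import ModelCard

card = ModelCard.load("README.md")  # ModelCard.load also accepts a repo id
meta = card.data.to_dict()

# Keys added or reordered by this PR
for key in ("license", "language", "pipeline_tag", "base_model", "datasets", "arxiv", "tags"):
    print(f"{key}: {meta.get(key)}")
```

If the `arxiv` value comes back as a float rather than a string, quoting it in the YAML (`arxiv: "2512.22615"`) keeps the identifier intact.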