Improve model card for Bifrost-1

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +77 -6
README.md CHANGED
@@ -1,12 +1,83 @@
- This repo contains the pretrained checkpoints for Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
-
- Bifrost-1 is designed for
- - High-Fidelity Generation: Patch-level CLIP latents natively aligned with MLLM visual encoder
- - Training Efficiency: Better image generation quality over other architecture variants with non-MLLM-aligned visual features, under controlled experimental settings
- - Preserves Visual Reasoning: Bifrost-1 fully inherits strong visual understanding capabilities of backbone MLLM
-
  <br>
  <img width="800" src="teaser.png"/>
  <br>
-
- See also: https://bifrost-1.github.io/
 
+ ---
+ pipeline_tag: text-to-image
+ library_name: transformers
+ license: apache-2.0
+ ---
 
+ # Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
+
+ This repository contains the pretrained checkpoints for **Bifrost-1**, a unified framework that bridges pretrained multimodal LLMs (MLLMs) and diffusion models using patch-level CLIP image embeddings as latent variables. Bifrost-1 enables high-fidelity, controllable image generation with high training efficiency, without compromising the strong reasoning capabilities of the MLLM.
+
+ **Paper**: [Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents](https://huggingface.co/papers/2508.05954)
+ **Project Page**: [https://bifrost-1.github.io](https://bifrost-1.github.io)
+ **GitHub Repository**: [https://github.com/hanlincs/Bifrost-1](https://github.com/hanlincs/Bifrost-1)
+
+ Bifrost-1 is designed for:
+ - **High-Fidelity Generation**: Patch-level CLIP latents are natively aligned with the MLLM visual encoder, enabling high-quality image generation.
+ - **Training Efficiency**: Achieves better image generation quality than other architecture variants that use non-MLLM-aligned visual features, under controlled experimental settings, with substantially lower compute during training.
+ - **Preserves Visual Reasoning**: Bifrost-1 fully inherits the strong visual understanding capabilities of the backbone MLLM by equipping it with a visual generation branch initialized from the original MLLM parameters.
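The bridging idea in the bullets above can be sketched with dummy tensors. This is a minimal illustration only: every shape, name, and projection here is an assumption for exposition, not the actual Bifrost-1 architecture or dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) A CLIP-style visual encoder turns an image into a grid of patch
#    embeddings (hypothetical 16x16 patch grid, 1024-dim latents).
num_patches, clip_dim = 16 * 16, 1024
patch_latents = rng.standard_normal((num_patches, clip_dim))

# 2) For generation, the MLLM branch predicts patch-level CLIP latents from
#    text; a random projection stands in for it here (hidden size is made up).
text_hidden = rng.standard_normal((77, 4096))
mllm_to_clip = rng.standard_normal((4096, clip_dim)) / np.sqrt(4096)
predicted = np.tile(text_hidden.mean(axis=0) @ mllm_to_clip, (num_patches, 1))

# 3) A diffusion decoder would condition on this grid to render pixels. The
#    key point: the interface is a (num_patches, clip_dim) latent that is
#    natively aligned with the MLLM's own visual encoder.
assert predicted.shape == patch_latents.shape
print(predicted.shape)  # (256, 1024)
```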
 
  <br>
  <img width="800" src="teaser.png"/>
  <br>
+ <img width="800" src="https://github.com/hanlincs/Bifrost-1/raw/main/assets/bifrost_model_architecture.png"/>
+ <br>
+
+ ## 🔧 Environment Setup
+
+ ```shell
+ conda create -n bifrost1 python==3.11
+ conda activate bifrost1
+ pip install -r requirements.txt
+ ```
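After installing, a quick import check can confirm that the core dependencies resolved. The package list below is an assumption based on typical MLLM + diffusion stacks, not the exact contents of `requirements.txt`; adjust it to match.

```python
import importlib.util

# Hypothetical core packages; edit this tuple to match requirements.txt.
for pkg in ("torch", "transformers", "diffusers"):
    status = "OK" if importlib.util.find_spec(pkg) is not None else "missing"
    print(f"{pkg}: {status}")
```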
+
+ ## 🔮 Inference
+
+ ### 📌 Model Checkpoints
+
+ The model checkpoint can be downloaded from Hugging Face [here](https://huggingface.co/hanlincs/Bifrost-1).
+
+ You can download it to your specified `local_dir` with:
+ ```python
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(
+     repo_id="hanlincs/Bifrost-1",
+     repo_type="model",
+     local_dir="xxxxxxxx",  # Replace with your local directory path
+     local_dir_use_symlinks=False,
+ )
+ ```
+
+ ### 📌 Run Inference Scripts
+
+ Generate images from GenEval prompts:
+
+ ```bash
+ python inference_geneval_dpgbench.py --eval_geneval --output_dir "./outputs" --local_checkpoint_path XXXXX  # Replace XXXXX with your local checkpoint path
+ ```
+
+ ## 📚 BibTeX
+
+ 🌟 Please let us know in the issues or PRs if you have any questions. If you find our project useful in your research or application development, citing our paper would be the best support for us!
+
+ ```bibtex
+ @misc{lin2025bifrost1bridgingmultimodalllms,
+     title={Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents},
+     author={Han Lin and Jaemin Cho and Amir Zadeh and Chuan Li and Mohit Bansal},
+     year={2025},
+     eprint={2508.05954},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV},
+     url={https://arxiv.org/abs/2508.05954},
+ }
+ ```
+
+ ## 🙏 Acknowledgements
+
+ The development of Bifrost-1 has been greatly inspired by the following amazing works and teams:
+
+ - [BLIP3o](https://github.com/JiuhaiChen/BLIP3o)
+ - [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL)
+ - [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+
+ We hope that releasing this model/codebase helps the community continue pushing these creative tools forward in an open and responsible way.