---
license: apache-2.0
base_model:
  - Qwen/Qwen2.5-7B-Instruct
pipeline_tag: any-to-any
library_name: bagel-mot
---

BAGEL

🥯 BAGEL: Unified Model for Multimodal Understanding and Generation


We present BAGEL, an open-source multimodal foundation model with 7B active parameters (14B total) trained on large-scale interleaved multimodal data.

BAGEL outperforms leading open-source VLMs like Qwen2.5-VL and InternVL-2.5 on standard benchmarks and delivers text-to-image quality competitive with specialist generators such as SD3.

It supports:

  • Free-form visual manipulation
  • Multiview synthesis
  • World navigation
  • Advanced image editing beyond traditional models

🔧 Installation & Usage

Please refer to our GitHub Repository for:

  • Setup instructions
  • Example scripts
  • Demo usage


🧠 Method

BAGEL uses a Mixture-of-Transformer-Experts (MoT) architecture with:

  • Dual encoders: capturing pixel-level and semantic-level features
  • Training objective: Next Group of Token Prediction
  • Vision token compression via FLUX.1 VAE
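
The MoT routing and VAE-based token compression above can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than BAGEL's actual implementation: the expert functions are toy stand-ins, and the compression factors assume a FLUX-style 8× spatial VAE downsampling followed by 2×2 patching.

```python
# Illustrative MoT sketch: all tokens would share self-attention (elided
# here), but each token's feed-forward pass is routed to a
# modality-specific expert. Expert bodies are hypothetical stand-ins.

def ffn_understanding(feats):
    # hypothetical expert for semantic-level (understanding) tokens
    return [v * 2.0 for v in feats]

def ffn_generation(feats):
    # hypothetical expert for pixel-level (generation) tokens
    return [v + 1.0 for v in feats]

def mot_layer(tokens):
    """Route each (modality, features) token to its expert."""
    experts = {"und": ffn_understanding, "gen": ffn_generation}
    return [(m, experts[m](f)) for m, f in tokens]

def num_vision_tokens(h, w, vae_down=8, patch=2):
    """Assumed compression: 8x spatial VAE downsampling, then 2x2 patching."""
    return (h // vae_down // patch) * (w // vae_down // patch)

print(mot_layer([("und", [1.0, 2.0]), ("gen", [3.0, 4.0])]))
print(num_vision_tokens(256, 256))  # a 256x256 image -> 256 vision tokens
```

The routing table (rather than a learned router) reflects that in a MoT design each token's expert is fixed by its modality, not chosen dynamically per token as in a standard MoE.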


🌱 Emerging Properties

Capabilities emerge in sequence as pretraining scales:

  • Multimodal understanding
  • Generation
  • Basic image editing
  • Advanced multimodal reasoning and 3D/world modeling

📊 Benchmarks

๐Ÿ–ผ๏ธ Visual Understanding

| Model | MME ↑ | MMBench ↑ | MMMU ↑ | MM-Vet ↑ | MathVista ↑ |
| --- | --- | --- | --- | --- | --- |
| Janus-Pro-7B | – | 79.2 | 41.0 | 50.0 | – |
| Qwen2.5-VL-7B | 2347 | 83.5 | 58.6 | 67.1 | 68.2 |
| BAGEL | 2388 | 85.0 | 55.3 | 67.2 | 73.1 |

๐Ÿ–Œ๏ธ Text-to-Image Generation (GenEval)

| Model | Overall ↑ |
| --- | --- |
| FLUX-1-dev | 0.82 |
| SD3-Medium | 0.74 |
| Janus-Pro-7B | 0.80 |
| BAGEL | 0.88 |

🪄 Image Editing

| Model | GEdit-Bench-EN (SC) ↑ | GEdit-Bench-EN (PQ) ↑ | GEdit-Bench-EN (O) ↑ | IntelligentBench ↑ |
| --- | --- | --- | --- | --- |
| Step1X-Edit | 7.09 | 6.76 | 6.70 | 14.9 |
| Gemini-2-exp. | 6.73 | 6.61 | 6.32 | 57.6 |
| BAGEL | 7.36 | 6.83 | 6.52 | 44.0 |
| BAGEL+CoT | – | – | – | 55.3 |

โš–๏ธ License

BAGEL is licensed under the Apache 2.0 License.

Finetuned from: Qwen/Qwen2.5-7B-Instruct

📚 Citation

@article{deng2025bagel,
  title   = {Emerging Properties in Unified Multimodal Pretraining},
  author  = {Deng, Chaorui and Zhu, Deyao and Li, Kunchang and Gou, Chenhui and Li, Feng and Wang, Zeyu and Zhong, Shu and Yu, Weihao and Nie, Xiaonan and Song, Ziang and Shi, Guang and Fan, Haoqi},
  journal = {arXiv preprint arXiv:2505.14683},
  year    = {2025}
}