# UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
## What is UniCorn?
While Unified Multimodal Models (UMMs) excel at comprehension, they often suffer from Conduction Aphasia: the inability to translate internal knowledge into faithful generation.
UniCorn is a simple yet elegant self-improvement framework that eliminates the need for external data or teacher supervision. It partitions a single UMM into three collaborative roles (Proposer, Solver, and Judge) to distill latent understanding into explicit generative signals via self-play.
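The Proposer/Solver/Judge loop can be sketched as below. This is an illustrative outline only, assuming a single model object that exposes all three roles; the method names (`propose_prompt`, `generate_image`, `judge_alignment`, `finetune_on`) are hypothetical placeholders, not the released UniCorn API.

```python
# Hypothetical sketch of one self-play round: the same model plays all
# three roles and its own judgments become the generative training signal.

def self_improvement_round(model, num_samples=64, threshold=0.8):
    """Collect self-generated (prompt, image) pairs the Judge accepts."""
    accepted = []
    for _ in range(num_samples):
        prompt = model.propose_prompt()               # Proposer: draft a text prompt
        image = model.generate_image(prompt)          # Solver: synthesize an image
        score = model.judge_alignment(prompt, image)  # Judge: rate faithfulness
        if score >= threshold:                        # keep only high-quality pairs
            accepted.append((prompt, image, score))
    model.finetune_on(accepted)  # distill accepted pairs back into generation
    return accepted
```

No external labels enter the loop: supervision comes entirely from pairs the model itself rates highly.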
## Key Features
- Self-Generated Supervision: No external labels or teacher models required.
- Cognitive Pattern Reconstruction: Bridges the gap between multimodal "understanding" and "synthesis."
- UniCycle Benchmark: A new cycle-consistency metric (Text → Image → Text) to validate multimodal coherence.
- SOTA Performance: Leading results on TIIF (73.8), DPG (86.8), and CompBench (88.5).
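The cycle-consistency idea behind UniCycle can be illustrated with a minimal round-trip check. The `generate`/`caption` callables and the token-overlap similarity below are assumptions for illustration, not the benchmark's actual metric.

```python
# Minimal Text -> Image -> Text round-trip scorer (illustrative only).

def cycle_consistency(prompt, generate, caption):
    """Round-trip a prompt through image space and score text agreement."""
    image = generate(prompt)    # Text -> Image
    recovered = caption(image)  # Image -> Text
    a = set(prompt.lower().split())
    b = set(recovered.lower().split())
    # Jaccard overlap of word sets, in [0, 1]; a real metric would use a
    # stronger text-similarity model rather than bag-of-words overlap.
    return len(a & b) / len(a | b) if a | b else 1.0
```

A faithful generate/caption pair recovers most of the original prompt and scores near 1.0; an unfaithful one drifts toward 0.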
## Quick Start
### Inference & Best Practices
To optimize generation quality and avoid common pitfalls like blurriness, follow these hyperparameter guidelines:
- `cfg_text_scale`: Use 4.0–8.0 for balanced prompt following.
- `cfg_renorm_type`: Use `global` for general Text-to-Image tasks.
- `timestep_shift`: Higher values favor better layout; lower values favor finer details.
- `num_timesteps`: The standard setting is 50.
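A configuration following these guidelines might look like the sketch below. The plain dict and its use are illustrative; consult the released checkpoint's documentation for the actual inference entry point.

```python
# Illustrative text-to-image inference settings based on the guidelines above.
t2i_config = {
    "cfg_text_scale": 6.0,        # within the suggested 4.0-8.0 range
    "cfg_renorm_type": "global",  # recommended for general T2I tasks
    "timestep_shift": 3.0,        # raise for better layout, lower for finer detail
    "num_timesteps": 50,          # standard setting
}
```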
## Results
UniCorn achieves substantial gains over its base model (e.g., +6.5 on OneIG-EN, +5.0 on WISE).
| Model | TIIF (Short/Long) | WISE (Overall) | OneIG-EN (Overall) | CompBench (Overall) | DPG (Score) | Geneval (Score) |
|---|---|---|---|---|---|---|
| BAGEL | 71.0 / 71.8 | 50.0 | 36.1 | 82.2 | 84.0 | 78.0 |
| UniCorn | 74.7 / 72.9 | 55.0 | 42.6 | 88.5 | 86.8 | 82.0 |
| $\Delta$(vs. BAGEL) | +3.7 / +1.1 | +5.0 | +6.5 | +6.3 | +2.8 | +4.0 |
## News & Roadmap
- Jan. 12, 2026: Released model checkpoints.
- Jan. 07, 2026: Released the official arXiv report.
- To-Do: Release full training and evaluation code.
## Citation
```bibtex
@article{han2026unicorn,
  title={UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision},
  author={Han, Ruiyan and Fang, Zhen and Sun, Xinyu and Ma, Yuchen and Wang, Ziheng and Zeng, Yu and Chen, Zehui and Chen, Lin and Huang, Wenxuan and Xu, Wei-Jie and others},
  journal={arXiv preprint arXiv:2601.03193},
  year={2026}
}
```
## License
This project is licensed under the Apache 2.0 License.