# UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
## What is UniCorn?
While Unified Multimodal Models (UMMs) excel at comprehension, they often suffer from Conduction Aphasia: the inability to translate internal knowledge into faithful generation.
UniCorn is a simple yet elegant self-improvement framework that eliminates the need for external data or teacher supervision. It partitions a single UMM into three collaborative roles (Proposer, Solver, and Judge) to distill latent understanding into explicit generative signals via self-play.
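The Proposer/Solver/Judge loop can be sketched as below. This is an illustrative outline only, assuming a single model object that exposes all three roles; the method names (`propose_prompt`, `generate_image`, `judge_alignment`, `finetune_on`) are hypothetical placeholders, not the released UniCorn API.

```python
# Hypothetical sketch of one self-play round: the same model plays all
# three roles and its own judgments become the generative training signal.

def self_improvement_round(model, num_samples=64, threshold=0.8):
    """Collect self-generated (prompt, image) pairs the Judge accepts."""
    accepted = []
    for _ in range(num_samples):
        prompt = model.propose_prompt()               # Proposer: draft a text prompt
        image = model.generate_image(prompt)          # Solver: synthesize an image
        score = model.judge_alignment(prompt, image)  # Judge: rate faithfulness
        if score >= threshold:                        # keep only high-quality pairs
            accepted.append((prompt, image, score))
    model.finetune_on(accepted)  # distill accepted pairs back into generation
    return accepted
```

No external labels enter the loop: supervision comes entirely from pairs the model itself rates highly.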
## Key Features
- Self-Generated Supervision: No external labels or teacher models required.
- Cognitive Pattern Reconstruction: Bridges the gap between multimodal "understanding" and "synthesis."
- UniCycle Benchmark: A new cycle-consistency metric (Text → Image → Text) to validate multimodal coherence.
- SOTA Performance: Leading results on TIIF (73.8), DPG (86.8), and CompBench (88.5).
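The cycle-consistency idea behind UniCycle can be illustrated with a minimal round-trip check. The `generate`/`caption` callables and the token-overlap similarity below are assumptions for illustration, not the benchmark's actual metric.

```python
# Minimal Text -> Image -> Text round-trip scorer (illustrative only).

def cycle_consistency(prompt, generate, caption):
    """Round-trip a prompt through image space and score text agreement."""
    image = generate(prompt)    # Text -> Image
    recovered = caption(image)  # Image -> Text
    a = set(prompt.lower().split())
    b = set(recovered.lower().split())
    # Jaccard overlap of word sets, in [0, 1]; a real metric would use a
    # stronger text-similarity model rather than bag-of-words overlap.
    return len(a & b) / len(a | b) if a | b else 1.0
```

A faithful generate/caption pair recovers most of the original prompt and scores near 1.0; an unfaithful one drifts toward 0.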
## Quick Start
### Inference & Best Practices
To optimize generation quality and avoid common pitfalls like blurriness, follow these hyperparameter guidelines:
- `cfg_text_scale`: Use 4.0–8.0 for balanced prompt following.
- `cfg_renorm_type`: Use `global` for general Text-to-Image tasks.
- `timestep_shift`: Higher values favor better layout; lower values favor finer details.
- `num_timesteps`: The standard setting is 50.
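A configuration following these guidelines might look like the sketch below. The plain dict and its use are illustrative; consult the released checkpoint's documentation for the actual inference entry point.

```python
# Illustrative text-to-image inference settings based on the guidelines above.
t2i_config = {
    "cfg_text_scale": 6.0,        # within the suggested 4.0-8.0 range
    "cfg_renorm_type": "global",  # recommended for general T2I tasks
    "timestep_shift": 3.0,        # raise for better layout, lower for finer detail
    "num_timesteps": 50,          # standard setting
}
```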
## Results
UniCorn achieves substantial gains over its base model (e.g., +6.5 on OneIG-EN, +5.0 on WISE).
| Model | TIIF (Short/Long) | WISE (Overall) | OneIG-EN (Overall) | CompBench (Overall) | DPG (Score) | Geneval (Score) |
|---|---|---|---|---|---|---|
| BAGEL | 71.0 / 71.8 | 50.0 | 36.1 | 82.2 | 84.0 | 78.0 |
| UniCorn | 74.7 / 72.9 | 55.0 | 42.6 | 88.5 | 86.8 | 82.0 |
| $\Delta$(vs. BAGEL) | +3.7 / +1.1 | +5.0 | +6.5 | +6.3 | +2.8 | +4.0 |
## News & Roadmap
- Jan. 12, 2026: Released model checkpoints.
- Jan. 07, 2026: Released the official arXiv report.
- To-Do: Release full training and evaluation code.
## Citation
```bibtex
@article{han2026unicorn,
  title={UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision},
  author={Han, Ruiyan and Fang, Zhen and Sun, Xinyu and Ma, Yuchen and Wang, Ziheng and Zeng, Yu and Chen, Zehui and Chen, Lin and Huang, Wenxuan and Xu, Wei-Jie and others},
  journal={arXiv preprint arXiv:2601.03193},
  year={2026}
}
```
## License
This project is licensed under the Apache 2.0 License.