+---
+license: apache-2.0
+language:
+- en
+- zh
+metrics:
+- accuracy
+base_model:
+- ByteDance-Seed/BAGEL-7B-MoT
+pipeline_tag: text-to-image
+tags:
+- text-to-image
+- fake-image-detection
+- unigendet
+- bagel
+---
+<h1 align="center">[CVPR 2026] UniGenDet: A Unified Generative-Discriminative Framework</h1>
+<p align="center">
+  <b>
+    <a href="https://github.com/Zhangyr2022/">Yanran Zhang</a>,
+    <a href="https://wzzheng.net/#">Wenzhao Zheng</a><sup>†</sup>,
+    <a href="https://joeleelyf.github.io/">Yifei Li</a>,
+    <a href="https://yuby14.github.io/">Bingyao Yu</a>,
+    <a href="https://yzheng97.github.io/">Yu Zheng</a>,
+    <a href="https://leichenthu.github.io/">Lei Chen</a>,
+    <a href="https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en">Jie Zhou</a><sup>*</sup>,
+    <a href="https://ivg.au.tsinghua.edu.cn/Jiwen_Lu/">Jiwen Lu</a>
+  </b>
+  <br/>
+  Department of Automation, Tsinghua University, China
+  <br/>
+  <sup>*</sup>Corresponding author &nbsp;&nbsp; <sup>†</sup>Project leader
+</p>
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/661cfae9a853782abad2a495/lBHJD1nNztgmdwc_WqVli.png" width="100%" alt="UniGenDet Teaser"/>
+</p>
+
+**UniGenDet** is a unified co-evolutionary framework that jointly optimizes image generation and generated-image detection in a single loop. By bridging generation and authenticity understanding through symbiotic multimodal self-attention, UniGenDet turns the traditional "generator vs. detector" arms race into a closed-loop collaboration.
+
+This repository hosts the fine-tuned model weights for UniGenDet.
+
+### 🔗 Links
+- **GitHub Repository (Code & Detailed Instructions):** [Zhangyr2022/UniGenDet](https://github.com/Zhangyr2022/UniGenDet)
+- **Paper (arXiv):** [2604.21904](https://arxiv.org/abs/2604.21904v1)
+- **Project Website:** [UniGenDet Project Page](https://ivg-yanranzhang.github.io/UniGenDet/)
+### 🚀 Getting Started
+The UniGenDet model supports two main tasks:
+1. **Text-to-Image Generation (`t2i`)**
+2. **AI-Generated Image Detection and Explanation (`detection`)**
+
+To use these weights for generation, detection, or further fine-tuning, see the official [GitHub repository](https://github.com/Zhangyr2022/UniGenDet), which provides a `demo.py` script for interactive inference.
+
+**Quick inference setup:**
+1. Clone the GitHub repository: `git clone https://github.com/Zhangyr2022/UniGenDet.git`
+2. Install dependencies as outlined in the repo's `README.md`.
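+3. Download the base BAGEL pretrained assets ([ByteDance-Seed/BAGEL-7B-MoT](https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT)).
+4. Download the weights from this Hugging Face repository and run `demo.py`, pointing it at the downloaded model directory.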
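
The download steps above could be scripted roughly as follows. This is a minimal sketch, not the authors' procedure: it assumes the `huggingface_hub` package is installed, that this card's weights live under the Hub repo id `Yanran21/UniGenDet`, and that a `models/` folder layout is acceptable. The helper names `build_download_plan` and `download_all` are hypothetical, and the arguments `demo.py` expects are not shown here; check the GitHub README for those.

```python
from pathlib import Path

# Repo ids: the base model comes from this card's metadata; the UniGenDet id
# is an assumption based on where this card is hosted.
UNIGENDET_REPO = "Yanran21/UniGenDet"
BAGEL_BASE_REPO = "ByteDance-Seed/BAGEL-7B-MoT"


def build_download_plan(root):
    """Map each Hub repo id to a local directory under `root` (hypothetical layout)."""
    root = Path(root)
    return {
        BAGEL_BASE_REPO: root / "BAGEL-7B-MoT",
        UNIGENDET_REPO: root / "UniGenDet",
    }


def download_all(root="models"):
    """Fetch both repositories from the Hub (needs network access and disk space)."""
    # Imported lazily so the planning helper works without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in build_download_plan(root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
        print(f"{repo_id} -> {local_dir}")
```

Calling `download_all("models")` would place the base BAGEL assets and the fine-tuned weights in sibling folders, which can then be passed to `demo.py` in whatever form the repository's README specifies.
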
+
+For complete installation, data preparation, training (GDUF/DIGA), and evaluation instructions, please consult the [main GitHub repository](https://github.com/Zhangyr2022/UniGenDet).
+
+### Citation
+```bibtex
+@article{zhang2026unigendet,
+  title   = {UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection},
+  author  = {Zhang, Yanran and Zheng, Wenzhao and Li, Yifei and Yu, Bingyao and Zheng, Yu and Chen, Lei and Zhou, Jie and Lu, Jiwen},
+  journal = {CoRR},
+  volume  = {abs/2604.21904},
+  year    = {2026},
+  url     = {https://arxiv.org/abs/2604.21904},
+}