zhengli1013's picture
Upload folder using huggingface_hub
79aa5ff verified
---
base_model:
- Qwen/Qwen3-VL-8B-Instruct
datasets:
- InterleaveThinker/Train-Data
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
---
# InterleaveThinker-Critic Model
This repository contains the InterleaveThinker-Critic-8B model presented in [InterleaveThinker: Reinforcing Agentic Interleaved Generation]().
[**Project Page**]() | [**GitHub Repository**](https://github.com/zhengdian1/InterleaveThinker) | [**Paper**]()
# 👀 Intro
<div align="center">
<img src="https://github.com/zhengdian1/InterleaveThinker/blob/main/assets/teaser.jpg?raw=true" alt="InterleaveThinker Teaser" width="80%">
</div>
We introduce **InterleaveThinker**, as the first multi-agent pipeline designed to **endow any existing image generator with interleaved generation capabilities**. InterleaveThinker can organize the image-text input sequence via a planner agent, evaluate generator outputs, identify deviations, and refine instructions via a critic agent, **enabling complex interleaved text-image sequence generation for visual narratives, guidance, embodied manipulation and long-horizon sub-task annotation.**
We build three dedicated training datasets—Interleave-Planner-SFT-80k, Interleave-Critic-SFT-112k, and Interleave-Critic-RL-13k—for interleaved generation and step-wise instruction correction using GRPO with proposed accuracy and step-wise rewards.
InterleaveThinker achieves **performance comparable to Nano Banana and GPT-5 on interleaved generation benchmarks**, delivering substantial gains on reasoning-based benchmarks (e.g., boosting WISE from 0.47 to 0.74 and RISE from 13.3 to 28.9 on 4-step FLUX.2-klein). It also demonstrates strong transferability, improving performance across various existing image generators.
## 🎥 Demo
#### Inference Process Example
<div align="center">
<img src="https://github.com/zhengdian1/InterleaveThinker/blob/main/assets/example.jpg?raw=true" alt="Inference Process Example" width="85%">
</div>
For more examples, please refer to our website [[🌐Project Page]]()
## 🚀 Training and Inference
For detailed instructions on setup, SFT/RL training, and inference, please refer to the [official GitHub repository](https://github.com/zhengdian1/InterleaveThinker).
## 📐 Citation
If you find our work helpful for your research, please consider citing our work:
```bibtex
@article{zheng2026interleavethinker,
title={InterleaveThinker: Reinforcing Agentic Interleaved Generation},
author={Zheng, Dian and Li, Hongyu and Zhang, Manyuan and Feng, Kaituo and Li, Hongsheng},
journal={},
year={2026}
}
```