| --- |
| base_model: |
| - Qwen/Qwen3-VL-8B-Instruct |
| datasets: |
| - InterleaveThinker/Train-Data |
| library_name: transformers |
| pipeline_tag: image-text-to-text |
| license: apache-2.0 |
| --- |
| |
| # InterleaveThinker-Critic Model |
|
|
| This repository contains the InterleaveThinker-Critic-8B model presented in [InterleaveThinker: Reinforcing Agentic Interleaved Generation](). |
|
|
| [**Project Page**]() | [**GitHub Repository**](https://github.com/zhengdian1/InterleaveThinker) | [**Paper**]() |
|
|
| # 👀 Intro |
|
|
| <div align="center"> |
| <img src="https://github.com/zhengdian1/InterleaveThinker/blob/main/assets/teaser.jpg?raw=true" alt="InterleaveThinker Teaser" width="80%"> |
| </div> |
|
|
| We introduce **InterleaveThinker**, as the first multi-agent pipeline designed to **endow any existing image generator with interleaved generation capabilities**. InterleaveThinker can organize the image-text input sequence via a planner agent, evaluate generator outputs, identify deviations, and refine instructions via a critic agent, **enabling complex interleaved text-image sequence generation for visual narratives, guidance, embodied manipulation and long-horizon sub-task annotation.** |
|
|
| We build three dedicated training datasets—Interleave-Planner-SFT-80k, Interleave-Critic-SFT-112k, and Interleave-Critic-RL-13k—for interleaved generation and step-wise instruction correction using GRPO with proposed accuracy and step-wise rewards. |
|
|
| InterleaveThinker achieves **performance comparable to Nano Banana and GPT-5 on interleaved generation benchmarks**, delivering substantial gains on reasoning-based benchmarks (e.g., boosting WISE from 0.47 to 0.74 and RISE from 13.3 to 28.9 on 4-step FLUX.2-klein). It also demonstrates strong transferability, improving performance across various existing image generators. |
|
|
|
|
| ## 🎥 Demo |
|
|
| #### Inference Process Example |
|
|
| <div align="center"> |
| <img src="https://github.com/zhengdian1/InterleaveThinker/blob/main/assets/example.jpg?raw=true" alt="Inference Process Example" width="85%"> |
| </div> |
|
|
| For more examples, please refer to our website [[🌐Project Page]]() |
|
|
| ## 🚀 Training and Inference |
|
|
| For detailed instructions on setup, SFT/RL training, and inference, please refer to the [official GitHub repository](https://github.com/zhengdian1/InterleaveThinker). |
|
|
| ## 📐 Citation |
|
|
| If you find our work helpful for your research, please consider citing our work: |
|
|
| ```bibtex |
| @article{zheng2026interleavethinker, |
| title={InterleaveThinker: Reinforcing Agentic Interleaved Generation}, |
| author={Zheng, Dian and Li, Hongyu and Zhang, Manyuan and Feng, Kaituo and Li, Hongsheng}, |
| journal={}, |
| year={2026} |
| } |
| ``` |