InterleaveThinker-Planner Model
This repository contains the InterleaveThinker-Planner-8B model presented in InterleaveThinker: Reinforcing Agentic Interleaved Generation.
Project Page | GitHub Repository | Paper
👀 Intro
We introduce InterleaveThinker, the first multi-agent pipeline designed to endow any existing image generator with interleaved generation capabilities. A planner agent organizes the image-text input sequence, and a critic agent evaluates the generator's outputs, identifies deviations, and refines the instructions. Together they enable complex interleaved text-image sequence generation for visual narratives, guidance, embodied manipulation, and long-horizon sub-task annotation.
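The planner/critic loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation. The agent functions (`plan`, `generate`, `critique`), the `max_rounds` retry budget, and the convention that the critic returns `None` when an output is acceptable are all hypothetical. Images are represented by placeholder strings.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical agent interfaces (illustrative only, not the repository's API).
PlanFn = Callable[[str], List[str]]               # prompt -> per-step instructions
GenerateFn = Callable[[str], str]                 # instruction -> image (placeholder str)
CritiqueFn = Callable[[str, str], Optional[str]]  # (instruction, image) -> refined instruction, or None if acceptable


@dataclass
class Step:
    instruction: str  # final instruction used for this step
    image: str        # final generated output for this step
    rounds: int       # number of generator calls spent on this step


def interleaved_generate(prompt: str, plan: PlanFn, generate: GenerateFn,
                         critique: CritiqueFn, max_rounds: int = 3) -> List[Step]:
    """Planner splits the task into steps; critic refines each step's instruction."""
    steps: List[Step] = []
    for instruction in plan(prompt):
        image = generate(instruction)
        rounds = 1
        # Let the critic request refinements until it accepts or the budget runs out.
        while rounds < max_rounds:
            refined = critique(instruction, image)
            if refined is None:
                break
            instruction = refined
            image = generate(instruction)
            rounds += 1
        steps.append(Step(instruction, image, rounds))
    return steps


# Toy usage with stand-in agents.
def toy_plan(prompt: str) -> List[str]:
    return [f"{prompt}: step {i}" for i in (1, 2)]


def toy_generate(instruction: str) -> str:
    return f"<img:{instruction}>"


def toy_critique(instruction: str, image: str) -> Optional[str]:
    # Toy critic: asks for one refinement of step 1, accepts everything else.
    if instruction.endswith("step 1"):
        return instruction + " (refined)"
    return None


steps = interleaved_generate("make tea", toy_plan, toy_generate, toy_critique)
for s in steps:
    print(s.rounds, s.image)
```

Keeping the critic as a separate callable mirrors the card's point that the pipeline wraps any existing image generator: `generate` can be swapped without touching the planning or correction logic.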
We build three dedicated training datasets (Interleave-Planner-SFT-80k, Interleave-Critic-SFT-112k, and Interleave-Critic-RL-13k) for interleaved generation and step-wise instruction correction, and train with GRPO using the proposed accuracy and step-wise rewards.
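GRPO replaces a learned value baseline with a group-relative one. Several responses are sampled for the same prompt, and each response's advantage is its reward normalized by the group's mean and standard deviation. The sketch below shows only that normalization step. How the accuracy and step-wise rewards are defined and combined is not specified here, so `combined_reward` and its weight `w_step` are hypothetical placeholders. The population standard deviation and the `eps` term are also illustrative choices.

```python
import math
from typing import List, Sequence


def combined_reward(accuracy: float, step_score: float, w_step: float = 0.5) -> float:
    # Hypothetical: a weighted sum of an accuracy reward and a step-wise reward.
    # The actual reward definitions and weighting may differ.
    return accuracy + w_step * step_score


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (r_i - mean(r)) / (std(r) + eps) within one group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)  # population std
    return [(r - mean) / (std + eps) for r in rewards]


# One group of four sampled responses to the same prompt: (accuracy, step_score).
group = [(1, 1.0), (0, 0.5), (1, 0.0), (0, 0.0)]
rewards = [combined_reward(a, s) for a, s in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones. If every response in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal.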
InterleaveThinker achieves performance comparable to Nano Banana and GPT-5 on interleaved generation benchmarks and delivers substantial gains on reasoning-based benchmarks (e.g., boosting WISE from 0.47 to 0.74 and RISE from 13.3 to 28.9 with 4-step FLUX.2-klein as the generator). It also transfers well, improving performance across various existing image generators.
🎥 Demo
Inference Process Example
For more examples, please refer to our website: [🌐Project Page].
🚀 Training and Inference
For detailed instructions on setup, SFT/RL training, and inference, please refer to the official GitHub repository.
📐 Citation
If you find our work helpful for your research, please consider citing our work:
@article{zheng2026interleavethinker,
title={InterleaveThinker: Reinforcing Agentic Interleaved Generation},
author={Zheng, Dian and Li, Hongyu and Zhang, Manyuan and Feng, Kaituo and Li, Hongsheng},
journal={},
year={2026}
}
InterleaveThinker/InterleaveThinker-Planner-8B is built on the base model Qwen/Qwen3-VL-8B-Instruct.