zhengli1013's picture
Upload folder using huggingface_hub
79aa5ff verified
metadata
base_model:
  - Qwen/Qwen3-VL-8B-Instruct
datasets:
  - InterleaveThinker/Train-Data
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0

InterleaveThinker-Critic Model

This repository contains the InterleaveThinker-Critic-8B model presented in InterleaveThinker: Reinforcing Agentic Interleaved Generation.

Project Page | GitHub Repository | Paper

👀 Intro

InterleaveThinker Teaser

We introduce InterleaveThinker, as the first multi-agent pipeline designed to endow any existing image generator with interleaved generation capabilities. InterleaveThinker can organize the image-text input sequence via a planner agent, evaluate generator outputs, identify deviations, and refine instructions via a critic agent, enabling complex interleaved text-image sequence generation for visual narratives, guidance, embodied manipulation and long-horizon sub-task annotation.

We build three dedicated training datasets—Interleave-Planner-SFT-80k, Interleave-Critic-SFT-112k, and Interleave-Critic-RL-13k—for interleaved generation and step-wise instruction correction using GRPO with proposed accuracy and step-wise rewards.

InterleaveThinker achieves performance comparable to Nano Banana and GPT-5 on interleaved generation benchmarks, delivering substantial gains on reasoning-based benchmarks (e.g., boosting WISE from 0.47 to 0.74 and RISE from 13.3 to 28.9 on 4-step FLUX.2-klein). It also demonstrates strong transferability, improving performance across various existing image generators.

🎥 Demo

Inference Process Example

Inference Process Example

For more examples, please refer to our website [🌐Project Page]

🚀 Training and Inference

For detailed instructions on setup, SFT/RL training, and inference, please refer to the official GitHub repository.

📐 Citation

If you find our work helpful for your research, please consider citing our work:

@article{zheng2026interleavethinker,
  title={InterleaveThinker: Reinforcing Agentic Interleaved Generation},
  author={Zheng, Dian and Li, Hongyu and Zhang, Manyuan and Feng, Kaituo and Li, Hongsheng},
  journal={},
  year={2026}
}