```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="InterleaveThinker/InterleaveThinker-Planner-8B")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```
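By default the pipeline returns the full chat history; passing `return_full_text=False` (a standard text-generation pipeline argument) keeps only the newly generated reply. A minimal usage sketch:

```python
out = pipe(text=messages, max_new_tokens=40, return_full_text=False)
print(out[0]["generated_text"])  # the model's answer as a plain string
```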
```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("InterleaveThinker/InterleaveThinker-Planner-8B")
model = AutoModelForImageTextToText.from_pretrained("InterleaveThinker/InterleaveThinker-Planner-8B")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping special tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
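For an 8B checkpoint, loading in half precision with automatic device placement is usually preferable to the default float32 load. A minimal sketch, assuming a CUDA-capable GPU and that `accelerate` is installed:

```python
import torch
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "InterleaveThinker/InterleaveThinker-Planner-8B",
    torch_dtype=torch.bfloat16,  # halves memory vs. float32; use float16 on older GPUs
    device_map="auto",           # requires `accelerate`; places weights across available devices
)
```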
# InterleaveThinker-Planner Model

This repository contains the InterleaveThinker-Planner-8B model presented in *InterleaveThinker: Reinforcing Agentic Interleaved Generation*.

Project Page | GitHub Repository | Paper
## 👀 Intro
We introduce InterleaveThinker, the first multi-agent pipeline designed to endow any existing image generator with interleaved generation capabilities. A planner agent organizes the image-text input sequence, and a critic agent evaluates generator outputs, identifies deviations, and refines instructions, enabling complex interleaved text-image generation for visual narratives, guidance, embodied manipulation, and long-horizon sub-task annotation.
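Conceptually, the pipeline wraps an off-the-shelf image generator in a plan/generate/critique loop. The sketch below is illustrative only; the names (`plan_steps`, `critique`, `refined_instruction`) are hypothetical and not the repository's actual API:

```python
def interleaved_generate(prompt, generator, planner, critic, max_rounds=3):
    """Illustrative planner/critic loop; not the official implementation."""
    outputs = []
    for step in planner.plan_steps(prompt):          # planner splits the task into text+image steps
        instruction = step.instruction
        image = None
        for _ in range(max_rounds):
            image = generator(instruction)            # any existing image generator
            verdict = critic.critique(step, image)    # critic checks for deviations from the plan
            if verdict.ok:
                break
            instruction = verdict.refined_instruction # retry with the corrected instruction
        outputs.append((step.text, image))
    return outputs
```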
We build three dedicated training datasets (Interleave-Planner-SFT-80k, Interleave-Critic-SFT-112k, and Interleave-Critic-RL-13k) for interleaved generation and step-wise instruction correction, and train with GRPO using the proposed accuracy and step-wise rewards.
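For context, GRPO computes advantages by standardizing rewards within a group of rollouts for the same prompt; the exact accuracy and step-wise reward definitions are given in the paper. A minimal sketch of the group-relative normalization:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantages: standardize rewards within one group of rollouts.

    `rewards` has shape (group_size,), e.g. the sum of an accuracy reward and a
    step-wise reward per rollout (the paper's exact reward terms may differ).
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```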
InterleaveThinker achieves performance comparable to Nano Banana and GPT-5 on interleaved generation benchmarks and delivers substantial gains on reasoning-based benchmarks (e.g., boosting WISE from 0.47 to 0.74 and RISE from 13.3 to 28.9 with 4-step FLUX.2-klein). It also transfers well, improving performance across a variety of existing image generators.
## 🎥 Demo

*Inference process example.*

For more examples, please refer to our website ([🌐 Project Page]).
## 🚀 Training and Inference
For detailed instructions on setup, SFT/RL training, and inference, please refer to the official GitHub repository.
## 📐 Citation

If you find our work helpful for your research, please consider citing it:

```bibtex
@article{zheng2026interleavethinker,
  title={InterleaveThinker: Reinforcing Agentic Interleaved Generation},
  author={Zheng, Dian and Li, Hongyu and Zhang, Manyuan and Feng, Kaituo and Li, Hongsheng},
  journal={},
  year={2026}
}
```
Base model: Qwen/Qwen3-VL-8B-Instruct
This is a gated model: log in with a Hugging Face token that has been granted access before downloading:

```bash
hf auth login
```