---
title: InterleaveThinker
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.19.0
app_file: app.py
short_description: Multi-agent interleaved text-image generation pipeline
python_version: "3.12"
startup_duration_timeout: 1h
---
## InterleaveThinker: Reinforcing Agentic Interleaved Generation
This Space demonstrates **InterleaveThinker**, a multi-agent pipeline that endows any image generator with interleaved generation capabilities. It orchestrates a **Planner** agent and a **Critic** agent around the **FLUX.2-klein** image generator to produce step-by-step visual narratives with interleaved text and images.
### How it works
1. **Planner** (InterleaveThinker-Planner-8B, Qwen3VL) analyzes your prompt and generates a structured execution plan.
2. For each step, **FLUX.2-klein-9B** generates or edits the image.
3. **Critic** (Critic-SFT-8B, Qwen3VL) evaluates the generated image and refines the prompt if needed.
4. The result is an interleaved sequence of text and images.
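
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the Space's actual `app.py`: the names `Step`, `Verdict`, `run_pipeline`, and the `toy_*` stand-ins are hypothetical, images are represented as plain strings, and the retry limit (`max_retries`) and the choice to regenerate after each refined prompt are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = str  # placeholder type; a real pipeline would pass image objects


@dataclass
class Step:
    text: str    # narrative text shown alongside the image
    prompt: str  # prompt handed to the image generator


@dataclass
class Verdict:
    accepted: bool
    refined_prompt: Optional[str] = None  # set when the critic wants a retry


def run_pipeline(
    user_prompt: str,
    planner: Callable[[str], List[Step]],
    generator: Callable[[str, Optional[Image]], Image],
    critic: Callable[[Step, Image], Verdict],
    max_retries: int = 2,  # assumed limit, not taken from the README
) -> List[Tuple[str, Image]]:
    """Plan, then generate and critique one image per step."""
    output: List[Tuple[str, Image]] = []
    previous: Optional[Image] = None  # lets the generator edit the prior image
    for step in planner(user_prompt):
        prompt = step.prompt
        image = generator(prompt, previous)
        for _ in range(max_retries):
            verdict = critic(step, image)
            if verdict.accepted or not verdict.refined_prompt:
                break
            # The critic refined the prompt: regenerate with it.
            prompt = verdict.refined_prompt
            image = generator(prompt, previous)
        output.append((step.text, image))
        previous = image
    return output


# Toy stand-ins so the sketch runs without any model weights.
def toy_planner(prompt: str) -> List[Step]:
    return [Step(f"Step {i}: {prompt}", f"{prompt}, stage {i}") for i in (1, 2)]


def toy_generator(prompt: str, previous: Optional[Image]) -> Image:
    return f"<image: {prompt}>"


def toy_critic(step: Step, image: Image) -> Verdict:
    if "detailed" in image:
        return Verdict(accepted=True)
    return Verdict(accepted=False, refined_prompt=step.prompt + ", detailed")


if __name__ == "__main__":
    for text, image in run_pipeline("a cat", toy_planner, toy_generator, toy_critic):
        print(text, image)
```

Passing the three agents in as callables keeps the orchestration independent of any particular model, which mirrors the claim that the pipeline can wrap any image generator.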
### Paper
[InterleaveThinker: Reinforcing Agentic Interleaved Generation](https://arxiv.org/abs/2606.13679)