metadata
title: InterleaveThinker
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.19.0
app_file: app.py
short_description: Multi-agent interleaved text-image generation pipeline
python_version: '3.12'
startup_duration_timeout: 1h

InterleaveThinker: Reinforcing Agentic Interleaved Generation

This Space demonstrates InterleaveThinker, a multi-agent pipeline that gives any image generator interleaved generation capabilities. It orchestrates a Planner agent and a Critic agent around the FLUX.2-klein-9B image generator to produce step-by-step visual narratives in which text and images alternate.

How it works

  1. Planner (InterleaveThinker-Planner-8B, Qwen3-VL) analyzes your prompt and generates a structured execution plan.
  2. For each step of the plan, FLUX.2-klein-9B generates or edits the image.
  3. Critic (Critic-SFT-8B, Qwen3-VL) evaluates the generated image and, if needed, refines the prompt.
  4. The result is an interleaved sequence of text and images.
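
The control flow of the steps above can be sketched in Python. This is a minimal illustration, not the code in app.py: `Step`, `Verdict`, `run_pipeline`, the `toy_*` stand-ins, and the `max_retries` cap are all hypothetical names, and the assumptions that the critic's refined prompt triggers a regeneration and that each step receives the previous image are guesses about how "refines the prompt" and "edits the image" play out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = bytes  # placeholder type; a real app would likely pass image objects


@dataclass
class Step:
    text: str          # narrative text shown alongside the image
    image_prompt: str  # prompt handed to the image generator


@dataclass
class Verdict:
    accepted: bool
    refined_prompt: Optional[str] = None  # set when the critic wants a retry


def run_pipeline(
    prompt: str,
    plan: Callable[[str], List[Step]],
    generate: Callable[[str, Optional[Image]], Image],
    critique: Callable[[Step, Image], Verdict],
    max_retries: int = 2,  # assumed cap; the README does not state one
) -> List[Tuple[str, Image]]:
    """Return an interleaved list of (text, image) pairs."""
    output: List[Tuple[str, Image]] = []
    previous: Optional[Image] = None
    for step in plan(prompt):
        image = generate(step.image_prompt, previous)
        for _ in range(max_retries):
            verdict = critique(step, image)
            if verdict.accepted or not verdict.refined_prompt:
                break
            # Assumed behaviour: regenerate with the critic's refined prompt.
            image = generate(verdict.refined_prompt, previous)
        output.append((step.text, image))
        previous = image  # lets the next step edit rather than start over
    return output


# Toy stand-ins for the Planner, FLUX.2-klein-9B, and Critic models.
def toy_plan(prompt: str) -> List[Step]:
    return [Step(f"Step {i}: {prompt}", f"{prompt}, stage {i}") for i in (1, 2)]


def toy_generate(image_prompt: str, previous: Optional[Image]) -> Image:
    return image_prompt.encode()  # the "image" is just the encoded prompt


def toy_critique(step: Step, image: Image) -> Verdict:
    if b"detailed" in image:
        return Verdict(accepted=True)
    return Verdict(accepted=False, refined_prompt=step.image_prompt + ", detailed")


if __name__ == "__main__":
    for text, image in run_pipeline("bake bread", toy_plan, toy_generate, toy_critique):
        print(text, "->", image)
```

Passing the three agents in as callables keeps the loop independent of any particular model, which matches the claim that the pipeline can wrap any image generator.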

Paper

InterleaveThinker: Reinforcing Agentic Interleaved Generation