---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- agent
- image-generation
- tool-use
- visual-reasoning
- self-distillation
- grpo
- reinforcement-learning
- multimodal
- qwen3-vl
datasets:
- MeiGen-AI/GenEvolve-Data-Bench
---
<div align="center">
<img src="assets/logo_genevolve.png" alt="GenEvolve" width="160">
<h1>GenEvolve</h1>
<p><strong><em>Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation</em></strong></p>
<p>
<a href="https://arxiv.org/abs/2605.21605">
<img alt="Paper" src="https://img.shields.io/badge/Paper-arXiv:2605.21605-b31b1b"></a>
<a href="https://ephemeral182.github.io/GenEvolve/">
<img alt="Project Page" src="https://img.shields.io/badge/Project-Page-1f6feb"></a>
<a href="https://github.com/MeiGen-AI/GenEvolve">
<img alt="Code" src="https://img.shields.io/badge/👾_GitHub-Code-181717"></a>
<a href="https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data-Bench">
<img alt="Dataset" src="https://img.shields.io/badge/🤗_Dataset-GenEvolve--Data-FFD21E"></a>
</p>
</div>
This repository hosts the **GenEvolve agent policy**: a Qwen3-VL-8B-Instruct backbone fine-tuned and self-evolved into a tool-orchestrated image-generation agent. Given a user request, the agent issues web/image searches, retrieves visual references, activates internal generation knowledge, and emits an executable **prompt-reference program** `z = (gen_prompt, reference_images)` that drives any reference-conditioned downstream generator (Qwen-Image-Edit, Nano Banana Pro, ...).
<div align="center">
<img src="assets/teaser.jpg" alt="GenEvolve teaser" width="100%">
<p><em>The same trained agent policy paired with two reference-conditioned generators ▶<br>
<strong>Qwen-Image-Edit (open)</strong> · <strong>Nano Banana Pro (strong)</strong></em></p>
</div>
---
## ✨ Highlights
- **Tool-orchestrated trajectories.** The agent calls `search`, `image_search`, and `query_knowledge` (8 callable generation skills) before producing a final program `z = (gen_prompt, reference_images)`.
- **Self-evolution with Visual Experience Distillation.** Best-vs-worst trajectory pairs are distilled at the token level into the deployed student. **No runtime memory at inference.**
- **Generator-transferable.** The same trained policy works with both an open-source generator (Qwen-Image-Edit-2511) and a strong proprietary generator (Nano Banana Pro).
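
As a rough illustration of the best-vs-worst pairing mentioned above, the sketch below picks the highest- and lowest-scoring rollouts from a group sampled for one request. It is not the training code from the repository: the `Trajectory` class, the scalar `reward` field, and the `min_gap` threshold are all hypothetical names introduced here, and the card does not say how trajectories are scored or filtered.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    text: str      # full rollout text (hypothetical container)
    reward: float  # scalar score of the rendered image (assumed to exist)


def best_worst_pair(group, min_gap=0.0):
    """Return (best, worst) by reward, or None when the gap is not above min_gap."""
    if len(group) < 2:
        return None
    best = max(group, key=lambda t: t.reward)
    worst = min(group, key=lambda t: t.reward)
    if best.reward - worst.reward <= min_gap:
        # A group whose rollouts all score alike carries no contrast to distill.
        return None
    return best, worst


group = [Trajectory("a", 0.2), Trajectory("b", 0.9), Trajectory("c", 0.5)]
pair = best_worst_pair(group)
```

Skipping groups with no reward gap is one plausible design choice; the actual selection rule may differ.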
## Headline Results
### GenEvolve-Bench (KScore, held-out split)
| Method | Generator | KScore | Knowledge-Anch. | Quality-Anch. |
|---|---|---:|---:|---:|
| Qwen-Image (raw) | Qwen-Image | 0.2987 | 0.2384 | 0.3768 |
| Nano Banana Pro (raw) | Nano Banana Pro | 0.5298 | 0.5160 | 0.5477 |
| Gen-Searcher 8B | Qwen-Image-Edit-2511 | 0.3493 | 0.3293 | 0.3745 |
| Gen-Searcher 8B | Nano Banana Pro | 0.5481 | 0.5472 | 0.5492 |
| **GenEvolve (Ours)** | Qwen-Image-Edit-2511 | **0.3663** | **0.3410** | **0.3990** |
| **GenEvolve (Ours)** | Nano Banana Pro | **0.5739** | **0.5669** | **0.5830** |
### WISE Benchmark (WiScore, six knowledge categories)
| Model | Cultural | Time | Space | Biology | Physics | Chemistry | **Overall** |
|---|---:|---:|---:|---:|---:|---:|---:|
| GPT-4o | 0.81 | 0.71 | **0.89** | **0.83** | 0.79 | 0.74 | 0.80 |
| Gen-Searcher-8B + Qwen-Image | 0.80 | 0.71 | 0.82 | 0.76 | 0.74 | 0.75 | 0.77 |
| Mind-Brush | 0.83 | 0.69 | 0.84 | 0.71 | **0.85** | 0.68 | 0.78 |
| **GenEvolve + Qwen-Image-Edit** | **0.84** | 0.74 | 0.87 | **0.83** | 0.81 | **0.83** | **0.82** |
---
## 🧠 Method Overview
<p align="center"><img src="assets/overview.png" alt="GenEvolve method overview" width="92%"></p>
For a user request, the agent samples a multi-turn trajectory of tool calls before emitting the final prompt-reference program. The downstream generator then renders the image.
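
The multi-turn loop described above can be sketched in a few lines. This is a minimal illustration, not the runtime in the GitHub repo: it assumes each `<tool_call>` body is a JSON object with `name` and `arguments` keys (a common Qwen-style convention, not confirmed by this card), and `run_agent`, `fake_policy`, and the stub tool are hypothetical stand-ins for the served model and the real search tools.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_agent(policy, tools, request, max_turns=8):
    """Call `policy` until it emits <answer>; return the parsed program dict."""
    history = [{"role": "user", "content": request}]
    for _ in range(max_turns):
        reply = policy(history)
        history.append({"role": "assistant", "content": reply})
        answer = ANSWER_RE.search(reply)
        if answer:
            return json.loads(answer.group(1))
        call = TOOL_CALL_RE.search(reply)
        if call is None:
            raise ValueError("reply has neither <tool_call> nor <answer>")
        spec = json.loads(call.group(1))  # assumed {"name": ..., "arguments": {...}}
        result = tools[spec["name"]](**spec["arguments"])
        history.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("no <answer> within max_turns")


# Scripted stand-in for the model: one tool call, then the final program.
scripted = iter([
    '<think>need a reference</think><tool_call>{"name": "image_search", '
    '"arguments": {"query": "Eiffel Tower golden hour"}}</tool_call>',
    '<answer>{"gen_prompt": "Match the tower in the first reference image.", '
    '"reference_images": [{"img_id": "IMG_001", "note": "tower shape"}]}</answer>',
])


def fake_policy(history):
    return next(scripted)


tools = {"image_search": lambda query: [{"img_id": "IMG_001", "title": query}]}
program = run_agent(fake_policy, tools, "Paris magazine cover")
```

In a real deployment `policy` would wrap a chat-completion call to the served checkpoint and `tools` would map to the `search`, `image_search`, and `query_knowledge` implementations.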
---
## 🖼️ Visual Demos
<p align="center"><img src="assets/visual_comparison.png" alt="Qualitative comparison" width="100%"></p>
<p align="center"><sub>Qualitative comparison on representative cases. <span style="color:#D97706">Orange</span> marks external/uncommon knowledge requirements; <span style="color:#2563EB">blue</span> marks internal generation-knowledge requirements.</sub></p>
### 🎨 Gallery: paired with Nano Banana Pro
<p align="center"><img src="assets/gallery_nano.jpg" alt="GenEvolve + Nano Banana Pro gallery" width="100%"></p>
<p align="center"><sub>The same agent policy with Nano Banana Pro as the downstream renderer. Examples cover spatial layout, text rendering, quantity counting, attribute binding, anatomy/pose, creative transfer, material physics, and aesthetic drawing.</sub></p>
### 🎨 Gallery: paired with Qwen-Image-Edit (open)
<p align="center"><img src="assets/gallery_qwen.jpg" alt="GenEvolve + Qwen-Image-Edit gallery" width="100%"></p>
<p align="center"><sub>Same trained policy paired with the open-source Qwen-Image-Edit-2511 renderer; consistent quality across both generators reflects generator-transferable orchestration.</sub></p>
---
## Quick Start
The deployed checkpoint is the **student policy**: it consumes a user prompt and returns a JSON `gen_prompt + reference_images` program through a `<think>/<tool_call>/<answer>` loop. The end-to-end runtime (vLLM serving + agent loop + tools + Qwen/Nano renderers) lives in the [GitHub repo](https://github.com/MeiGen-AI/GenEvolve); the steps below mirror its installation and usage.
### 1. Install the main GenEvolve runtime
```bash
git clone https://github.com/MeiGen-AI/GenEvolve.git
cd GenEvolve
conda create -n genevolve python=3.11 -y && conda activate genevolve
pip install -U pip setuptools wheel packaging psutil ninja
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install --no-build-isolation -r requirements.txt
pip install -e .
```
Qwen-Image-Edit rendering runs as a **separate FastAPI service** (kept out of the vLLM environment to avoid CUDA/diffusers conflicts). Set up that service from the GitHub README when you want to use `--backend qwen-image-edit-service`.
### 2. Serve the agent policy
```bash
# Single GPU / single replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh
# Higher throughput on one 8-GPU node (8 replicas, 1 GPU each).
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh
```
`TP` shards one model replica across multiple GPUs; `DP` launches multiple replicas; total GPU usage is `TP × DP`.
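
To make the `TP × DP` arithmetic concrete, here is a small helper for checking a configuration against the GPUs on a node before launching. `gpus_required` and `fits_on_node` are illustrative names, not part of `scripts/serve_vllm.sh`.

```python
def gpus_required(tp: int, dp: int) -> int:
    """Total GPUs used: each of the `dp` replicas is sharded across `tp` GPUs."""
    if tp < 1 or dp < 1:
        raise ValueError("TP and DP must both be at least 1")
    return tp * dp


def fits_on_node(tp: int, dp: int, gpus_available: int) -> bool:
    """True when the requested layout does not exceed the available GPUs."""
    return gpus_required(tp, dp) <= gpus_available


single = gpus_required(tp=1, dp=1)    # the single-GPU command above
replicas = gpus_required(tp=1, dp=8)  # the 8-replica command above
sharded = gpus_required(tp=2, dp=4)   # 4 replicas, each sharded over 2 GPUs
```

Both `TP=1 DP=8` and `TP=2 DP=4` occupy a full 8-GPU node; the first gives more independent replicas, the second gives each replica more memory.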
### 3. End-to-end example
```bash
export SERPER_API_KEY=<your_key> # required for search / image_search
export GOOGLE_API_KEY=<your_key> # or GEMINI_API_KEY; only for --backend nano-banana-pro
# Nano Banana Pro renderer
python examples/quickstart.py \
--backend nano-banana-pro \
--base-url http://localhost:8000/v1 \
--model GenEvolve \
--prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \
--output paris.png
# Qwen-Image-Edit renderer (point at your Qwen-Image-Edit FastAPI service)
python examples/quickstart.py \
--backend qwen-image-edit-service \
--service-url http://your-qwen-service:8001 \
--base-url http://localhost:8000/v1 \
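--model GenEvolve \
--prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" in bold serif." \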
--output paris_qwen.png
```
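
Before running the full quickstart, it can help to confirm that the served policy answers at all. The sketch below only builds a single-turn request for the OpenAI-compatible `/chat/completions` route that vLLM exposes; it assumes the server registers the model under the name `GenEvolve`, as the `--model` flag above suggests, and `build_chat_request` is a hypothetical helper. A bare request like this does not supply the tool definitions or agent loop that `examples/quickstart.py` provides, so treat it as a connectivity check only.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat-completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:8000/v1", "GenEvolve", "A poster of Paris.")
# Once the server is up, send it with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```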
The agent's final `<answer>` is a JSON object:
```json
{
"gen_prompt": "...natural-language prompt that refers to images by 'the first reference image', ...",
"reference_images": [
{"img_id": "IMG_001", "note": "what to copy from this image"}
]
}
```
`gen_prompt` MUST refer to selected images using ordinal phrases (`"the first reference image"`), never raw `IMG_###` ids or URLs. Pass `(gen_prompt, [r["local_path"] for r in reference_images])` to any reference-conditioned generator (Qwen-Image-Edit, Nano Banana Pro, ...) to obtain the final image. Note that `local_path` is not a field of the `<answer>` JSON shown above; each `img_id` has to be resolved to its downloaded file first.
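
A minimal sketch of consuming the `<answer>` JSON follows, assuming the caller keeps its own mapping from `img_id` to a downloaded file. `parse_program` and the `local_paths` dictionary are hypothetical; the repository's runtime may track reference images differently.

```python
import json
import re

# Matches raw image ids such as IMG_001 and http(s) URLs.
ID_OR_URL_RE = re.compile(r"IMG_\d+|https?://", re.IGNORECASE)


def parse_program(answer_json: str, local_paths: dict):
    """Return (gen_prompt, ordered list of local file paths) from the answer JSON."""
    program = json.loads(answer_json)
    gen_prompt = program["gen_prompt"]
    if ID_OR_URL_RE.search(gen_prompt):
        raise ValueError("gen_prompt must use ordinal phrases, not raw ids or URLs")
    # Order matters: "the first reference image" means index 0 of this list.
    paths = [local_paths[ref["img_id"]] for ref in program["reference_images"]]
    return gen_prompt, paths


answer = json.dumps({
    "gen_prompt": "Keep the tower silhouette from the first reference image.",
    "reference_images": [{"img_id": "IMG_001", "note": "tower silhouette"}],
})
gen_prompt, paths = parse_program(answer, {"IMG_001": "/tmp/refs/IMG_001.jpg"})
```

Rejecting prompts that leak ids or URLs early keeps a malformed program from reaching the generator, where the failure would be harder to diagnose.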
---
## Related Artifacts
| Artifact | Link |
|---|---|
| Project page | https://ephemeral182.github.io/GenEvolve/ |
| Paper | https://arxiv.org/abs/2605.21605 |
| Code | https://github.com/MeiGen-AI/GenEvolve |
| Training data + benchmark | [MeiGen-AI/GenEvolve-Data-Bench](https://huggingface.co/datasets/MeiGen-AI/GenEvolve-Data-Bench) |
| Base model | [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
---
## ⚖️ Intended Use, Limits, Bias
- **Intended use.** Research on tool-using image-generation agents, agentic prompt-program synthesis, and self-distillation from generated outcomes.
- **Search dependency.** The agent issues live web/image queries through user-provided tool wrappers. Quality of grounded facts depends on the search backend you plug in.
- **Bias.** Tool outputs and reference images come from public web search, which carries demographic, cultural, and geographic biases that may be reflected in agent outputs.
---
## Citation
```bibtex
@misc{chen2026genevolveselfevolvingimagegeneration,
title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation},
author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
year={2026},
eprint={2605.21605},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.21605},
}
```