# Agentic Prompt Upsampling

This repository includes a standalone text-to-image agentic prompt upsampler for Cosmos3-Super-Text2Image. The loop:

1. Upsamples the user prompt into a structured Cosmos3 T2I JSON prompt.
2. Generates an image through a vLLM-Omni `/v1/images/generations` endpoint.
3. Scores the image with a VLM critic.
4. Rewrites both the positive JSON prompt and the generator-side negative prompt based on the critic feedback.
5. Repeats up to the configured iteration limit and returns the best-scoring image.

## Install

From the repository root:

```bash
python -m pip install requests pillow
```

The recommended vLLM-Omni serving configuration for `nvidia/Cosmos3-Super-Text2Image` on 4x H200 is:

```bash
vllm serve nvidia/Cosmos3-Super-Text2Image \
  --omni \
  --cfg-parallel-size 2 \
  --ulysses-degree 2 \
  --tensor-parallel-size 1
```

With this no-offload configuration, generating a 1024x1024 image with 50 steps is expected to take roughly 5 seconds server-side per request.

## Default Models

The default prompt upsampler and rewriter are OpenAI GPT-5.5, called through the public OpenAI chat completions API:

```text
endpoint: https://api.openai.com/v1
model: gpt-5.5
extra body: {"reasoning_effort": "low"}
env var: OPENAI_API_KEY
```

The default critic is Gemini 3.1 Pro Preview, called through Google's OpenAI-compatible chat completions endpoint:

```text
endpoint: https://generativelanguage.googleapis.com/v1beta/openai/
model: gemini-3.1-pro-preview
env var: GEMINI_API_KEY
```

Set credentials:

```bash
export OPENAI_API_KEY=...
export GEMINI_API_KEY=...
```

If your vLLM-Omni generation endpoint requires authentication:

```bash
export AGENTIC_UPSAMPLING_GENERATION_AUTH_KEY=...
```
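The model defaults above map naturally onto a small config object. The sketch below is hypothetical: `ChatModelConfig` and `build_chat_request` are illustrative names, not part of the `agentic_upsampling` package. It assumes both endpoints are reached through the standard OpenAI-style `/chat/completions` path with a Bearer token, and that the "extra body" fields are merged into the top level of the JSON payload (the way the OpenAI Python SDK's `extra_body` behaves). It only assembles the request; it does not send it.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Optional, Tuple


@dataclass
class ChatModelConfig:
    """Hypothetical container for one OpenAI-compatible chat endpoint."""
    endpoint: str
    model: str
    api_key_env: str
    extra_body: Dict[str, Any] = field(default_factory=dict)


# Values copied from the default model tables in this README.
UPSAMPLER = ChatModelConfig(
    endpoint="https://api.openai.com/v1",
    model="gpt-5.5",
    api_key_env="OPENAI_API_KEY",
    extra_body={"reasoning_effort": "low"},
)
CRITIC = ChatModelConfig(
    endpoint="https://generativelanguage.googleapis.com/v1beta/openai/",
    model="gemini-3.1-pro-preview",
    api_key_env="GEMINI_API_KEY",
)


def build_chat_request(
    cfg: ChatModelConfig,
    messages: List[Dict[str, Any]],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (url, headers, body) for a chat completions call without sending it."""
    env = os.environ if env is None else env
    api_key = env.get(cfg.api_key_env)
    if not api_key:
        raise RuntimeError(f"{cfg.api_key_env} is not set")
    # Assumption: both endpoints expose the OpenAI-style /chat/completions path.
    url = cfg.endpoint.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Extra body fields (e.g. reasoning_effort) go at the top level of the payload.
    body = {"model": cfg.model, "messages": messages, **cfg.extra_body}
    return url, headers, body
```

Because `requests` is already a dependency, the result could be sent with `requests.post(url, headers=headers, json=body, timeout=120)`; the timeout value here is arbitrary.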
## Run One Prompt

```bash
python -m agentic_upsampling.run \
  --prompt "a cinematic photo of a glass greenhouse at sunrise" \
  --output-dir outputs/agentic_greenhouse \
  --generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT
```

The generation call is a standard vLLM-Omni image request:

```text
POST /v1/images/generations
model: nvidia/Cosmos3-Super-Text2Image
size: 1024x1024
response_format: b64_json
num_inference_steps: 50
guidance_scale: 4.0
flow_shift: 3.0
negative_prompt: ""
extra_args: {"guardrails": false, "use_resolution_template": false}
```

## Run A Batch

From a text file with one prompt per non-empty line:

```bash
python -m agentic_upsampling.run \
  --prompts prompts.txt \
  --output-dir outputs/agentic_batch \
  --generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT
```

JSONL rows can be strings or objects with `prompt` and an optional `id`:

```json
{"id": "greenhouse", "prompt": "a glass greenhouse at sunrise"}
{"id": "city", "prompt": "a clean futuristic city plaza after rain"}
```

CSV files must include a `prompt` or `Prompt` column and may include an `id` column.

## Useful Options

```bash
python -m agentic_upsampling.run \
  --prompt "a precise product photo of a transparent mechanical keyboard" \
  --output-dir outputs/keyboard \
  --generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT \
  --max-iterations 2 \
  --samples-per-iteration 3 \
  --seed-base 42 \
  --size 1024x1024 \
  --guidance 4.0 \
  --flow-shift 3.0
```

- `--max-iterations` limits the rewrite loop. The default is `2`, meaning the initial upsample plus up to two rewrites.
- `--samples-per-iteration` runs a best-of-N seed search for each prompt stage. The generation requests for those seeds are submitted concurrently within the iteration.
- `--seed-base` makes seeds deterministic: sample seeds are `seed_base + sample_index`.
- `--size` sets the vLLM-Omni image size in `WIDTHxHEIGHT` format.
- `--guidance` sets `guidance_scale`; the default is `4.0`.
- `--flow-shift` sets `flow_shift`; the default is `3.0`.
- `--generation-extra-args` overrides the default vLLM-Omni generation `extra_args` JSON object.
- Early stopping is enabled by default: the loop stops once the critic score clears a strict threshold. Use `--disable-early-stop` to always run every iteration.
- Reruns resume from completed artifacts by default. Use `--overwrite` to regenerate them.

## Output Layout

```text
output_dir/
  run_config.json
  summary.json
  manifest.jsonl
  failures.jsonl
  0001/
    best.json
    iter_00/
      prompt.json
      negative_prompt.json
      image.jpg
      generation_meta.json
      analysis.json
      samples.json
      meta.json
    iter_01/
      ...
```

For `--samples-per-iteration N`, each iteration directory contains `sample_00/`, `sample_01/`, and so on.

## Export Best Images

Copy the selected best image for every completed prompt into one folder:

```bash
python -m agentic_upsampling.extract_best \
  --output-dir outputs/agentic_batch \
  --export-dir outputs/agentic_batch_best \
  --overwrite
```

The exporter writes:

```text
best_generations.jsonl
best_generations.csv
images/
```
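The control flow of the loop described at the top of this README, combined with the iteration, seed, and early-stopping options, can be sketched as follows. This is a minimal, hypothetical sketch, not the package's implementation: `agentic_loop`, `Sample`, and the `upsample`/`generate`/`critique`/`rewrite` callables are illustrative stand-ins for the upsampler, the vLLM-Omni request, the VLM critic, and the rewriter. It assumes `--max-iterations` counts rewrites after the initial upsample, that each rewrite uses the feedback from the current stage's best sample, and that `early_stop_score` plays the role of the strict threshold.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Sample:
    stage: int  # 0 = initial upsample, 1.. = rewrites
    seed: int
    prompt: str
    negative_prompt: str
    score: float
    feedback: str


def agentic_loop(
    user_prompt: str,
    upsample: Callable[[str], str],
    generate: Callable[[str, str, int], bytes],
    critique: Callable[[bytes, str], Tuple[float, str]],
    rewrite: Callable[[str, str, str], Tuple[str, str]],
    max_iterations: int = 2,
    samples_per_iteration: int = 1,
    seed_base: int = 42,
    early_stop_score: Optional[float] = None,
) -> Sample:
    prompt, negative = upsample(user_prompt), ""
    best: Optional[Sample] = None
    # Assumption: max_iterations counts rewrites, so there are max_iterations + 1 stages.
    for stage in range(max_iterations + 1):
        # Deterministic best-of-N seeds: seed_base + sample_index.
        seeds = [seed_base + i for i in range(samples_per_iteration)]
        # Submit one generation request per seed concurrently; map keeps seed order.
        with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
            images = list(pool.map(lambda s: generate(prompt, negative, s), seeds))
        samples = []
        for seed, image in zip(seeds, images):
            score, feedback = critique(image, user_prompt)
            samples.append(Sample(stage, seed, prompt, negative, score, feedback))
        stage_best = max(samples, key=lambda s: s.score)
        # Keep the best image across all stages; on ties the earlier stage wins.
        if best is None or stage_best.score > best.score:
            best = stage_best
        # Hypothetical early stop once the best score clears the threshold.
        if early_stop_score is not None and best.score >= early_stop_score:
            break
        # Rewrite positive and negative prompts from this stage's best feedback.
        if stage < max_iterations:
            prompt, negative = rewrite(prompt, negative, stage_best.feedback)
    assert best is not None
    return best
```

Passing the four stages in as plain callables keeps the control flow (seed assignment, concurrent best-of-N, best-so-far tracking, early stop) testable with stubs, independent of any network endpoint.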