How to use nvidia/Cosmos3-Super-Text2Image with Diffusers:

```bash
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("nvidia/Cosmos3-Super-Text2Image", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
# Agentic Prompt Upsampling

This repository includes a standalone text-to-image agentic prompt upsampler for Cosmos3-Super-Text2Image.

The loop:

1. Upsamples the user prompt into a structured Cosmos3 T2I JSON prompt.
2. Generates an image through a vLLM-Omni `/v1/images/generations` endpoint.
3. Scores the image with a VLM critic.
4. Rewrites both the positive JSON prompt and the generator-side negative prompt from the critic feedback.
5. Repeats up to the configured iteration limit and returns the best-scoring image.
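The five steps above can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not the repository's implementation: `agentic_loop`, `Candidate`, `stop_score`, and the four callables (`upsample`, `generate`, `critique`, `rewrite`) are hypothetical names, and rewriting from the most recent prompt (rather than the best one so far) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Candidate:
    prompt: str           # structured positive prompt (JSON text)
    negative_prompt: str  # generator-side negative prompt
    image: bytes          # raw image bytes returned by the generator
    score: float          # critic score, higher is better
    feedback: str         # critic feedback used for the next rewrite


def agentic_loop(
    user_prompt: str,
    upsample: Callable[[str], str],
    generate: Callable[[str, str], bytes],
    critique: Callable[[bytes, str], Tuple[float, str]],
    rewrite: Callable[[str, str, str], Tuple[str, str]],
    max_iterations: int,
    stop_score: Optional[float] = None,  # hypothetical early-stop threshold
) -> Candidate:
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    # Step 1: upsample the user prompt; start with an empty negative prompt.
    prompt, negative = upsample(user_prompt), ""
    best: Optional[Candidate] = None
    for i in range(max_iterations):
        image = generate(prompt, negative)               # step 2
        score, feedback = critique(image, user_prompt)   # step 3
        candidate = Candidate(prompt, negative, image, score, feedback)
        if best is None or candidate.score > best.score:
            best = candidate
        if stop_score is not None and score >= stop_score:
            break
        if i + 1 < max_iterations:
            # Step 4: rewrite both prompts from the critic feedback.
            prompt, negative = rewrite(prompt, negative, feedback)
    return best  # step 5: best-scoring candidate seen so far
```

Passing the model calls in as plain callables keeps the control flow testable without any network access.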
## Install

From the repository root:

```bash
python -m pip install requests pillow
```

Recommended vLLM-Omni serving configuration for `nvidia/Cosmos3-Super-Text2Image` on 4xH200 is:

```bash
vllm serve nvidia/Cosmos3-Super-Text2Image \
  --omni \
  --cfg-parallel-size 2 \
  --ulysses-degree 2 \
  --tensor-parallel-size 1
```

With the no-offload configuration above, 1024x1024 image generation with 50 steps is expected to take roughly 5 seconds server-side per request.

## Default Models

The default prompt upsampler and rewriter are OpenAI GPT-5.5 through the public OpenAI chat completions API:

```text
endpoint: https://api.openai.com/v1
model: gpt-5.5
extra body: {"reasoning_effort": "low"}
env var: OPENAI_API_KEY
```

The default critic is Gemini 3.1 Pro Preview through Google's OpenAI-compatible chat completions endpoint:

```text
endpoint: https://generativelanguage.googleapis.com/v1beta/openai/
model: gemini-3.1-pro-preview
env var: GEMINI_API_KEY
```

Set credentials:

```bash
export OPENAI_API_KEY=...
export GEMINI_API_KEY=...
```
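To show how these defaults could map onto an OpenAI-compatible chat completions request, here is a minimal sketch. The dictionaries `UPSAMPLER` and `CRITIC` and the helper `build_chat_request` are hypothetical names for illustration, not part of the repository. The sketch assumes the "extra body" fields are merged into the top level of the JSON request body and that the key is sent as a Bearer token.

```python
import os

# Values copied from the defaults listed above; the dict layout is illustrative.
UPSAMPLER = {
    "endpoint": "https://api.openai.com/v1",
    "model": "gpt-5.5",
    "extra_body": {"reasoning_effort": "low"},
    "env_var": "OPENAI_API_KEY",
}
CRITIC = {
    "endpoint": "https://generativelanguage.googleapis.com/v1beta/openai/",
    "model": "gemini-3.1-pro-preview",
    "extra_body": {},
    "env_var": "GEMINI_API_KEY",
}


def build_chat_request(cfg, messages, env=None):
    """Return (url, headers, body) for a chat completions POST."""
    env = os.environ if env is None else env
    key = env.get(cfg["env_var"])
    if not key:
        raise RuntimeError(f"{cfg['env_var']} is not set")
    url = cfg["endpoint"].rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    # Assumption: extra-body fields sit next to model and messages.
    body = {"model": cfg["model"], "messages": messages, **cfg["extra_body"]}
    return url, headers, body
```

The returned triple can be handed to any HTTP client, for example `requests.post(url, headers=headers, json=body)`.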
If your vLLM-Omni generation endpoint requires auth:

```bash
export AGENTIC_UPSAMPLING_GENERATION_AUTH_KEY=...
```

## Run One Prompt

```bash
python -m agentic_upsampling.run \
  --prompt "a cinematic photo of a glass greenhouse at sunrise" \
  --output-dir outputs/agentic_greenhouse \
  --generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT
```

The generation call is a standard vLLM-Omni image request:

```text
POST /v1/images/generations
model: nvidia/Cosmos3-Super-Text2Image
size: 1024x1024
response_format: b64_json
num_inference_steps: 50
guidance_scale: 4.0
flow_shift: 3.0
negative_prompt: ""
extra_args: {"guardrails": false, "use_resolution_template": false}
```
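The request above could be issued directly with the standard library, as in the sketch below. The helper names are hypothetical. The `prompt` field, the OpenAI-style response shape (`data[0].b64_json`), and the Bearer auth scheme are assumptions not confirmed by this README; the remaining fields are copied from the listing above.

```python
import base64
import json
import urllib.request


def build_generation_payload(prompt, negative_prompt=""):
    """JSON body mirroring the default generation request."""
    return {
        "model": "nvidia/Cosmos3-Super-Text2Image",
        "prompt": prompt,  # assumption: OpenAI-style `prompt` field
        "size": "1024x1024",
        "response_format": "b64_json",
        "num_inference_steps": 50,
        "guidance_scale": 4.0,
        "flow_shift": 3.0,
        "negative_prompt": negative_prompt,
        "extra_args": {"guardrails": False, "use_resolution_template": False},
    }


def decode_first_image(response_json):
    """Assumes an OpenAI-style response: {"data": [{"b64_json": "..."}]}."""
    return base64.b64decode(response_json["data"][0]["b64_json"])


def generate_image(endpoint, prompt, negative_prompt="", auth_key=None, timeout=300):
    """POST to /v1/images/generations and return the raw image bytes."""
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/v1/images/generations",
        data=json.dumps(build_generation_payload(prompt, negative_prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if auth_key:
        # Assumption: the endpoint expects a Bearer token.
        request.add_header("Authorization", f"Bearer {auth_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_first_image(json.load(response))
```

The returned bytes can be written to disk as-is or opened with Pillow via `Image.open(io.BytesIO(data))`.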
## Run A Batch

Text file, one prompt per non-empty line:

```bash
python -m agentic_upsampling.run \
  --prompts prompts.txt \
  --output-dir outputs/agentic_batch \
  --generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT
```

JSONL rows can be strings or objects with `prompt` and optional `id`:

```json
{"id": "greenhouse", "prompt": "a glass greenhouse at sunrise"}
{"id": "city", "prompt": "a clean futuristic city plaza after rain"}
```

CSV files must include a `prompt` or `Prompt` column and may include an `id` column.
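The three input formats described above can be read with a loader along these lines. This is an illustrative sketch, not the repository's parser: `load_prompts` is a hypothetical name, and choosing the format from the file extension is an assumption.

```python
import csv
import json
from pathlib import Path


def load_prompts(path):
    """Return a list of {"id": ..., "prompt": ...} rows from .txt, .jsonl, or .csv."""
    path = Path(path)
    suffix = path.suffix.lower()
    rows = []
    if suffix == ".jsonl":
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            item = json.loads(line)
            if isinstance(item, str):
                # A bare JSON string is treated as the prompt itself.
                rows.append({"id": None, "prompt": item})
            else:
                rows.append({"id": item.get("id"), "prompt": item["prompt"]})
    elif suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            fieldnames = reader.fieldnames or []
            if "prompt" in fieldnames:
                column = "prompt"
            elif "Prompt" in fieldnames:
                column = "Prompt"
            else:
                raise ValueError("CSV needs a 'prompt' or 'Prompt' column")
            for record in reader:
                rows.append({"id": record.get("id"), "prompt": record[column]})
    else:
        # Plain text: one prompt per non-empty line.
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                rows.append({"id": None, "prompt": line.strip()})
    return rows
```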
## Useful Options

```bash
python -m agentic_upsampling.run \
  --prompt "a precise product photo of a transparent mechanical keyboard" \
  --output-dir outputs/keyboard \
  --generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT \
  --max-iterations 2 \
  --samples-per-iteration 3 \
  --seed-base 42 \
  --size 1024x1024 \
  --guidance 4.0 \
  --flow-shift 3.0
```

- `--max-iterations` controls the total number of prompt stages. The default is `3`, meaning the initial upsample plus up to two rewrites.
- `--samples-per-iteration` runs a best-of-N seed search for each prompt stage. Generation requests for those seeds are submitted concurrently within the iteration.
- `--seed-base` makes seeds deterministic. Sample seeds are `seed_base + sample_index`.
- `--size` is the vLLM-Omni image size in `WIDTHxHEIGHT` format.
- `--guidance` sets `guidance_scale`; the default is `4.0`.
- `--flow-shift` sets `flow_shift`; the default is `3.0`.
- `--generation-extra-args` overrides the default vLLM-Omni generation `extra_args` JSON object.
- Early stopping is enabled by default and triggers when the critic score clears the strict threshold. Use `--disable-early-stop` to always run every iteration.
- Reruns resume from completed artifacts by default. Use `--overwrite` to regenerate them.
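The seed rule and the best-of-N search described in the options above can be sketched as follows. The names `sample_seeds`, `best_of_n`, and `generate_and_score` are hypothetical. Running generation and scoring together inside one worker, and breaking ties in favor of the lowest sample index, are assumptions made for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def sample_seeds(seed_base, samples_per_iteration):
    """Seeds follow the documented rule: seed_base + sample_index."""
    return [seed_base + index for index in range(samples_per_iteration)]


def best_of_n(seed_base, samples_per_iteration, generate_and_score):
    """Run one call per seed concurrently and return (best_seed, best_score).

    `generate_and_score` is a caller-supplied function taking a seed and
    returning a numeric score, where higher is better.
    """
    seeds = sample_seeds(seed_base, samples_per_iteration)
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        # pool.map preserves input order, so scores[i] belongs to seeds[i].
        scores = list(pool.map(generate_and_score, seeds))
    best_index = max(range(len(seeds)), key=lambda i: scores[i])
    return seeds[best_index], scores[best_index]
```

Threads suit this workload because each sample spends most of its time waiting on an HTTP response.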
## Output Layout

```text
output_dir/
  run_config.json
  summary.json
  manifest.jsonl
  failures.jsonl
  0001/
    best.json
    iter_00/
      prompt.json
      negative_prompt.json
      image.jpg
      generation_meta.json
      analysis.json
      samples.json
      meta.json
    iter_01/
      ...
```

For `--samples-per-iteration N`, each iteration contains `sample_00/`, `sample_01/`, and so on.
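A run directory with this layout can be inspected with a short helper such as the one below. `list_iterations` is a hypothetical name. The sketch relies only on the directory and file names shown above and makes no assumption about the JSON schemas inside the files.

```python
from pathlib import Path


def list_iterations(output_dir):
    """Map each per-prompt directory to its iteration folders and best.json status."""
    result = {}
    for prompt_dir in sorted(Path(output_dir).iterdir()):
        if not prompt_dir.is_dir():
            continue  # skip run_config.json, summary.json, and other top-level files
        iterations = sorted(p.name for p in prompt_dir.glob("iter_*") if p.is_dir())
        result[prompt_dir.name] = {
            "has_best": (prompt_dir / "best.json").is_file(),
            "iterations": iterations,
        }
    return result
```

A prompt directory without `best.json` reports `has_best` as `False`, which gives a quick way to spot prompts whose runs did not finish.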
## Export Best Images

Copy the selected best image for every completed prompt into one folder:

```bash
python -m agentic_upsampling.extract_best \
  --output-dir outputs/agentic_batch \
  --export-dir outputs/agentic_batch_best \
  --overwrite
```

The exporter writes:

```text
best_generations.jsonl
best_generations.csv
images/
```