Instructions to use nvidia/Cosmos3-Super-Text2Image with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use nvidia/Cosmos3-Super-Text2Image with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained("nvidia/Cosmos3-Super-Text2Image", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
- Notebooks
- Google Colab
- Kaggle
Agentic Prompt Upsampling
This repository includes a standalone text-to-image agentic prompt upsampler for Cosmos3-Super-Text2Image.
The loop:
- Upsamples the user prompt into a structured Cosmos3 T2I JSON prompt.
- Generates an image through a vLLM-Omni /v1/images/generations endpoint.
- Scores the image with a VLM critic.
- Rewrites both the positive JSON prompt and generator-side negative prompt from the critic feedback.
- Repeats up to the configured iteration limit and returns the best scored image.
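The steps above can be sketched as a small control loop. This is a minimal illustration, not the repository's implementation: the `Attempt` record, the four callables (`upsample`, `generate`, `critique`, `rewrite`), and the `max_stages` and `stop_score` parameters are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    prompt_json: dict
    negative_prompt: str
    image: bytes
    score: float
    feedback: str


def agentic_loop(
    user_prompt: str,
    upsample: Callable[[str], dict],
    generate: Callable[[dict, str], bytes],
    critique: Callable[[bytes, str], tuple],
    rewrite: Callable[[dict, str, str], tuple],
    max_stages: int = 3,            # hypothetical limit, not the CLI default
    stop_score: Optional[float] = None,
) -> Attempt:
    """Upsample once, then generate, score, and rewrite until the limit is hit."""
    if max_stages < 1:
        raise ValueError("max_stages must be at least 1")
    prompt_json = upsample(user_prompt)
    negative_prompt = ""
    best = None
    for stage in range(max_stages):
        image = generate(prompt_json, negative_prompt)
        score, feedback = critique(image, user_prompt)
        attempt = Attempt(prompt_json, negative_prompt, image, score, feedback)
        # Keep the highest-scoring attempt seen so far.
        if best is None or attempt.score > best.score:
            best = attempt
        # Optional early stop once the critic score clears a threshold.
        if stop_score is not None and score >= stop_score:
            break
        # Rewrite both prompts from the critic feedback before the next stage.
        if stage + 1 < max_stages:
            prompt_json, negative_prompt = rewrite(prompt_json, negative_prompt, feedback)
    return best
```

In this sketch the best attempt is returned even when a later rewrite scores lower, which matches the "returns the best scored image" behavior described above.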
Install
From the repository root:
python -m pip install requests pillow
The recommended vLLM-Omni serving configuration for nvidia/Cosmos3-Super-Text2Image on 4xH200 is:
vllm serve nvidia/Cosmos3-Super-Text2Image \
--omni \
--cfg-parallel-size 2 \
--ulysses-degree 2 \
--tensor-parallel-size 1
With the no-offload configuration above, 1024x1024 image generation with 50 steps is expected to take roughly 5 seconds server-side per request.
Default Models
The default prompt upsampler and rewriter are OpenAI GPT-5.5 through the public OpenAI chat completions API:
endpoint: https://api.openai.com/v1
model: gpt-5.5
extra body: {"reasoning_effort": "low"}
env var: OPENAI_API_KEY
The default critic is Gemini 3.1 Pro Preview through Google's OpenAI-compatible chat completions endpoint:
endpoint: https://generativelanguage.googleapis.com/v1beta/openai/
model: gemini-3.1-pro-preview
env var: GEMINI_API_KEY
Set credentials:
export OPENAI_API_KEY=...
export GEMINI_API_KEY=...
If your vLLM-Omni generation endpoint requires auth:
export AGENTIC_UPSAMPLING_GENERATION_AUTH_KEY=...
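Because both default models are reached through OpenAI-compatible chat completions endpoints, a single request builder can serve the upsampler, the rewriter, and the critic. The sketch below only assembles the URL, headers, and JSON body; `build_chat_request`, `UPSAMPLER`, and `CRITIC` are hypothetical names, and the repository's own client code may be structured differently.

```python
import os


def build_chat_request(base_url, model, api_key_env, messages, extra_body=None):
    """Return (url, headers, body) for an OpenAI-compatible chat completion call."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        # The key is read from the environment variable named in the defaults.
        "Authorization": f"Bearer {os.environ.get(api_key_env, '')}",
        "Content-Type": "application/json",
    }
    # Extra body fields (for example reasoning_effort) are merged into the JSON body.
    body = {"model": model, "messages": messages, **(extra_body or {})}
    return url, headers, body


# Defaults copied from the section above.
UPSAMPLER = dict(
    base_url="https://api.openai.com/v1",
    model="gpt-5.5",
    api_key_env="OPENAI_API_KEY",
    extra_body={"reasoning_effort": "low"},
)
CRITIC = dict(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    model="gemini-3.1-pro-preview",
    api_key_env="GEMINI_API_KEY",
)
```

The returned triple can be passed to `requests.post(url, headers=headers, json=body, timeout=...)` once the credentials are exported.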
Run One Prompt
python -m agentic_upsampling.run \
--prompt "a cinematic photo of a glass greenhouse at sunrise" \
--output-dir outputs/agentic_greenhouse \
--generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT
The generation call is a standard vLLM-Omni image request:
POST /v1/images/generations
model: nvidia/Cosmos3-Super-Text2Image
size: 1024x1024
response_format: b64_json
num_inference_steps: 50
guidance_scale: 4.0
flow_shift: 3.0
negative_prompt: ""
extra_args: {"guardrails": false, "use_resolution_template": false}
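For readers who want to call the endpoint directly, the request fields listed above map onto a JSON body as sketched below. Two things are assumptions rather than facts from this README: that the text goes in a `prompt` field, and that the response follows the OpenAI images shape with the image under `data[0].b64_json`. The helper names are hypothetical.

```python
import base64

# Default extra_args as listed in the request description above.
DEFAULT_EXTRA_ARGS = {"guardrails": False, "use_resolution_template": False}


def build_generation_payload(prompt, negative_prompt="", size="1024x1024",
                             steps=50, guidance=4.0, flow_shift=3.0,
                             extra_args=None):
    """Assemble the JSON body for POST /v1/images/generations."""
    return {
        "model": "nvidia/Cosmos3-Super-Text2Image",
        "prompt": prompt,  # assumed field name, following the OpenAI images API
        "size": size,
        "response_format": "b64_json",
        "num_inference_steps": steps,
        "guidance_scale": guidance,
        "flow_shift": flow_shift,
        "negative_prompt": negative_prompt,
        "extra_args": dict(DEFAULT_EXTRA_ARGS if extra_args is None else extra_args),
    }


def decode_first_image(response_json):
    """Decode the first image, assuming an OpenAI-style {"data": [{"b64_json": ...}]} body."""
    return base64.b64decode(response_json["data"][0]["b64_json"])


# Example use (requires the requests package and a running endpoint):
#   resp = requests.post(endpoint + "/v1/images/generations",
#                        json=build_generation_payload("a glass greenhouse"))
#   open("image.jpg", "wb").write(decode_first_image(resp.json()))
```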
Run A Batch
Text file, one prompt per non-empty line:
python -m agentic_upsampling.run \
--prompts prompts.txt \
--output-dir outputs/agentic_batch \
--generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT
JSONL rows can be strings or objects with prompt and optional id:
{"id": "greenhouse", "prompt": "a glass greenhouse at sunrise"}
{"id": "city", "prompt": "a clean futuristic city plaza after rain"}
CSV files must include a prompt or Prompt column and may include an id column.
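The three input formats can be normalized into one list of records. The loader below is an illustrative sketch of the rules just described, not the code used by `agentic_upsampling.run`; `load_prompts` and the choice to dispatch on file extension are assumptions.

```python
import csv
import json
from pathlib import Path


def load_prompts(path):
    """Read prompts from .txt, .jsonl, or .csv into [{"id": ..., "prompt": ...}]."""
    path = Path(path)
    suffix = path.suffix.lower()
    rows = []
    if suffix == ".jsonl":
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            item = json.loads(line)
            if isinstance(item, str):
                # A bare JSON string is a prompt without an id.
                rows.append({"id": None, "prompt": item})
            else:
                rows.append({"id": item.get("id"), "prompt": item["prompt"]})
    elif suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            names = reader.fieldnames or []
            column = next((c for c in ("prompt", "Prompt") if c in names), None)
            if column is None:
                raise ValueError("CSV needs a 'prompt' or 'Prompt' column")
            for record in reader:
                if record[column]:
                    rows.append({"id": record.get("id") or None,
                                 "prompt": record[column]})
    else:
        # Plain text: one prompt per non-empty line.
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                rows.append({"id": None, "prompt": line.strip()})
    return rows
```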
Useful Options
python -m agentic_upsampling.run \
--prompt "a precise product photo of a transparent mechanical keyboard" \
--output-dir outputs/keyboard \
--generation-endpoint https://YOUR_VLLM_OMNI_ENDPOINT \
--max-iterations 2 \
--samples-per-iteration 3 \
--seed-base 42 \
--size 1024x1024 \
--guidance 4.0 \
--flow-shift 3.0
- --max-iterations controls total prompt stages. The default is 2, meaning the initial upsample plus up to two rewrites.
- --samples-per-iteration runs a best-of-N seed search for each prompt stage. Generation requests for those seeds are submitted concurrently within the iteration.
- --seed-base makes seeds deterministic. Sample seeds are seed_base + sample_index.
- --size is the vLLM-Omni image size in WIDTHxHEIGHT format.
- --guidance sets guidance_scale; the default is 4.0.
- --flow-shift sets flow_shift; the default is 3.0.
- --generation-extra-args overrides the default vLLM-Omni generation extra_args JSON object.
- Early stopping is enabled by default when the critic score clears the strict threshold. Use --disable-early-stop to always run every iteration.
- Reruns resume from completed artifacts by default. Use --overwrite to regenerate them.
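The seed rule and the best-of-N search can be illustrated in a few lines. This is a sketch under stated assumptions: `generate_and_score` is a hypothetical callable that takes a seed and returns a critic score, and a thread pool stands in for whatever concurrency mechanism the tool actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def sample_seeds(seed_base, samples_per_iteration):
    """Seeds follow the documented rule: seed_base + sample_index."""
    return [seed_base + index for index in range(samples_per_iteration)]


def best_of_n(seed_base, samples_per_iteration, generate_and_score):
    """Run every seed concurrently and return (best_seed, best_score).

    Assumes samples_per_iteration is at least 1.
    """
    seeds = sample_seeds(seed_base, samples_per_iteration)
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        # pool.map preserves input order, so scores[i] belongs to seeds[i].
        scores = list(pool.map(generate_and_score, seeds))
    best_index = max(range(len(seeds)), key=lambda i: scores[i])
    return seeds[best_index], scores[best_index]
```

With `--seed-base 42` and `--samples-per-iteration 3`, the rule gives seeds 42, 43, and 44 for each prompt stage.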
Output Layout
output_dir/
  run_config.json
  summary.json
  manifest.jsonl
  failures.jsonl
  0001/
    best.json
    iter_00/
      prompt.json
      negative_prompt.json
      image.jpg
      generation_meta.json
      analysis.json
      samples.json
      meta.json
    iter_01/
      ...
For --samples-per-iteration N, each iteration contains sample_00/, sample_01/, and so on.
Export Best Images
Copy the selected best image for every completed prompt into one folder:
python -m agentic_upsampling.extract_best \
--output-dir outputs/agentic_batch \
--export-dir outputs/agentic_batch_best \
--overwrite
The exporter writes:
best_generations.jsonl
best_generations.csv
images/