cosmos-transfer
REST API microservice wrapper around NVIDIA Cosmos-Transfer2.5-2B β a video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).
Installation
docker pull ghcr.io/eyalenav/cosmos-transfer:latest
Run
docker run --rm \
--gpus '"device=0"' \
-p 8080:8080 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e HUGGINGFACE_TOKEN=hf_... \
ghcr.io/eyalenav/cosmos-transfer:latest
First run: downloads Cosmos-Transfer2.5-2B weights (~20 GB). Subsequent starts are fast.
API Reference
GET /health
Check server status.
Request
GET http://localhost:8080/health
Response
{
"status": "ok",
"model": "Cosmos-Transfer2.5-2B",
"device": "cuda:0"
}
POST /transfer
Convert a synthetic video to photorealistic using multicontrol (edge + visual).
Request
POST http://localhost:8080/transfer
Content-Type: multipart/form-data
| Field | Type | Default | Description |
|---|---|---|---|
video |
file | required | Input synthetic MP4 (max 10s @ 24fps recommended) |
prompt |
string | "" |
Text describing the scene (improves realism) |
edge_strength |
float | 0.85 |
Canny edge control strength (geometry preservation) |
vis_strength |
float | 0.45 |
Visual/blur control strength (scene structure) |
sigma |
int | 100 |
Noise level β lower = more faithful, higher = more realistic |
num_steps |
int | 35 |
Diffusion steps (more = slower but higher quality) |
seed |
int | -1 |
Random seed (-1 = random) |
Response
Binary MP4 file (video/mp4).
Example
curl -X POST http://localhost:8080/transfer \
-F "video=@synthetic_render.mp4" \
-F "prompt=surveillance camera footage of a crowded urban street, overcast day" \
-F "edge_strength=0.85" \
-F "vis_strength=0.45" \
-F "sigma=100" \
--output photorealistic.mp4
POST /transfer_async
Submit a job and poll for completion (recommended for long clips).
Submit
curl -X POST http://localhost:8080/transfer_async \
-F "video=@render.mp4" \
-F "prompt=security incident, parking lot" \
-F "edge_strength=0.85" \
--output job.json
# {"job_id": "abc123", "status": "queued"}
Poll
curl http://localhost:8080/status/abc123
# {"job_id": "abc123", "status": "running", "progress": 0.42}
# ...
# {"job_id": "abc123", "status": "done"}
Download
curl http://localhost:8080/result/abc123 --output photorealistic.mp4
Tuned Parameters
Tested across 80+ surveillance clips β confirmed sweet spot:
edge_strength=0.85 + vis_strength=0.45 + sigma=100
| Parameter | Value | Effect |
|---|---|---|
edge_strength |
0.85 | Strong silhouette/geometry preservation from Canny edges |
vis_strength |
0.45 | Moderate scene structure via visual blur control |
sigma |
100 | Balanced noise β realistic textures without losing layout |
When to adjust
| Scenario | Adjustment |
|---|---|
| Subject drifts from synthetic pose | Increase edge_strength β 0.90β0.95 |
| Background too synthetic-looking | Increase vis_strength β 0.55β0.65 |
| Output too faithful to render colors | Increase sigma β 120 |
| Too much motion blur | Decrease sigma β 80 |
Hardware Requirements
| Resource | Minimum | Recommended |
|---|---|---|
| GPU | A100 40GB / RTX 6000 Ada | H100 / RTX PRO 6000 Blackwell |
| VRAM | 40 GB | 48+ GB |
| RAM | 64 GB | 128 GB |
| Disk | 30 GB | 50 GB |
| CUDA | 12.1+ | 12.8 |
Processing time (RTX PRO 6000 Blackwell, 96GB VRAM):
- 4s clip @ 24fps β ~3 min
- 10s clip @ 24fps β ~7 min
Environment Variables
| Variable | Required | Description |
|---|---|---|
HUGGINGFACE_TOKEN |
Yes | HF token with access to nvidia/Cosmos-Transfer2.5-2B |
CUDA_VISIBLE_DEVICES |
No | Limit to specific GPU (e.g. "1") |
PORT |
No | Override default port 8080 |
Integration with VisionAI-Flywheel
# docker-compose.yml excerpt
services:
cosmos-transfer:
image: ghcr.io/eyalenav/cosmos-transfer:latest
ports:
- "8080:8080"
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ["1"]
capabilities: [gpu]
volumes:
- hf_cache:/root/.cache/huggingface
environment:
- HUGGINGFACE_TOKEN=${HUGGINGFACE_TOKEN}
Full docker-compose.yml: github.com/EyalEnav/VisionAI-Flywheel
Example: Full Python client
import requests
import time
def transfer_video(
input_path: str,
output_path: str,
prompt: str = "",
edge_strength: float = 0.85,
vis_strength: float = 0.45,
sigma: int = 100
):
"""Convert synthetic video to photorealistic."""
with open(input_path, "rb") as f:
response = requests.post(
"http://localhost:8080/transfer",
files={"video": ("input.mp4", f, "video/mp4")},
data={
"prompt": prompt,
"edge_strength": edge_strength,
"vis_strength": vis_strength,
"sigma": sigma,
},
timeout=600
)
response.raise_for_status()
with open(output_path, "wb") as f:
f.write(response.content)
print(f"Saved to {output_path}")
# Example usage
transfer_video(
input_path="soma_render.mp4",
output_path="photorealistic.mp4",
prompt="surveillance camera, urban street, daytime, overcast sky"
)
License
Apache 2.0
Cosmos-Transfer2.5 model weights are released under the NVIDIA Open Model License. Weights are downloaded at runtime and are not bundled in this image.