---
license: apache-2.0
tags:
- video-to-video
- sim2real
- synthetic-data
- surveillance
- cosmos
- nvidia
- docker
- rest-api
- diffusion
pipeline_tag: text-to-video
---

# cosmos-transfer 🎬

A **REST API wrapper** around [NVIDIA Cosmos-Transfer2.5](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B), a 2B-parameter video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).

It is packaged as a ready-to-run Docker microservice. The default parameters were tuned on 80+ surveillance clips.

## Quick Start

```bash
docker pull ghcr.io/eyalenav/cosmos-transfer:latest

docker run --rm --gpus '"device=0"' -p 8080:8080 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/cosmos-transfer:latest
```

> ⚠️ The first run downloads the Cosmos-Transfer2.5-2B weights (~20 GB) into the mounted `~/.cache/huggingface` directory, so later runs can reuse them. A Hugging Face token with access to `nvidia/Cosmos-Transfer2.5-2B` is required.

## API

### `POST /transfer`

Converts a synthetic video into photorealistic video. The request is sent as `multipart/form-data`, and the response body is the generated MP4.

```bash
curl -X POST http://localhost:8080/transfer \
  -F "video=@synthetic_render.mp4" \
  -F "prompt=surveillance camera footage of a crowded street" \
  --output photorealistic.mp4
```

**Parameters:**

| Field | Default | Description |
|---|---|---|
| `video` | required | Input synthetic MP4 |
| `prompt` | `""` | Text guidance describing the scene |
| `edge_strength` | `0.85` | Canny edge control strength (geometry preservation) |
| `vis_strength` | `0.45` | Visual (blur) control strength (scene structure) |
| `sigma` | `100` | Noise level (realism vs. fidelity trade-off) |

### `GET /health`

```bash
curl http://localhost:8080/health
# {"status": "ok"}
```

## Tuned Parameters

Across 80+ clips, this combination gave the best results for **synthetic surveillance data**. It is also the default in the parameter table above:

```
edge=0.85 + vis=0.45 + sigma=100
```

- **edge 0.85**: strong geometry and silhouette preservation from the Canny edges
- **vis 0.45**: moderate preservation of scene structure
- **sigma 100**: balanced realism without losing the synthetic layout

## Requirements

| Resource | Minimum |
|---|---|
| GPU | A100 / RTX 6000 Ada / H100 |
| VRAM | 40 GB |
| RAM | 64 GB |
| Disk | 30 GB (model weights) |

## Part of VisionAI-Flywheel

This service is one component of a full synthetic surveillance data pipeline:

```
[kimodo-api]      → NPZ motion
       ↓
[render-api]      → SOMA mesh render (MP4)
       ↓
[cosmos-transfer] → Sim2Real photorealistic video   ← this image
       ↓
[NVIDIA VSS]      → VLM annotation → fine-tuning dataset
```

🔗 Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)

## License

Apache 2.0. See [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE).

> Cosmos-Transfer2.5 model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime; they are not bundled in this image.
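Because the first start of the container downloads roughly 20 GB of weights, scripts that drive this service may want to wait for `GET /health` before sending work. The following is a minimal sketch, assuming the endpoint returns `{"status": "ok"}` only once the service can accept `/transfer` requests (this README does not say whether it reports ok before the weights finish loading). The function names `health_ok` and `wait_for_health` and the default timeout and interval values are hypothetical and not part of the image.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped in the image) for waiting on GET /health.
# Written to also run under a POSIX sh that supports `local` (e.g. dash).

# Succeeds if the JSON body on stdin reports {"status": "ok"}.
health_ok() {
  grep -q '"status": *"ok"'
}

# Polls <base_url>/health every <interval> seconds until it reports ok (return 0)
# or <timeout> seconds elapse (return 1). The defaults are assumptions.
wait_for_health() {
  local base_url="${1:-http://localhost:8080}"
  local timeout="${2:-1800}"
  local interval="${3:-10}"
  local deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # -f: treat HTTP errors as failures; connection errors while the
    # container starts are silenced and simply retried.
    if curl -fsS --max-time 5 "${base_url}/health" 2>/dev/null | health_ok; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

For example, `wait_for_health http://localhost:8080 1800 10 && echo ready` blocks for up to 30 minutes. Polling a cheap endpoint avoids sending a large upload to a container that is still downloading weights.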
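For batch jobs, you can wrap the `curl` call from the `POST /transfer` section in a function that always sends the tuned surveillance parameters (`edge_strength=0.85`, `vis_strength=0.45`, `sigma=100`). The following is a minimal sketch under two assumptions. First, it assumes `edge_strength`, `vis_strength` and `sigma` are accepted as form fields alongside `video` and `prompt`; the parameter table lists them, but the curl example only shows `video` and `prompt`. Second, the names `transfer_one` and `transfer_dir`, and the `BASE_URL`, `EDGE`, `VIS` and `SIGMA` overrides, are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical batch helpers (not shipped in the image) for POST /transfer.

# Converts one render. Usage: transfer_one <src.mp4> <dst.mp4> [prompt]
# EDGE, VIS, SIGMA and BASE_URL env vars override the tuned defaults (assumed names).
transfer_one() {
  local src="$1"
  local dst="$2"
  local prompt="${3:-}"
  local base_url="${BASE_URL:-http://localhost:8080}"
  # -F implies a multipart POST; -f makes curl exit non-zero on HTTP errors
  # instead of saving an error body as the output MP4.
  curl -fsS "${base_url}/transfer" \
    -F "video=@${src}" \
    --form-string "prompt=${prompt}" \
    -F "edge_strength=${EDGE:-0.85}" \
    -F "vis_strength=${VIS:-0.45}" \
    -F "sigma=${SIGMA:-100}" \
    --output "$dst"
}

# Converts every *.mp4 in <src_dir> into <dst_dir>, keeping the file names.
transfer_dir() {
  local src_dir="$1"
  local dst_dir="$2"
  local prompt="${3:-}"
  local f
  mkdir -p "$dst_dir"
  for f in "$src_dir"/*.mp4; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    transfer_one "$f" "$dst_dir/$(basename "$f")" "$prompt" \
      || echo "failed: $f" >&2
  done
}
```

For example, `transfer_dir renders/ real/ "surveillance camera footage of a crowded street"` converts a whole folder sequentially. The prompt uses `--form-string` rather than `-F` so that curl sends it literally. With `--form-string`, a leading `@` or `<` or an embedded `;type=` has no special meaning.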