---
license: apache-2.0
tags:
- video-to-video
- sim2real
- synthetic-data
- surveillance
- cosmos
- nvidia
- docker
- rest-api
- diffusion
pipeline_tag: text-to-video
---
# cosmos-transfer 🎬

A **REST API wrapper** around [NVIDIA Cosmos-Transfer2.5](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B), a 2B-parameter video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).

Packaged as a ready-to-run Docker microservice, with default parameters tuned on 80+ surveillance clips.
## Quick Start

```bash
docker pull ghcr.io/eyalenav/cosmos-transfer:latest
docker run --rm --gpus '"device=0"' -p 8080:8080 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/cosmos-transfer:latest
```
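> ⚠️ The first run downloads the Cosmos-Transfer2.5-2B weights (~20 GB) and requires a Hugging Face token with access to `nvidia/Cosmos-Transfer2.5-2B`. The `-v` mount persists the Hugging Face cache on the host so that later runs can reuse the downloaded weights.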
## API

### `POST /transfer`

Converts a synthetic video into photorealistic video. The response body is the output MP4.

```bash
curl -X POST http://localhost:8080/transfer \
  -F "video=@synthetic_render.mp4" \
  -F "prompt=surveillance camera footage of a crowded street" \
  --output photorealistic.mp4
```
**Parameters:**

| Field | Default | Description |
|---|---|---|
| `video` | required | Input synthetic MP4 |
| `prompt` | `""` | Text guidance describing the target scene |
| `edge_strength` | `0.85` | Strength of the Canny edge control (geometry preservation) |
| `vis_strength` | `0.45` | Strength of the visual blur control (scene structure) |
| `sigma` | `100` | Noise level; trades realism against fidelity to the input |
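The curl example above sends only `video` and `prompt`. Below is a minimal sketch of a request that also sets the three tuning fields, assuming they are accepted as form fields in the same way as `video` and `prompt` (the table names them, but their encoding is an assumption). The function `cosmos_transfer` and the variables `COSMOS_URL`, `EDGE_STRENGTH`, `VIS_STRENGTH`, and `SIGMA` are hypothetical helpers, not part of the image.

```bash
# Hypothetical helper (not part of the image): POST one clip to /transfer
# with all three tuning fields set explicitly.
# Assumes edge_strength, vis_strength, and sigma are sent as form fields like video and prompt.
cosmos_transfer() {
  local input="$1" output="$2" prompt="${3:-}"
  if [ ! -f "$input" ]; then
    echo "input not found: $input" >&2
    return 1
  fi
  # --fail: return non-zero on HTTP errors instead of saving the error body as an MP4.
  # --form-string: send the prompt literally, without curl's @file / <file / ;type= parsing.
  curl --fail -sS -X POST "${COSMOS_URL:-http://localhost:8080}/transfer" \
    -F "video=@${input}" \
    --form-string "prompt=${prompt}" \
    -F "edge_strength=${EDGE_STRENGTH:-0.85}" \
    -F "vis_strength=${VIS_STRENGTH:-0.45}" \
    -F "sigma=${SIGMA:-100}" \
    --output "$output"
}
```

For example, `SIGMA=80 cosmos_transfer synthetic_render.mp4 photorealistic.mp4 "surveillance camera footage of a crowded street"` overrides only `sigma` and keeps the other two fields at their defaults. Using `--form-string` for the prompt means a prompt that happens to start with `@` or `<` is not treated as a file reference.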
### `GET /health`

```bash
curl http://localhost:8080/health
# {"status": "ok"}
```
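Because the first run downloads roughly 20 GB of weights, scripts that start the container may want to poll `/health` before sending work. A minimal sketch, assuming the endpoint returns a successful HTTP status once the service is ready to accept requests; `wait_for_health`, its 10-second poll interval, and its 30-minute default timeout are hypothetical choices.

```bash
# Hypothetical helper: poll GET /health until it succeeds or a timeout expires.
# Usage: wait_for_health [base_url] [timeout_seconds]
wait_for_health() {
  local url="${1:-http://localhost:8080}/health"
  local timeout="${2:-1800}"   # generous default to cover the first weight download
  local waited=0
  # --fail makes curl return non-zero on HTTP errors, so only a non-error response ends the loop.
  until curl --fail -sS --max-time 5 "$url" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
  echo "healthy: $url"
}
```

One way to use it is to start the container with `-d` instead of in the foreground, then run `wait_for_health && curl -X POST http://localhost:8080/transfer ...`.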
## Tuned Parameters

After tuning on 80+ clips, this combination worked best for **synthetic surveillance data** (it is also the API default):

```
edge_strength=0.85  vis_strength=0.45  sigma=100
```

- **`edge_strength` 0.85**: strong geometry and silhouette preservation from the Canny edges
- **`vis_strength` 0.45**: moderate preservation of scene structure
- **`sigma` 100**: balances realism against keeping the synthetic layout
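These values were tuned on surveillance footage, so other domains may need a different balance. A minimal sketch of a small sweep over `sigma` with `edge_strength` and `vis_strength` held at their tuned values, assuming the form fields from the parameter table. `sweep_sigma`, `COSMOS_URL`, and the `<clip>_sigma<value>.mp4` naming (written to the current directory) are hypothetical.

```bash
# Hypothetical helper: render one clip at several sigma values for side-by-side review,
# holding edge_strength and vis_strength at the tuned values.
# Usage: sweep_sigma input.mp4 "prompt" 80 100 120
sweep_sigma() {
  if [ "$#" -lt 2 ]; then
    echo "usage: sweep_sigma input.mp4 prompt [sigma ...]" >&2
    return 2
  fi
  local input="$1" prompt="$2" base sigma status=0
  shift 2
  if [ ! -f "$input" ]; then
    echo "input not found: $input" >&2
    return 1
  fi
  base=$(basename "$input" .mp4)
  for sigma in "$@"; do
    # One request per sigma value; outputs are named <clip>_sigma<value>.mp4.
    if ! curl --fail -sS -X POST "${COSMOS_URL:-http://localhost:8080}/transfer" \
        -F "video=@${input}" \
        --form-string "prompt=${prompt}" \
        -F "edge_strength=0.85" \
        -F "vis_strength=0.45" \
        -F "sigma=${sigma}" \
        --output "${base}_sigma${sigma}.mp4"; then
      echo "request failed for sigma=${sigma}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Varying one parameter at a time keeps the comparison readable; a full grid over all three values multiplies the number of diffusion runs quickly.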
## Requirements

| Resource | Minimum |
|---|---|
| GPU | A100 / RTX 6000 Ada / H100 |
| VRAM | 40 GB |
| RAM | 64 GB |
| Disk | 30 GB (model weights) |
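A quick pre-flight check against the VRAM minimum can run on the host before the container starts. This is a sketch: `check_vram` is a hypothetical helper that reads per-GPU memory totals in MiB on stdin, as printed by `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`. The 40000 MiB default is an assumed threshold set slightly below 40 GiB so that 40 GB cards, which can report a little less, still pass.

```bash
# Hypothetical helper: succeed only if every GPU listed on stdin has at least
# MIN_MIB of total memory. Expects one integer (MiB) per line, as printed by
#   nvidia-smi -i 0 --query-gpu=memory.total --format=csv,noheader,nounits
check_vram() {
  local min="${1:-40000}" total found=0   # 40000 MiB: assumed threshold, just under 40 GiB
  while read -r total || [ -n "$total" ]; do
    [ -n "$total" ] || continue           # skip blank lines
    case "$total" in
      *[!0-9]*) echo "unexpected value from nvidia-smi: $total" >&2; return 1 ;;
    esac
    found=1
    if [ "$total" -lt "$min" ]; then
      echo "GPU reports ${total} MiB, below the ${min} MiB minimum" >&2
      return 1
    fi
  done
  # No lines at all means no GPU was reported.
  [ "$found" -eq 1 ]
}
```

For example, `nvidia-smi -i 0 --query-gpu=memory.total --format=csv,noheader,nounits | check_vram` checks GPU 0, the device passed to `docker run` in Quick Start.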
## Part of VisionAI-Flywheel

This service is one component of a full synthetic surveillance data pipeline:

```
[kimodo-api]      → NPZ motion
      ↓
[render-api]      → SOMA mesh render (MP4)
      ↓
[cosmos-transfer] → Sim2Real photorealistic video   ← this image
      ↓
[NVIDIA VSS]      → VLM annotation → fine-tuning dataset
```

Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)
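In the pipeline, this service sits between `render-api` and NVIDIA VSS. A minimal sketch of that hand-off for a directory of rendered MP4s, one request per clip, assuming the renders are already on local disk. `batch_transfer`, `COSMOS_URL`, the `_real.mp4` suffix, and the directory layout are hypothetical; the actual VisionAI-Flywheel orchestration may do this differently.

```bash
# Hypothetical helper: send every *.mp4 in IN_DIR to /transfer, writing results to OUT_DIR.
# Usage: batch_transfer renders/ photoreal/ "surveillance camera footage of a parking lot"
batch_transfer() {
  local in_dir="$1" out_dir="$2" prompt="${3:-}"
  local url="${COSMOS_URL:-http://localhost:8080}/transfer"
  local failed=0 f name out
  if [ ! -d "$in_dir" ]; then
    echo "no such directory: $in_dir" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1
  for f in "$in_dir"/*.mp4; do
    [ -e "$f" ] || continue            # the glob matched nothing
    name=$(basename "$f" .mp4)
    out="${out_dir}/${name}_real.mp4"
    # Requests are sent one at a time, since the Quick Start container uses a single GPU.
    if curl --fail -sS -X POST "$url" \
        -F "video=@${f}" --form-string "prompt=${prompt}" \
        --output "$out"; then
      echo "ok    $f -> $out"
    else
      echo "FAIL  $f" >&2
      rm -f "$out"                     # drop any partial output
      failed=$((failed + 1))
    fi
  done
  if [ "$failed" -gt 0 ]; then
    echo "$failed clip(s) failed" >&2
    return 1
  fi
}
```

Continuing past a failed clip, rather than stopping, lets a long batch finish while still reporting failure through the exit status.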
## License

Apache 2.0. See [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE).

> Cosmos-Transfer2.5 model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime. They are not bundled in this image.