---
license: apache-2.0
tags:
- video-to-video
- sim2real
- synthetic-data
- surveillance
- cosmos
- nvidia
- docker
- rest-api
- diffusion
pipeline_tag: text-to-video
---
# cosmos-transfer 🎬
A **REST API wrapper** around [NVIDIA Cosmos-Transfer2.5](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B), a 2B-parameter video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).
Packaged as a ready-to-run Docker microservice, with default parameters tuned on 80+ surveillance clips.
## Quick Start
```bash
docker pull ghcr.io/eyalenav/cosmos-transfer:latest
docker run --rm --gpus '"device=0"' -p 8080:8080 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e HUGGINGFACE_TOKEN=hf_... \
ghcr.io/eyalenav/cosmos-transfer:latest
```
> ⚠️ The first run downloads the Cosmos-Transfer2.5-2B weights (~20 GB). It requires a Hugging Face token (passed as `HUGGINGFACE_TOKEN`) with access to `nvidia/Cosmos-Transfer2.5-2B`.
## API
### `POST /transfer`
Converts a synthetic video into a photorealistic one.
```bash
curl -X POST http://localhost:8080/transfer \
-F "video=@synthetic_render.mp4" \
-F "prompt=surveillance camera footage of a crowded street" \
--output photorealistic.mp4
```
**Parameters:**
| Field | Default | Description |
|---|---|---|
| `video` | required | Input synthetic MP4 |
| `prompt` | `""` | Text guidance for the scene |
| `edge_strength` | `0.85` | Canny edge control (geometry preservation) |
| `vis_strength` | `0.45` | Visual blur control (scene structure) |
| `sigma` | `100` | Noise level (realism vs. fidelity) |
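The `curl` example above sets only `video` and `prompt`. The sketch below sends every field in the table, assuming the optional fields are accepted as multipart form fields in the same way as those two. The function `cosmos_transfer` and the environment variables `COSMOS_TRANSFER_URL`, `EDGE_STRENGTH`, `VIS_STRENGTH` and `SIGMA` are hypothetical conveniences, not part of the service.

```bash
#!/usr/bin/env bash
# Hypothetical helper: POST one clip to /transfer with every documented field.
# Assumption: edge_strength, vis_strength and sigma are sent as multipart form
# fields, like video and prompt in the curl example above.
cosmos_transfer() {
  local input="$1" output="$2" prompt="${3:-}"
  # Hypothetical env var; defaults to the Quick Start port mapping.
  local url="${COSMOS_TRANSFER_URL:-http://localhost:8080}"
  # -f: report HTTP errors as a failure instead of saving the error body;
  # -sS: hide the progress bar but still print curl's own errors.
  # --form-string sends the prompt literally (a leading @ or < is not a file).
  curl -fsS -X POST "${url}/transfer" \
    -F "video=@${input}" \
    --form-string "prompt=${prompt}" \
    -F "edge_strength=${EDGE_STRENGTH:-0.85}" \
    -F "vis_strength=${VIS_STRENGTH:-0.45}" \
    -F "sigma=${SIGMA:-100}" \
    --output "$output"
}
```

For example, `SIGMA=80 cosmos_transfer synthetic_render.mp4 photorealistic.mp4 "surveillance camera footage of a crowded street"` overrides only the noise level for that one call and keeps the table defaults for everything else.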
### `GET /health`
```bash
curl http://localhost:8080/health
# {"status": "ok"}
```
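When scripting against the container, it helps to wait for `/health` before sending work. Below is a minimal polling sketch; `wait_for_health` is a hypothetical helper, and the 30-minute default timeout is an assumption meant to cover the first start, when ~20 GB of weights are downloaded. This README does not say whether `/health` answers before the weights finish loading, so treat a healthy response as "the API is reachable".

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll GET /health until the body contains "ok",
# giving up after a timeout (in seconds).
wait_for_health() {
  local base_url="${1:-http://localhost:8080}"
  local timeout_s="${2:-1800}"   # assumption: generous to cover the first download
  local interval_s=5 waited=0
  # -f: treat HTTP errors as failures; -s: silence curl's progress output.
  until curl -fs "${base_url}/health" | grep -q '"ok"'; do
    if (( waited >= timeout_s )); then
      echo "no healthy response from ${base_url} after ${timeout_s}s" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$(( waited + interval_s ))
  done
  echo "cosmos-transfer is healthy at ${base_url}"
}
```

Typical use is `wait_for_health http://localhost:8080 && echo ready` right after `docker run` in a setup script.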
## Tuned Parameters
After testing on 80+ clips, the best-performing settings for **surveillance synthetic data** were the following (the same values as the API defaults):
```
edge=0.85 + vis=0.45 + sigma=100
```
- **edge 0.85**: strong geometry/silhouette preservation from the Canny edges
- **vis 0.45**: moderate scene structure preservation
- **sigma 100**: balanced realism without losing the synthetic layout
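To apply the tuned settings to a whole folder of renders, a sequential loop is enough. The sketch below is hypothetical (`transfer_dir` and `COSMOS_TRANSFER_URL` are not part of the service) and assumes the three parameters are accepted as multipart form fields. Since they match the API defaults, sending them explicitly mainly records the intended settings alongside the script.

```bash
#!/usr/bin/env bash
# Hypothetical batch helper: convert every *.mp4 in in_dir with the tuned
# surveillance settings (edge 0.85, vis 0.45, sigma 100).
# Assumption: the optional parameters are accepted as multipart form fields.
transfer_dir() {
  local in_dir="$1" out_dir="$2" prompt="${3:-}"
  local url="${COSMOS_TRANSFER_URL:-http://localhost:8080}"  # hypothetical env var
  local src name failed=0
  mkdir -p "$out_dir"
  for src in "$in_dir"/*.mp4; do
    [[ -e "$src" ]] || continue      # no matches: the glob stays literal, skip it
    name="$(basename "$src" .mp4)"
    # Clips are sent one at a time; -f turns HTTP errors into a non-zero exit.
    if ! curl -fsS -X POST "${url}/transfer" \
        -F "video=@${src}" \
        --form-string "prompt=${prompt}" \
        -F "edge_strength=0.85" \
        -F "vis_strength=0.45" \
        -F "sigma=100" \
        --output "${out_dir}/${name}.mp4"; then
      echo "failed: ${src}" >&2
      failed=$(( failed + 1 ))
    fi
  done
  return $(( failed > 0 ))           # 1 if any clip failed, else 0
}
```

`transfer_dir renders/ realistic/ "surveillance camera footage of a crowded street"` writes one MP4 per input into `realistic/` and returns non-zero if any clip failed. The loop is sequential because this README does not say whether the service accepts concurrent jobs.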
## Requirements
| Resource | Minimum |
|---|---|
| GPU | A100 / RTX 6000 Ada / H100 |
| VRAM | 40 GB |
| RAM | 64 GB |
| Disk | 30 GB (model weights) |
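A quick pre-flight check against the VRAM row of the table can catch an undersized GPU before pulling ~20 GB of weights. This is a hypothetical helper (`check_vram`) built on `nvidia-smi`'s CSV query mode. The 40000 MiB threshold is an assumption, set a little below 40 × 1024 MiB so that cards sold as 40 GB, which may report slightly less than 40960 MiB, are not rejected.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: does any visible GPU meet the VRAM minimum?
check_vram() {
  # nvidia-smi reports MiB; 40000 is an assumed, slightly lenient "40 GB".
  local min_mib="${1:-40000}"
  local status=1 idx total
  # Each CSV line looks like "0, 81559" (GPU index, total memory in MiB).
  while IFS=', ' read -r idx total; do
    if (( total >= min_mib )); then
      echo "GPU ${idx}: ${total} MiB (ok)"
      status=0
    else
      echo "GPU ${idx}: ${total} MiB (below ${min_mib} MiB)" >&2
    fi
  done < <(nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits)
  # 0 if at least one GPU qualifies; 1 otherwise, including when nvidia-smi is missing.
  return "$status"
}
```

Because the Quick Start pins the container to `device=0`, the line reported for GPU 0 is the one that matters.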
## Part of VisionAI-Flywheel
This service is one component of a full synthetic surveillance data pipeline:
```
[kimodo-api] → NPZ motion
↓
[render-api] → SOMA mesh render (MP4)
↓
[cosmos-transfer] → Sim2Real photorealistic video ← this image
↓
[NVIDIA VSS] → VLM annotation → fine-tuning dataset
```
Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)
## License
Apache 2.0. See [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE).
> Cosmos-Transfer2.5 model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime. They are not bundled in this image.