---
license: apache-2.0
tags:
- video-to-video
- sim2real
- synthetic-data
- surveillance
- cosmos
- nvidia
- docker
- rest-api
- diffusion
pipeline_tag: text-to-video
---
# cosmos-transfer 🎬
A **REST API wrapper** around [NVIDIA Cosmos-Transfer2.5](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B), a 2B-parameter video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).
Packaged as a ready-to-run Docker microservice with battle-tested parameters tuned across 80+ surveillance clips.
## Quick Start
```bash
docker pull ghcr.io/eyalenav/cosmos-transfer:latest
docker run --rm --gpus '"device=0"' -p 8080:8080 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e HUGGINGFACE_TOKEN=hf_... \
ghcr.io/eyalenav/cosmos-transfer:latest
```
> ⚠️ First run downloads Cosmos-Transfer2.5-2B weights (~20GB). Requires a HuggingFace token with access to `nvidia/Cosmos-Transfer2.5-2B`.
## API
### `POST /transfer`
Convert a synthetic video into photorealistic video. The response body is the generated MP4.
```bash
curl -X POST http://localhost:8080/transfer \
-F "video=@synthetic_render.mp4" \
-F "prompt=surveillance camera footage of a crowded street" \
--output photorealistic.mp4
```
**Parameters:**
| Field | Default | Description |
|---|---|---|
| `video` | required | Input synthetic MP4 |
| `prompt` | `""` | Text guidance for the scene |
| `edge_strength` | `0.85` | Canny edge control (geometry preservation) |
| `vis_strength` | `0.45` | Visual blur control (scene structure) |
| `sigma` | `100` | Noise level (realism vs. fidelity) |
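The request shown earlier relies on the defaults. A minimal sketch of a wrapper that sets every field explicitly, assuming the tuning fields are sent as multipart form fields in the same way as `video` and `prompt` (the table lists them, but no request in this README shows them on the wire). `transfer_clip`, `COSMOS_URL`, and the `EDGE_STRENGTH` / `VIS_STRENGTH` / `SIGMA` environment variables are hypothetical names used only for illustration:

```bash
# Hypothetical helper: POST one clip to /transfer with every field set explicitly.
# Assumption: edge_strength, vis_strength and sigma are accepted as form fields.
COSMOS_URL="${COSMOS_URL:-http://localhost:8080}"

transfer_clip() {
  local input="$1" output="$2" prompt="${3:-}"
  local edge="${EDGE_STRENGTH:-0.85}"
  local vis="${VIS_STRENGTH:-0.45}"
  local sigma="${SIGMA:-100}"

  # --fail: exit non-zero on HTTP errors rather than treating them as success.
  # --form-string: send the prompt literally, even if it contains ';' or starts with '@'.
  curl --fail --silent --show-error -X POST "$COSMOS_URL/transfer" \
    -F "video=@${input}" \
    --form-string "prompt=${prompt}" \
    -F "edge_strength=${edge}" \
    -F "vis_strength=${vis}" \
    -F "sigma=${sigma}" \
    --output "$output"
}

# Example:
#   SIGMA=100 transfer_clip synthetic_render.mp4 photorealistic.mp4 \
#     "surveillance camera footage of a crowded street"
```

Reading the overrides from the environment keeps the call site short when most clips use the defaults.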
### `GET /health`
```bash
curl http://localhost:8080/health
# {"status": "ok"}
```
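Because the first run has to download the model weights, the container may take a while before it accepts requests. A minimal sketch of a readiness loop built on `GET /health`; `wait_for_health` and `COSMOS_URL` are hypothetical names, and whether `/health` reports `ok` only after the weights are loaded is an assumption this README does not confirm:

```bash
# Hypothetical helper: poll GET /health until it answers, or give up after a timeout.
# Assumption: a successful response means the service is ready to accept /transfer calls.
COSMOS_URL="${COSMOS_URL:-http://localhost:8080}"

wait_for_health() {
  local timeout="${1:-1800}"   # total seconds to wait
  local interval="${2:-10}"    # seconds between attempts
  local waited=0

  until curl --fail --silent --max-time 5 "$COSMOS_URL/health" >/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "service not healthy after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example: wait_for_health 3600 15 && echo "ready"
```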
## Tuned Parameters
After tuning on 80+ clips, the sweet spot for **synthetic surveillance data** is:
```
edge=0.85 + vis=0.45 + sigma=100
```
- **edge 0.85** (`edge_strength`): strong geometry/silhouette preservation from Canny edges
- **vis 0.45** (`vis_strength`): moderate scene structure preservation
- **sigma 100** (`sigma`): balanced realism without losing the synthetic layout
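These values were tuned for surveillance footage, so other domains may need their own sweep. A rough sketch of one way to do that: render the same clip at several `sigma` values while holding the other two controls at the tuned settings. `sweep_sigma` and `COSMOS_URL` are hypothetical names, the tuning fields are assumed to be accepted as form fields, and the example values are placeholders because the valid range of `sigma` is not stated here:

```bash
# Hypothetical sweep: one output per sigma value, edge_strength and vis_strength fixed.
# Assumption: the tuning fields are sent as multipart form fields.
COSMOS_URL="${COSMOS_URL:-http://localhost:8080}"

sweep_sigma() {
  local input="$1" outdir="$2"
  shift 2                      # remaining arguments are the sigma values to try
  mkdir -p "$outdir"

  local sigma
  for sigma in "$@"; do
    curl --fail --silent --show-error -X POST "$COSMOS_URL/transfer" \
      -F "video=@${input}" \
      -F "edge_strength=0.85" \
      -F "vis_strength=0.45" \
      -F "sigma=${sigma}" \
      --output "${outdir}/sigma_${sigma}.mp4" || return 1
  done
}

# Example (placeholder values): sweep_sigma synthetic_render.mp4 sweep_out 60 80 100 120
```

Varying one control at a time makes it easier to attribute a visual change to a specific parameter.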
## Requirements
| Resource | Minimum |
|---|---|
| GPU | A100 / RTX 6000 Ada / H100 |
| VRAM | 40 GB |
| RAM | 64 GB |
| Disk | 30 GB (model weights) |
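A quick preflight check against the VRAM row can save a failed first run. The sketch below queries `nvidia-smi` for the total memory of one GPU; `check_vram` and the `NVIDIA_SMI` override are hypothetical names, and the default threshold is an assumption set slightly under 40 GB because the reported total can be a little below the nominal size:

```bash
# Hypothetical preflight check: compare a GPU's total memory with a minimum in MiB.
# NVIDIA_SMI can point at a different binary; it defaults to nvidia-smi on PATH.
check_vram() {
  local min_mib="${1:-40000}"  # assumption: just under the nominal 40 GB
  local gpu="${2:-0}"
  local total

  total="$("${NVIDIA_SMI:-nvidia-smi}" --id="$gpu" \
    --query-gpu=memory.total --format=csv,noheader,nounits)" || return 2
  total="$(printf '%s' "$total" | tr -d '[:space:]')"

  if [ "$total" -lt "$min_mib" ]; then
    echo "GPU $gpu has ${total} MiB, need at least ${min_mib} MiB" >&2
    return 1
  fi
}

# Example: check_vram 40000 0 && echo "VRAM ok"
```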
## Part of VisionAI-Flywheel
This service is one component of a full synthetic surveillance data pipeline:
```
[kimodo-api]      → NPZ motion
       ↓
[render-api]      → SOMA mesh render (MP4)
       ↓
[cosmos-transfer] → Sim2Real photorealistic video  ← this image
       ↓
[NVIDIA VSS]      → VLM annotation → fine-tuning dataset
```
🔗 Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)
## License
Apache 2.0. See [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE).
> Cosmos-Transfer2.5 model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime. They are not bundled in this image.