---
license: apache-2.0
tags:
- video-to-video
- sim2real
- synthetic-data
- surveillance
- cosmos
- nvidia
- docker
- rest-api
- diffusion
pipeline_tag: text-to-video
---

# cosmos-transfer 🎬

A **REST API wrapper** around [NVIDIA Cosmos-Transfer2.5](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B), a 2B-parameter video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).

Packaged as a ready-to-run Docker microservice whose default parameters were tuned on 80+ surveillance clips.

## Quick Start

```bash
docker pull ghcr.io/eyalenav/cosmos-transfer:latest

docker run --rm --gpus '"device=0"' -p 8080:8080 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/cosmos-transfer:latest
```

> ⚠️ The first run downloads the Cosmos-Transfer2.5-2B weights (~20 GB) and requires a Hugging Face token with access to `nvidia/Cosmos-Transfer2.5-2B`.

## API

### `POST /transfer`

Convert a synthetic video into a photorealistic one.

```bash
curl -X POST http://localhost:8080/transfer \
  -F "video=@synthetic_render.mp4" \
  -F "prompt=surveillance camera footage of a crowded street" \
  --output photorealistic.mp4
```

**Parameters:**

| Field | Default | Description |
|---|---|---|
| `video` | required | Input synthetic MP4 |
| `prompt` | `""` | Text guidance for the scene |
| `edge_strength` | `0.85` | Canny edge control (geometry preservation) |
| `vis_strength` | `0.45` | Visual blur control (scene structure) |
| `sigma` | `100` | Noise level (realism vs. fidelity) |
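
For callers that prefer Python over `curl`, the request can be sketched with the standard library alone. This is a minimal sketch, not the service's official client: it assumes the optional fields (`edge_strength`, `vis_strength`, `sigma`) are sent as multipart form fields in the same way as `video` and `prompt`, and that the response body is the output MP4, as the `--output` flag in the `curl` example suggests. The helper names `encode_multipart` and `transfer` are hypothetical.

```python
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, video_path, boundary=None):
    """Encode text fields plus one MP4 file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    path = Path(video_path)
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="video"; filename="{path.name}"\r\n'
        f'Content-Type: video/mp4\r\n\r\n'.encode()
        + path.read_bytes()
        + b'\r\n'
    )
    parts.append(f'--{boundary}--\r\n'.encode())
    return b''.join(parts), f'multipart/form-data; boundary={boundary}'


def transfer(video_path, out_path, prompt="", edge_strength=0.85,
             vis_strength=0.45, sigma=100, base_url="http://localhost:8080"):
    """POST a synthetic clip to /transfer and save the returned bytes."""
    # Assumption: the tuning fields travel as form fields next to the video.
    fields = {
        "prompt": prompt,
        "edge_strength": edge_strength,
        "vis_strength": vis_strength,
        "sigma": sigma,
    }
    body, content_type = encode_multipart(fields, video_path)
    request = urllib.request.Request(
        f"{base_url}/transfer",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())
    return out_path
```

With the container running, `transfer("synthetic_render.mp4", "photorealistic.mp4", prompt="surveillance camera footage of a crowded street")` mirrors the `curl` call above while making the three tuning fields explicit.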

### `GET /health`

```bash
curl http://localhost:8080/health
# {"status": "ok"}
```
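
Because the first run has to download the model weights, a script that drives the service may want to poll `/health` before submitting work. The following is a sketch under assumptions: it treats the `{"status": "ok"}` payload shown above as the readiness signal, and whether the endpoint answers before the weights finish loading is not documented here. The function names, timeout, and interval are illustrative.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url="http://localhost:8080"):
    """Return the parsed /health JSON, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as "not ready".
        return None


def wait_until_healthy(probe=fetch_health, timeout_s=1800, interval_s=10,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll the probe until it reports status "ok" or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        payload = probe()
        if isinstance(payload, dict) and payload.get("status") == "ok":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `probe`, `sleep`, and `clock` arguments are injectable only so the loop can be exercised without a running container.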

## Tuned Parameters

After tuning on 80+ clips, the sweet spot for **surveillance synthetic data** is:

```
edge=0.85 + vis=0.45 + sigma=100
```

- **edge 0.85** — strong geometry/silhouette preservation from Canny
- **vis 0.45** — moderate scene structure preservation  
- **sigma 100** — balanced realism without losing the synthetic layout
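
For footage outside the surveillance domain, these values may need retuning. One way to organize that is a small grid around the tuned point, sketched below. The neighboring values are purely illustrative, the valid range of each field is not documented here, and `sweep_grid` is a hypothetical helper rather than part of the service.

```python
from itertools import product

# Tuned defaults quoted in this README.
TUNED = {"edge_strength": 0.85, "vis_strength": 0.45, "sigma": 100}


def sweep_grid(edge_values, vis_values, sigma_values):
    """Return one parameter dict per combination, in nested-loop order."""
    return [
        {"edge_strength": e, "vis_strength": v, "sigma": s}
        for e, v, s in product(edge_values, vis_values, sigma_values)
    ]


# Hypothetical neighborhood around the tuned point (illustrative values only).
grid = sweep_grid([0.75, 0.85, 0.95], [0.35, 0.45, 0.55], [100])
```

Each dict in `grid` can then be sent as the form fields of one `POST /transfer` request, with the outputs compared side by side.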

## Requirements

| Resource | Minimum |
|---|---|
| GPU | A100 / RTX 6000 Ada / H100 |
| VRAM | 40 GB |
| RAM | 64 GB |
| Disk | 30 GB (model weights) |
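
Before starting the container, the VRAM minimum can be checked against what `nvidia-smi` reports. This is a rough sketch: it assumes `nvidia-smi` is on the `PATH`, and it compares in decimal gigabytes because drivers typically report slightly less than a card's nominal capacity. The helper names are hypothetical.

```python
import subprocess

MIN_VRAM_GB = 40  # minimum from the table above


def parse_vram_mib(output):
    """Parse per-GPU memory totals (MiB) from nvidia-smi CSV output."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def gpus_meeting_minimum(totals_mib, min_gb=MIN_VRAM_GB):
    """Return the indices of GPUs whose total memory is at least min_gb (decimal GB)."""
    return [
        index for index, mib in enumerate(totals_mib)
        if mib * 1024 * 1024 >= min_gb * 1_000_000_000
    ]


def query_vram_mib():
    """Ask nvidia-smi for each GPU's total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_vram_mib(result.stdout)
```

`gpus_meeting_minimum(query_vram_mib())` returns the device indices that could be passed to `--gpus` in the `docker run` command.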

## Part of VisionAI-Flywheel

This service is one component of a full synthetic surveillance data pipeline:

```
[kimodo-api] → NPZ motion
    ↓
[render-api] → SOMA mesh render (MP4)
    ↓
[cosmos-transfer] → Sim2Real photorealistic video  ← this image
    ↓
[NVIDIA VSS] → VLM annotation → fine-tuning dataset
```

🔗 Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)

## License

Apache 2.0 — see [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE)

> Cosmos-Transfer2.5 model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime. They are not bundled in this image.