---
license: apache-2.0
tags:
- video-to-video
- sim2real
- synthetic-data
- surveillance
- cosmos
- nvidia
- docker
- rest-api
- diffusion
pipeline_tag: text-to-video
---

# cosmos-transfer 🎬

A **REST API wrapper** around [NVIDIA Cosmos-Transfer2.5](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B), a 2B-parameter video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).

Packaged as a ready-to-run Docker microservice whose default parameters were tuned across 80+ surveillance clips.

## Quick Start

```bash
docker pull ghcr.io/eyalenav/cosmos-transfer:latest

docker run --rm --gpus '"device=0"' -p 8080:8080 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/cosmos-transfer:latest
```

> ⚠️ The first run downloads the Cosmos-Transfer2.5-2B weights (~20 GB). This requires a Hugging Face token with access to `nvidia/Cosmos-Transfer2.5-2B`.

## API

### `POST /transfer`

Converts a synthetic video into a photorealistic one. In the example below, the response body is saved as the output MP4.

```bash
curl -X POST http://localhost:8080/transfer \
  -F "video=@synthetic_render.mp4" \
  -F "prompt=surveillance camera footage of a crowded street" \
  --output photorealistic.mp4
```

**Parameters:**

| Field | Default | Description |
|---|---|---|
| `video` | required | Input synthetic MP4 |
| `prompt` | `""` | Text guidance for the scene |
| `edge_strength` | `0.85` | Canny edge control (geometry preservation) |
| `vis_strength` | `0.45` | Visual blur control (scene structure) |
| `sigma` | `100` | Noise level (realism vs. fidelity) |
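
The optional fields in the table can be overridden per request. The following is a minimal sketch, assuming the three numeric fields are accepted as multipart form fields alongside `video` and `prompt` (the table lists them, but the request above does not show them being sent). The function name `transfer_clip` and the `COSMOS_URL`, `EDGE_STRENGTH`, `VIS_STRENGTH`, and `SIGMA` variables are hypothetical and not part of the service.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wraps POST /transfer with explicit control parameters.
# Assumption: edge_strength, vis_strength and sigma are sent as form fields.
COSMOS_URL="${COSMOS_URL:-http://localhost:8080}"

transfer_clip() {
  local input="$1" output="$2" prompt="${3:-}"
  # Fall back to the documented defaults when no override is set.
  local edge="${EDGE_STRENGTH:-0.85}" vis="${VIS_STRENGTH:-0.45}" sigma="${SIGMA:-100}"

  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
  # --form-string sends the prompt literally, even if it starts with @ or <.
  curl --fail -X POST "${COSMOS_URL}/transfer" \
    -F "video=@${input}" \
    --form-string "prompt=${prompt}" \
    -F "edge_strength=${edge}" \
    -F "vis_strength=${vis}" \
    -F "sigma=${sigma}" \
    --output "${output}"
}
```

For example, `EDGE_STRENGTH=0.7 transfer_clip synthetic_render.mp4 photorealistic.mp4 "surveillance camera footage of a crowded street"` would lower only the edge control and keep the other defaults.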

### `GET /health`

```bash
curl http://localhost:8080/health
# {"status": "ok"}
```
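
Because the first run downloads the model weights, it can be useful to wait for the service before sending work. The following is a minimal sketch, assuming a ready service answers `/health` with a JSON body containing `"ok"` as in the response above. Whether the endpoint responds before the weights have finished loading is not stated here, so treat that as an assumption too. The function name `wait_for_health` and its retry defaults are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: polls GET /health until the service reports ready.
# Arguments: base URL, maximum attempts, delay between attempts in seconds.
wait_for_health() {
  local url="${1:-http://localhost:8080}" attempts="${2:-60}" delay="${3:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output; -f turns HTTP errors into a non-zero exit.
    if curl -sf "${url}/health" | grep -q '"ok"'; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "service at ${url} not healthy after ${attempts} attempts" >&2
  return 1
}
```

A script could then gate its requests with `wait_for_health http://localhost:8080 || exit 1`.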

## Tuned Parameters

After tuning on 80+ clips, the sweet spot for **synthetic surveillance data** is:

```
edge=0.85 + vis=0.45 + sigma=100
```

- **edge 0.85**: strong geometry/silhouette preservation from Canny
- **vis 0.45**: moderate scene structure preservation
- **sigma 100**: balanced realism without losing the synthetic layout

## Requirements

| Resource | Minimum |
|---|---|
| GPU | A100 / RTX 6000 Ada / H100 |
| VRAM | 40 GB |
| RAM | 64 GB |
| Disk | 30 GB (model weights) |
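
The VRAM minimum can be checked before starting the container. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH`. The function name `has_enough_vram` is hypothetical, and the 40000 MiB default is only an approximation of the 40 GB figure in the table, since `nvidia-smi` reports MiB and the exact total varies by card.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: succeeds if any visible GPU meets the minimum.
# Argument: minimum total memory in MiB (default approximates 40 GB).
has_enough_vram() {
  local min_mib="${1:-40000}" totals total
  # One line per GPU, total memory in MiB without header or unit suffix.
  totals="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null)" || return 1
  for total in $totals; do
    # Skip anything that is not a plain integer (e.g. "[N/A]").
    case "$total" in
      '' | *[!0-9]*) continue ;;
    esac
    if [ "$total" -ge "$min_mib" ]; then
      return 0
    fi
  done
  return 1
}
```

For instance, `has_enough_vram || echo "no GPU with enough VRAM found" >&2` prints a warning when no suitable GPU is visible.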

## Part of VisionAI-Flywheel

This service is one component of a full synthetic surveillance data pipeline:

```
[kimodo-api] → NPZ motion
    ↓
[render-api] → SOMA mesh render (MP4)
    ↓
[cosmos-transfer] → Sim2Real photorealistic video  ← this image
    ↓
[NVIDIA VSS] → VLM annotation → fine-tuning dataset
```

🔗 Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)

## License

Apache 2.0. See [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE).

> Cosmos-Transfer2.5 model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime. They are not bundled in this image.