NVisionAI committed on
Commit 65d140c · verified · 1 Parent(s): 6fc95d9

Add standalone API documentation

Files changed (1)
  1. API.md +240 -0
API.md ADDED
@@ -0,0 +1,240 @@
+ # cosmos-transfer
+
+ REST API microservice wrapper around [NVIDIA Cosmos-Transfer2.5-2B](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B), a video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).
+
+ ---
+
+ ## Installation
+
+ ```bash
+ docker pull ghcr.io/eyalenav/cosmos-transfer:latest
+ ```
+
+ ### Run
+
+ ```bash
+ docker run --rm \
+   --gpus '"device=0"' \
+   -p 8080:8080 \
+   -v ~/.cache/huggingface:/root/.cache/huggingface \
+   -e HUGGINGFACE_TOKEN=hf_... \
+   ghcr.io/eyalenav/cosmos-transfer:latest
+ ```
+
+ > **First run:** the container downloads the Cosmos-Transfer2.5-2B weights (~20 GB). Later starts reuse the mounted Hugging Face cache and skip the download.
+
+ ---
+
+ ## API Reference
+
+ ### `GET /health`
+
+ Check server status.
+
+ **Request**
+ ```
+ GET http://localhost:8080/health
+ ```
+
+ **Response**
+ ```json
+ {
+   "status": "ok",
+   "model": "Cosmos-Transfer2.5-2B",
+   "device": "cuda:0"
+ }
+ ```
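
Because the first start downloads about 20 GB of weights, a client may want to wait for `/health` before sending work. The following is a minimal sketch, not part of the service: `fetch_health` and `wait_until_ready` are hypothetical helper names, and it is an assumption that the server either refuses connections or returns a status other than `"ok"` while it is still loading.

```python
import json
import time
import urllib.request


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(base_url="http://localhost:8080", attempts=60,
                     delay_s=10.0, fetch=fetch_health, sleep=time.sleep) -> dict:
    """Poll /health until it reports status "ok".

    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    Raises TimeoutError if the server never reports "ok".
    """
    for _ in range(attempts):
        try:
            health = fetch(base_url)
            if health.get("status") == "ok":
                return health
        except OSError:
            # Covers refused connections and socket timeouts
            # (urllib.error.URLError is a subclass of OSError).
            pass
        sleep(delay_s)
    raise TimeoutError(f"{base_url}/health did not report ok after {attempts} attempts")
```

Calling `wait_until_ready()` with the defaults polls every 10 seconds for up to 10 minutes; both values are arbitrary and should be sized to the download speed of the host.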
+
+ ---
+
+ ### `POST /transfer`
+
+ Convert a synthetic video into photorealistic video using multicontrol (edge + visual).
+
+ **Request**
+ ```
+ POST http://localhost:8080/transfer
+ Content-Type: multipart/form-data
+ ```
+
+ | Field | Type | Default | Description |
+ |---|---|---|---|
+ | `video` | file | required | Input synthetic MP4 (recommended max: 10 s @ 24 fps) |
+ | `prompt` | string | `""` | Text describing the scene (improves realism) |
+ | `edge_strength` | float | `0.85` | Canny edge control strength (geometry preservation) |
+ | `vis_strength` | float | `0.45` | Visual/blur control strength (scene structure) |
+ | `sigma` | int | `100` | Noise level: lower = more faithful to the input, higher = more realistic |
+ | `num_steps` | int | `35` | Diffusion steps (more = slower but higher quality) |
+ | `seed` | int | `-1` | Random seed (`-1` = random) |
+
+ **Response**
+
+ Binary MP4 file (`video/mp4`).
+
+ **Example**
+ ```bash
+ curl -X POST http://localhost:8080/transfer \
+   -F "video=@synthetic_render.mp4" \
+   -F "prompt=surveillance camera footage of a crowded urban street, overcast day" \
+   -F "edge_strength=0.85" \
+   -F "vis_strength=0.45" \
+   -F "sigma=100" \
+   --output photorealistic.mp4
+ ```
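
The field table above maps directly onto multipart form data, so a client can check values before uploading a large video. This sketch uses a hypothetical helper, `build_transfer_form`, with the defaults taken from the table. The accepted ranges are not documented, so the `[0, 1]` bound on the two strengths and the positivity checks are assumptions, not server rules.

```python
def build_transfer_form(prompt: str = "", edge_strength: float = 0.85,
                        vis_strength: float = 0.45, sigma: int = 100,
                        num_steps: int = 35, seed: int = -1) -> dict:
    """Return the text fields for POST /transfer as a dict of strings."""
    for name, value in (("edge_strength", edge_strength),
                        ("vis_strength", vis_strength)):
        # Assumption: control strengths are weights in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    if sigma <= 0 or num_steps <= 0:
        # Assumption: noise level and step count must be positive.
        raise ValueError("sigma and num_steps must be positive")
    if seed < -1:
        raise ValueError("seed must be -1 (random) or a non-negative integer")
    return {
        "prompt": prompt,
        "edge_strength": str(edge_strength),
        "vis_strength": str(vis_strength),
        "sigma": str(sigma),
        "num_steps": str(num_steps),
        "seed": str(seed),
    }
```

The returned dict can be passed as the `data=` argument of `requests.post`, next to `files={"video": ...}`.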
+
+ ---
+
+ ### `POST /transfer_async`
+
+ Submit a job and poll for completion (recommended for long clips).
+
+ **Submit**
+ ```bash
+ curl -X POST http://localhost:8080/transfer_async \
+   -F "video=@render.mp4" \
+   -F "prompt=security incident, parking lot" \
+   -F "edge_strength=0.85" \
+   --output job.json
+ # {"job_id": "abc123", "status": "queued"}
+ ```
+
+ **Poll**
+ ```bash
+ curl http://localhost:8080/status/abc123
+ # {"job_id": "abc123", "status": "running", "progress": 0.42}
+ # ...
+ # {"job_id": "abc123", "status": "done"}
+ ```
+
+ **Download**
+ ```bash
+ curl http://localhost:8080/result/abc123 --output photorealistic.mp4
+ ```
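
The submit, poll, and download steps above can be chained in one function. The sketch below is one possible client, not the project's own: `transfer_async` is a hypothetical name, `session` is any object with requests-style `post` and `get` methods (for example `requests.Session()`), and only the `queued`, `running`, and `done` statuses shown above are known. Treating any other status as a terminal failure is an assumption.

```python
import time


def transfer_async(session, video_path, output_path,
                   base_url="http://localhost:8080", fields=None,
                   poll_s=5.0, max_polls=720, sleep=time.sleep):
    """Submit a job, poll /status/<job_id>, then save /result/<job_id>."""
    with open(video_path, "rb") as f:
        resp = session.post(
            f"{base_url}/transfer_async",
            files={"video": ("input.mp4", f, "video/mp4")},
            data=fields or {},
        )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    for _ in range(max_polls):
        status_resp = session.get(f"{base_url}/status/{job_id}")
        status_resp.raise_for_status()
        status = status_resp.json()["status"]
        if status == "done":
            break
        if status not in ("queued", "running"):
            # Assumption: any undocumented status means the job failed.
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        sleep(poll_s)
    else:
        # The loop ran out of polls without ever seeing "done".
        raise TimeoutError(f"job {job_id} not done after {max_polls} polls")

    result = session.get(f"{base_url}/result/{job_id}")
    result.raise_for_status()
    with open(output_path, "wb") as out:
        out.write(result.content)
    return job_id
```

A typical call would be `transfer_async(requests.Session(), "render.mp4", "photorealistic.mp4", fields={"prompt": "security incident, parking lot"})`. The default of 720 polls at 5 seconds gives a one-hour ceiling, which is an arbitrary choice.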
+
+ ---
+
+ ## Tuned Parameters
+
+ Tested across 80+ surveillance clips; the confirmed sweet spot is:
+
+ ```
+ edge_strength=0.85 + vis_strength=0.45 + sigma=100
+ ```
+
+ | Parameter | Value | Effect |
+ |---|---|---|
+ | `edge_strength` | **0.85** | Strong silhouette/geometry preservation from Canny edges |
+ | `vis_strength` | **0.45** | Moderate scene structure via visual blur control |
+ | `sigma` | **100** | Balanced noise: realistic textures without losing layout |
+
+ ### When to adjust
+
+ | Scenario | Adjustment |
+ |---|---|
+ | Subject drifts from synthetic pose | Increase `edge_strength` → 0.90–0.95 |
+ | Background too synthetic-looking | Increase `vis_strength` → 0.55–0.65 |
+ | Output too faithful to render colors | Increase `sigma` → 120 |
+ | Too much motion blur | Decrease `sigma` → 80 |
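
The adjustment table can be encoded as overrides on the tuned baseline so that experiments stay reproducible. This is an illustrative sketch: the symptom keys and the `tuned_params` helper are hypothetical names, and where the table gives a range the lower bound is used as a conservative first step.

```python
# Tuned baseline from the table above.
BASELINE = {"edge_strength": 0.85, "vis_strength": 0.45, "sigma": 100}

# One override per scenario; keys are made-up labels for the table rows.
ADJUSTMENTS = {
    "pose_drift": {"edge_strength": 0.90},
    "synthetic_background": {"vis_strength": 0.55},
    "too_faithful_colors": {"sigma": 120},
    "motion_blur": {"sigma": 80},
}


def tuned_params(*symptoms: str) -> dict:
    """Return the baseline with the overrides for each symptom applied in order.

    Two rows move `sigma` in opposite directions, so if both are passed
    the one listed last wins.
    """
    params = dict(BASELINE)
    for symptom in symptoms:
        params.update(ADJUSTMENTS[symptom])
    return params
```

For example, `tuned_params("pose_drift")` keeps `vis_strength` and `sigma` at the baseline and raises only `edge_strength` to 0.90.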
+
+ ---
+
+ ## Hardware Requirements
+
+ | Resource | Minimum | Recommended |
+ |---|---|---|
+ | GPU | A100 40GB / RTX 6000 Ada | H100 / RTX PRO 6000 Blackwell |
+ | VRAM | 40 GB | 48+ GB |
+ | RAM | 64 GB | 128 GB |
+ | Disk | 30 GB | 50 GB |
+ | CUDA | 12.1+ | 12.8 |
+
+ **Processing time (RTX PRO 6000 Blackwell, 96 GB VRAM):**
+ - 4 s clip @ 24 fps → ~3 min
+ - 10 s clip @ 24 fps → ~7 min
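
The two timings above are enough for a rough client-side timeout. The sketch below fits a straight line through them and adds a safety margin. The linear model, the 1.5 margin, and the helper name `estimate_timeout_s` are all assumptions, and the figures apply only to the GPU quoted above, so slower hardware or clips longer than 10 s may fall well outside the estimate.

```python
def estimate_timeout_s(clip_seconds: float, safety_factor: float = 1.5) -> float:
    """Estimate a request timeout in seconds for a clip of the given length."""
    # Reference points from the list above: (4 s, 3 min) and (10 s, 7 min).
    slope_min_per_s = (7.0 - 3.0) / (10.0 - 4.0)
    intercept_min = 3.0 - slope_min_per_s * 4.0
    minutes = intercept_min + slope_min_per_s * clip_seconds
    return minutes * 60.0 * safety_factor
```

With the default margin, a 10 s clip gives roughly 630 seconds.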
+
+ ---
+
+ ## Environment Variables
+
+ | Variable | Required | Description |
+ |---|---|---|
+ | `HUGGINGFACE_TOKEN` | Yes | HF token with access to `nvidia/Cosmos-Transfer2.5-2B` |
+ | `CUDA_VISIBLE_DEVICES` | No | Limit to specific GPU (e.g. `"1"`) |
+ | `PORT` | No | Override default port `8080` |
+
+ ---
+
+ ## Integration with VisionAI-Flywheel
+
+ ```yaml
+ # docker-compose.yml excerpt
+ services:
+   cosmos-transfer:
+     image: ghcr.io/eyalenav/cosmos-transfer:latest
+     ports:
+       - "8080:8080"
+     deploy:
+       resources:
+         reservations:
+           devices:
+             - driver: nvidia
+               device_ids: ["1"]
+               capabilities: [gpu]
+     volumes:
+       - hf_cache:/root/.cache/huggingface
+     environment:
+       - HUGGINGFACE_TOKEN=${HUGGINGFACE_TOKEN}
+ ```
+
+ Full `docker-compose.yml`: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)
+
+ ---
+
+ ## Example: Full Python client
+
+ ```python
+ import requests
+
+ def transfer_video(
+     input_path: str,
+     output_path: str,
+     prompt: str = "",
+     edge_strength: float = 0.85,
+     vis_strength: float = 0.45,
+     sigma: int = 100
+ ):
+     """Convert synthetic video to photorealistic."""
+     # Upload the clip and control parameters as multipart form data.
+     with open(input_path, "rb") as f:
+         response = requests.post(
+             "http://localhost:8080/transfer",
+             files={"video": ("input.mp4", f, "video/mp4")},
+             data={
+                 "prompt": prompt,
+                 "edge_strength": edge_strength,
+                 "vis_strength": vis_strength,
+                 "sigma": sigma,
+             },
+             # The call blocks until inference finishes; 600 s covers the
+             # ~7 min quoted for a 10 s clip.
+             timeout=600
+         )
+     response.raise_for_status()
+
+     # The response body is the binary MP4.
+     with open(output_path, "wb") as f:
+         f.write(response.content)
+     print(f"Saved to {output_path}")
+
+ # Example usage
+ transfer_video(
+     input_path="soma_render.mp4",
+     output_path="photorealistic.mp4",
+     prompt="surveillance camera, urban street, daytime, overcast sky"
+ )
+ ```
+
+ ---
+
+ ## License
+
+ Apache 2.0
+
+ > Cosmos-Transfer2.5 model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Weights are downloaded at runtime and are not bundled in this image.