
	---
	license: apache-2.0
	tags:
	- video-to-video
	- sim2real
	- synthetic-data
	- surveillance
	- cosmos
	- nvidia
	- docker
	- rest-api
	- diffusion
	pipeline_tag: text-to-video
	---

	# cosmos-transfer 🎬

	A REST API wrapper around [NVIDIA Cosmos-Transfer2.5](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B), a 2B-parameter video diffusion model that converts synthetic renders into photorealistic video (Sim2Real).

	Packaged as a ready-to-run Docker microservice whose default parameters were tuned on 80+ surveillance clips.

	## Quick Start

	```bash
	docker pull ghcr.io/eyalenav/cosmos-transfer:latest

	docker run --rm --gpus '"device=0"' -p 8080:8080 \
	  -v ~/.cache/huggingface:/root/.cache/huggingface \
	  -e HUGGINGFACE_TOKEN=hf_... \
	  ghcr.io/eyalenav/cosmos-transfer:latest
	```

	> ⚠️ The first run downloads the Cosmos-Transfer2.5-2B weights (~20 GB) and requires a Hugging Face token with access to `nvidia/Cosmos-Transfer2.5-2B`.

	## API

	### `POST /transfer`

	Converts a synthetic video into a photorealistic one.

	```bash
	curl -X POST http://localhost:8080/transfer \
	  -F "video=@synthetic_render.mp4" \
	  -F "prompt=surveillance camera footage of a crowded street" \
	  --output photorealistic.mp4
	```

	Parameters:

	| Field | Default | Description |
	|---|---|---|
	| `video` | required | Input synthetic MP4 |
	| `prompt` | `""` | Text guidance for the scene |
	| `edge_strength` | `0.85` | Canny edge control (geometry preservation) |
	| `vis_strength` | `0.45` | Visual blur control (scene structure) |
	| `sigma` | `100` | Noise level (realism vs. fidelity) |
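
The tuning fields can presumably be overridden per request. The sketch below assumes they are sent as extra multipart form fields next to `video` and `prompt`. The table calls them fields, but this README does not show such a request, so treat that as an assumption. `transfer_clip`, `COSMOS_URL`, and `DRY_RUN` are hypothetical names introduced for illustration only.

```bash
# Sketch: POST a clip to /transfer with explicit tuning values.
# Assumption: edge_strength, vis_strength and sigma are accepted as form fields.

# transfer_clip INPUT OUTPUT [PROMPT] [EDGE] [VIS] [SIGMA]
transfer_clip() {
  local input output prompt edge vis sigma base
  input="$1"
  output="$2"
  prompt="${3:-}"
  edge="${4:-0.85}"     # documented default
  vis="${5:-0.45}"      # documented default
  sigma="${6:-100}"     # documented default
  base="${COSMOS_URL:-http://localhost:8080}"

  # Build the command as this function's positional parameters.
  # --fail makes curl exit non-zero on an HTTP error status.
  set -- curl --fail -X POST "${base}/transfer" \
    -F "video=@${input}" \
    -F "prompt=${prompt}" \
    -F "edge_strength=${edge}" \
    -F "vis_strength=${vis}" \
    -F "sigma=${sigma}" \
    --output "${output}"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of sending the request.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `transfer_clip synthetic_render.mp4 photorealistic.mp4 "surveillance camera footage of a crowded street"` sends the same request as the curl call above with the defaults spelled out. Setting `DRY_RUN=1` prints the command without sending it.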

	### `GET /health`

	```bash
	curl http://localhost:8080/health
	# {"status": "ok"}
	```
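
Because the first run downloads the model weights, the container may take a while before it answers. A minimal polling sketch follows. It assumes `/health` starts returning `{"status": "ok"}` only once the service is usable, which this README does not state. `wait_for_health` and `COSMOS_URL` are hypothetical names.

```bash
# Sketch: poll /health until it reports ok or a timeout expires.

# wait_for_health [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_health() {
  local timeout interval url elapsed body
  timeout="${1:-600}"
  interval="${2:-5}"
  url="${COSMOS_URL:-http://localhost:8080}/health"
  elapsed=0

  while [ "$elapsed" -lt "$timeout" ]; do
    # -s hides progress output, -f turns HTTP errors into a non-zero exit.
    if body=$(curl -sf "$url") &&
       printf '%s' "$body" | grep -q '"status": *"ok"'; then
      echo "service healthy after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "service not healthy after ${timeout}s" >&2
  return 1
}
```

A typical use is `wait_for_health 1800 10 && echo ready` right after `docker run`, before sending the first `/transfer` request.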

	## Tuned Parameters

	After testing on 80+ clips, the best trade-off found for synthetic surveillance data is:

	```
	edge=0.85 + vis=0.45 + sigma=100
	```

	- `edge_strength` 0.85: strong geometry/silhouette preservation from the Canny edge control
	- `vis_strength` 0.45: moderate scene structure preservation
	- `sigma` 100: balanced realism without losing the synthetic layout
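
Footage from another domain may need different values. One way to explore is a small grid around the tuned point, sketched below. The grid values are illustrative rather than recommendations from this project, `param_grid` and `sweep_clip` are hypothetical helpers, and the request again assumes the tuning values are accepted as form fields.

```bash
# Sketch: try a small grid of values around edge=0.85, vis=0.45, sigma=100.

# param_grid: print one "edge vis sigma" triple per line (illustrative values).
param_grid() {
  local edge vis sigma
  for edge in 0.75 0.85 0.95; do
    for vis in 0.35 0.45 0.55; do
      for sigma in 100; do
        printf '%s %s %s\n' "$edge" "$vis" "$sigma"
      done
    done
  done
}

# sweep_clip INPUT PROMPT: send one request per combination and
# save each result under a name that records its settings.
sweep_clip() {
  local input prompt edge vis sigma
  input="$1"
  prompt="$2"
  param_grid | while read -r edge vis sigma; do
    curl --fail -X POST "${COSMOS_URL:-http://localhost:8080}/transfer" \
      -F "video=@${input}" \
      -F "prompt=${prompt}" \
      -F "edge_strength=${edge}" \
      -F "vis_strength=${vis}" \
      -F "sigma=${sigma}" \
      --output "out_edge${edge}_vis${vis}_sigma${sigma}.mp4"
  done
}
```

Each run produces nine files such as `out_edge0.85_vis0.45_sigma100.mp4`, which can then be compared side by side.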

	## Requirements

	| Resource | Minimum |
	|---|---|
	| GPU | A100 / RTX 6000 Ada / H100 |
	| VRAM | 40 GB |
	| RAM | 64 GB |
	| Disk | 30 GB (model weights) |
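
A quick preflight check for the VRAM requirement can be sketched with `nvidia-smi`. The default threshold of 40000 MiB is an assumption, chosen slightly below 40 GiB because the total a driver reports for a nominal 40 GB card can vary. `check_vram` and the `NVIDIA_SMI` override are hypothetical names.

```bash
# Sketch: succeed if at least one visible GPU reports enough total memory.

# check_vram [MIN_MIB]
check_vram() {
  local min_mib smi name mem ok
  min_mib="${1:-40000}"
  smi="${NVIDIA_SMI:-nvidia-smi}"   # override lets you point at another binary
  ok=1

  # With these flags each line looks like "GPU name, 81920" (memory in MiB).
  while IFS=, read -r name mem; do
    mem="${mem# }"                  # drop the space after the comma
    [ -n "$mem" ] || continue       # skip empty output (no GPU or no driver)
    if [ "$mem" -ge "$min_mib" ]; then
      echo "OK: ${name} has ${mem} MiB"
      ok=0
    else
      echo "too small: ${name} has ${mem} MiB (< ${min_mib} MiB)" >&2
    fi
  done <<EOF
$("$smi" --query-gpu=name,memory.total --format=csv,noheader,nounits 2>/dev/null)
EOF

  return "$ok"
}
```

Running `check_vram || echo "no GPU meets the VRAM minimum"` before `docker run` gives an early warning instead of an out-of-memory failure later.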

	## Part of VisionAI-Flywheel

	This service is one component of a full synthetic surveillance data pipeline:

	```
	[kimodo-api] → NPZ motion
	↓
	[render-api] → SOMA mesh render (MP4)
	↓
	[cosmos-transfer] → Sim2Real photorealistic video ← this image
	↓
	[NVIDIA VSS] → VLM annotation → fine-tuning dataset
	```

	🔗 Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)

	## License

	Apache 2.0 — see [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE)

	> Cosmos-Transfer2.5 model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime. They are not bundled in this image.