---
license: apache-2.0
tags:
- motion-generation
- text-to-motion
- human-motion
- surveillance
- synthetic-data
- docker
- rest-api
- kimodo
- nvidia
pipeline_tag: text-to-video
---

# kimodo-api 🏃

A **REST API wrapper** around [NVIDIA Kimodo](https://github.com/nv-tlabs/kimodo) — the state-of-the-art text-to-motion diffusion model trained on 700 hours of commercial mocap data. This image turns Kimodo into a microservice you can call from any pipeline, no Python environment needed.

## Quick Start

```bash
docker pull ghcr.io/eyalenav/kimodo-api:latest
docker run --rm --gpus '"device=0"' -p 9551:9551 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/kimodo-api:latest
```

> ⚠️ First run downloads Llama-3-8B-Instruct (~16GB) for the text encoder. Requires a HuggingFace token with access to `meta-llama/Meta-Llama-3-8B-Instruct`.

## API

### `POST /generate`

Generate a motion clip from a text prompt.

```bash
curl -X POST http://localhost:9551/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "person pushing through a crowd aggressively"}'
```

**Response:** NPZ file (binary) — SOMA 77-joint skeleton format, compatible with BVH export.

### `GET /health`

```bash
curl http://localhost:9551/health
# {"status": "ok"}
```

## Requirements

| Resource | Minimum |
|---|---|
| GPU | RTX 3090 / A100 / RTX 6000 Ada |
| VRAM | 24 GB |
| RAM | 32 GB |
| Disk | 50 GB (model weights) |

## What's inside

- **Kimodo** — NVIDIA's kinematic motion diffusion model (77-joint SOMA skeleton)
- **LLM2Vec** text encoder backed by **Llama-3-8B-Instruct**
- **FastAPI** server on port 9551
- Health check + graceful startup

## Part of VisionAI-Flywheel

This service is one component of a full synthetic surveillance data pipeline:

```
[kimodo-api] → NPZ motion
      ↓
[render-api] → SOMA mesh render (MP4)
      ↓
[cosmos-transfer] → Sim2Real photorealistic video
      ↓
[NVIDIA VSS] → VLM annotation → fine-tuning dataset
```

🔗 Full pipeline: [github.com/EyalEnav/VisionAI-Flywheel](https://github.com/EyalEnav/VisionAI-Flywheel)

## License

Apache 2.0 — see [LICENSE](https://github.com/EyalEnav/VisionAI-Flywheel/blob/main/LICENSE)

> Kimodo model weights are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) and downloaded at runtime. They are not bundled in this image.
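
## Example: calling the API from Python

A minimal client sketch for the two endpoints above, assuming the default host and port from the Quick Start. The output path, timeouts, and the array names inside the returned NPZ are illustrative; the actual NPZ keys depend on Kimodo's export format and are not specified here.

```python
# Hypothetical usage sketch: check health, request a motion clip, save the NPZ.
import io

import numpy as np
import requests

API_URL = "http://localhost:9551"  # assumes the default port from the Quick Start

# 1. Confirm the service is up before generating.
health = requests.get(f"{API_URL}/health", timeout=10).json()
assert health["status"] == "ok"

# 2. Request a motion clip; the response body is a binary NPZ archive.
resp = requests.post(
    f"{API_URL}/generate",
    json={"prompt": "person pushing through a crowd aggressively"},
    timeout=600,  # generation can take a while, especially on the first run
)
resp.raise_for_status()

# 3. Save the NPZ to disk and inspect which arrays it contains.
with open("motion.npz", "wb") as f:
    f.write(resp.content)

motion = np.load(io.BytesIO(resp.content))
print(motion.files)  # array names depend on the Kimodo/SOMA export format
```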