| --- |
| title: LTX 2.3 Studio |
| emoji: π¬ |
| colorFrom: purple |
| colorTo: blue |
| sdk: gradio |
| sdk_version: "5.50.0" |
| app_file: app.py |
| python_version: "3.11" |
| suggested_hardware: zero-a10g |
| hf_oauth: false |
| preload_from_hub: |
| - Comfy-Org/ltx-2 split_files/text_encoders/gemma_3_12B_it.safetensors |
| - Kijai/LTX2.3_comfy diffusion_models/ltx-2.3-22b-dev_transformer_only_bf16.safetensors,loras/ltx-2.3-22b-distilled-lora-dynamic_fro09_avg_rank_105_bf16.safetensors,text_encoders/ltx-2.3_text_projection_bf16.safetensors,vae/LTX23_audio_vae_bf16.safetensors,vae/LTX23_video_vae_bf16.safetensors,vae/taeltx2_3.safetensors |
| - Lightricks/LTX-2-19b-IC-LoRA-Detailer ltx-2-19b-ic-lora-detailer.safetensors |
| - Lightricks/LTX-2-19b-LoRA-Camera-Control-Jib-Down ltx-2-19b-lora-camera-control-jib-down.safetensors |
| - Lightricks/LTX-2-19b-LoRA-Camera-Control-Jib-Up ltx-2-19b-lora-camera-control-jib-up.safetensors |
| - Lightricks/LTX-2-19b-LoRA-Camera-Control-Static ltx-2-19b-lora-camera-control-static.safetensors |
| - Lightricks/LTX-2.3 ltx-2.3-22b-distilled-lora-384.safetensors,ltx-2.3-spatial-upscaler-x2-1.0.safetensors |
| - Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control ltx-2.3-22b-ic-lora-union-control-ref0.5.safetensors |
| - google/gemma-3-12b-it-qat-q4_0-unquantized gemma-3-12b-it/model-00001-of-00005.safetensors,gemma-3-12b-it/model-00002-of-00005.safetensors,gemma-3-12b-it/model-00003-of-00005.safetensors,gemma-3-12b-it/model-00004-of-00005.safetensors,gemma-3-12b-it/model-00005-of-00005.safetensors,gemma-3-12b-it/model.safetensors.index.json,gemma-3-12b-it/preprocessor_config.json,gemma-3-12b-it/tokenizer.model |
| --- |
| |
| # LTX 2.3 Studio |
|
|
| A single-process Gradio app that wraps [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3) β Lightricks' open 22B video generation model β under one focused UI. Six modes (text Β· image Β· audio Β· lipsync Β· keyframe Β· style) sharing the same ComfyUI All-In-One workflow. Runs locally on Apple Silicon (MPS) or NVIDIA (CUDA), deploys to Hugging Face Spaces (ZeroGPU). |
|
|
| [](https://huggingface.co/spaces/techfreakworm/LTX2.3-Studio) |
| [](https://github.com/techfreakworm/ltx2.3-AIO-generator/stargazers) |
| [](LICENSE) |
| [](pyproject.toml) |
| [](https://github.com/comfyanonymous/ComfyUI) |
| [](https://huggingface.co/Lightricks/LTX-2.3) |
|
|
| β **Live demo:** https://huggingface.co/spaces/techfreakworm/LTX2.3-Studio |
|
|
| --- |
|
|
| ## What's inside |
|
|
| Six modes wired through the same ComfyUI All-In-One workflow. Each mode exposes only the inputs it actually consumes β the form stays short and focused. |
|
|
| | Mode | Inputs | Output | Notes | |
| |---|---|---|---| |
| | **Text β Video** | Prompt (+ optional audio prompt) | mp4 (+ optional wav) | The core mode. Camera-control LoRAs auto-applied by keyword. | |
| | **Audio β Video** | Prompt + audio track | mp4 with the input audio preserved | Conditions motion on the audio waveform. | |
| | **Image β Video** | Image + prompt | mp4 (+ optional audio) | Image-conditioned generation. | |
| | **Lipsync** | Image + audio | mp4 with audio | Viseme-aligned mouth motion. | |
| | **Keyframe** | First + last frames + prompt | mp4 | Latent interpolation between two anchors. | |
| | **Style Transfer** | Source video + style image | mp4 | IC-LoRA restyle; motion preserved from source. | |
|
|
| Every mode carries **Fast / Balanced / Quality** presets (steps Γ 1, Γ 1.5, Γ 3). A per-mode ZeroGPU duration estimator adapts the call timeout to the requested workload. |
|
|
| --- |
|
|
| ## Quick start (local) |
|
|
| Requires **Python 3.11**, ~80 GB free disk for the weight set, and ~24 GB VRAM (CUDA) or ~32 GB unified memory (Apple Silicon). |
|
|
| ```bash |
| git clone --recurse-submodules https://github.com/techfreakworm/ltx2.3-AIO-generator |
| cd ltx2.3-AIO-generator |
| bash setup.sh # creates .venv, installs ComfyUI + pinned custom nodes + app deps |
| source .venv/bin/activate |
| python app.py # http://127.0.0.1:7860 |
| ``` |
|
|
| The first run resolves model weights into your HF cache (`~/.cache/huggingface/hub/`) and symlinks them into `comfyui/models/<comfy_type>/`. Subsequent starts skip the download. Expect ~70 GB of weights pulled on a cold first run. |
|
|
| **Apple Silicon notes.** `PYTORCH_ENABLE_MPS_FALLBACK=1` is set automatically so the few MPS-unsupported ops fall back to CPU. ComfyUI's VRAM autodetect picks the right tier; override with `LTX23_AIO_VRAM=lowvram|normalvram|highvram` if you need to force one. |
|
|
| **LAN access** (phone / tablet on the same WiFi): `python app.py` binds `0.0.0.0:7860`. Visit `http://<your-LAN-IP>:7860` from another device. On macOS, allow inbound for `python` in System Settings β Network β Firewall if the connection refuses. |
|
|
| ## Quick start (HF Spaces) |
|
|
| This repo is a Gradio Space. The Pro tier provides ZeroGPU (A10G) access and the per-call duration budget needed for the Balanced and Quality presets. |
|
|
| ```bash |
| git remote add space https://huggingface.co/spaces/<your-handle>/LTX2.3-Studio |
| git push space master:main # local branch is master; HF Space deploys from main |
| ``` |
|
|
| > β The refspec `master:main` matters. The local default branch is `master` (GitHub convention); the HF Space deploys from `main`. A bare `git push space master` creates an orphan remote branch that does NOT trigger a deploy. |
|
|
| The Space's `preload_from_hub` directive (see the YAML at the top of this file) bakes ~111 GB of weights into the build image. `app.py:_bootstrap()` then: |
|
|
| 1. Clones ComfyUI + pinned custom nodes into `~/comfyui` on cold start (ZeroGPU container freezes preserve them across calls) |
| 2. Mirrors the read-only preload cache into `~/hf-cache-rw/` β works around the build-user-vs-runtime-user permissions trap (preloaded files are root-owned; we run as uid 1000 and can't write to them, so any lazy download to the cache would fail with `Permission denied`) |
| 3. Stages seed input files into `comfyui/input/` so workflow loaders don't error before any user upload arrives |
|
|
| Subsequent requests hit warm cache β no network traffic on inference 2+. |
|
|
| **ZeroGPU duration estimator.** Each generate call carries a dynamic `@spaces.GPU(duration=N)` calculated from mode, preset, and frame count. Clamped at `[60, 900] s`. On timeout (`"GPU task aborted"`), the handler auto-retries once at 2Γ duration. |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ``` |
| ββββββββββββββββββββββββββββββββββββ |
| browser βββΆβ app.py β Gradio Blocks β |
| β header Β· drawer Β· 6 mode tabs β |
| ββββββββββββββββββββ¬ββββββββββββββββ |
| β |
| βΌ |
| ββββββββββββββββββββββββββββββββββββ |
| β backend.py β |
| β ComfyUILibraryBackend β |
| β @spaces.GPU(duration=callable) β |
| β calls PromptExecutor directly β |
| ββββββββββββββββββββ¬ββββββββββββββββ |
| β |
| ββββββββββββββββ¬βββββββββββββββ¬βββββββββββββββββββ΄βββββββ¬βββββββββββββββββββ |
| βΌ βΌ βΌ βΌ βΌ |
| modes.py models.py workflow.py ui.py tools/ |
| per-mode walk + ensure load + patch per-mode form extract_modes.py |
| parameterize from HF cache API-format JSON builders (regen workflows/) |
| β |
| βΌ |
| ββββββββββββββββββββββββββββββββββββ |
| β comfyui/ β |
| β submodule (local) β |
| β runtime clone at ~/comfyui β |
| β on HF Spaces β |
| β β |
| β βββ custom_nodes/ (pinned SHAs)β |
| β βββ models/ β HF cache symlinksβ |
| ββββββββββββββββββββββββββββββββββββ |
| ``` |
|
|
| **One backend, one process.** The `@spaces.GPU` decorator is the only divergence between local and Spaces runtime. ComfyUI manages VRAM via its tiered presets β no `empty_cache()` sprinkling needed elsewhere. |
|
|
| **Workflow as data.** Each of the six modes is a user-exported API-format JSON in `workflows/`. The mode handler patches a deep-copied template (`modes.parameterize_fn`) and hands it to ComfyUI's `PromptExecutor`. Updating the master workflow is a three-step ritual: edit in the ComfyUI editor β export β `python tools/extract_modes.py --master ... --out workflows`. |
|
|
| --- |
|
|
| ## Project layout |
|
|
| ``` |
| . |
| βββ app.py # Gradio Blocks entry, _bootstrap, _on_generate, mode tabs |
| βββ backend.py # ComfyUILibraryBackend, @spaces.GPU, duration estimator |
| βββ modes.py # MODE_REGISTRY + per-mode parameterize_fn + node-id constants |
| βββ models.py # MODEL_REGISTRY, walk_workflow_for_models, ensure_models |
| βββ ui.py # render_status, _render_idle, mode-form layout primitives |
| βββ workflow.py # load_template, set_input helpers |
| βββ workflows/ # API-format mode JSONs (do not hand-edit) |
| β βββ t2v.json |
| β βββ i2v.json |
| β βββ a2v.json |
| β βββ lipsync.json |
| β βββ keyframe.json |
| β βββ style.json |
| βββ assets/seed_inputs/ # placeholder image / audio / video for cold-start staging |
| βββ tools/ |
| β βββ extract_modes.py # regenerate workflows/ from a master ComfyUI export |
| βββ docs/ |
| β βββ future_improvements.md |
| β βββ superpowers/{specs,plans}/ # spec + implementation plans per feature |
| βββ tests/ # L1 + L3 in CI; L2 with --comfy-real; L4 GPU smoke |
| βββ README.md # this file (HF Space YAML + project intro) |
| βββ CLAUDE.md # project facts + gotchas (what & why) |
| βββ AGENTS.md # tool-agnostic agent rulebook |
| βββ SKILLS.md # process / debugging / deployment (how) |
| βββ requirements.txt # pinned deps |
| βββ pyproject.toml # ruff + pytest config (py311) |
| βββ setup.sh # venv + ComfyUI + custom nodes bootstrap |
| βββ comfyui/ # git submodule (local) / runtime clone target (Spaces) |
| ``` |
|
|
| --- |
|
|
| ## Tech stack |
|
|
| - **[Gradio 5.50](https://gradio.app/)** β UI shell, native components, `gr.Progress(track_tqdm=True)` |
| - **[ComfyUI](https://github.com/comfyanonymous/ComfyUI)** β library-mode `PromptExecutor` (pinned commit; submodule locally, runtime-cloned on Spaces) |
| - **[LTX-2.3 22B](https://huggingface.co/Lightricks/LTX-2.3)** by Lightricks β primary diffusion transformer (BF16 weights via [Kijai/LTX2.3_comfy](https://huggingface.co/Kijai/LTX2.3_comfy)) |
| - **[Gemma 3 12B](https://huggingface.co/google/gemma-3-12b-it)** by Google β multimodal text encoder (requires the full 5-shard model β text-only checkpoints crash on meta-tensor allocation in SDPA) |
| - **Custom nodes** (pinned SHAs in `app.CUSTOM_NODES_PINNED`): |
| - [Lightricks/ComfyUI-LTXVideo](https://github.com/Lightricks/ComfyUI-LTXVideo) β LTX sampler / decoder nodes |
| - [kijai/ComfyUI-KJNodes](https://github.com/kijai/ComfyUI-KJNodes) β utility nodes |
| - [rgthree/rgthree-comfy](https://github.com/rgthree/rgthree-comfy) β Power-Lora-Loader |
| - [Kosinkadink/ComfyUI-VideoHelperSuite](https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite) β video I/O |
| - [pythongosssss/ComfyUI-Custom-Scripts](https://github.com/pythongosssss/ComfyUI-Custom-Scripts) β string / dict helpers |
| - [city96/ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) β GGUF transformer loader |
| - [Fannovel16/comfyui_controlnet_aux](https://github.com/Fannovel16/comfyui_controlnet_aux) β DWPose for Lipsync/Style preprocessors |
| - [evanspearman/ComfyMath](https://github.com/evanspearman/ComfyMath) β math nodes for the workflow's keyframe path |
| - [Smirnov75/ComfyUI-mxToolkit](https://github.com/Smirnov75/ComfyUI-mxToolkit) β utility nodes |
| - [DoctorDiffusion/ComfyUI-MediaMixer](https://github.com/DoctorDiffusion/ComfyUI-MediaMixer) β `FinalFrameSelector` |
| - **[HF Spaces ZeroGPU](https://huggingface.co/zero-gpu)** (A10G) β `@spaces.GPU(duration=β¦)` for queue-priority signalling and per-call timeout |
|
|
| --- |
|
|
| ## Design |
|
|
| Theme: **Topaz Cinema Slate** β slate substrate `#1A1F26`, warm amber accent `#E0A458` used sparingly, IBM Plex Sans throughout. Defined as `_TOPAZ_THEME` + `_CUSTOM_CSS` in `app.py`. |
|
|
| Layout: hamburger drawer. Pinned 220 px sidebar at β₯1024 px (mode buttons + model status + settings); below 1024 px it slides in as a fixed overlay via the `.aio-shell.drawer-open` class. The header carries a live mode tag (T2V/A2V/I2V/LIPSYNC/KEY/STYLE) updated by JS without a server round-trip. |
|
|
| Spec, plan, and design rationale live under `docs/superpowers/specs/` and `docs/superpowers/plans/`. |
|
|
| --- |
|
|
| ## Notes on running |
|
|
| - **First inference is slow.** Cold-start workflow validation + model load on the active node graph takes ~30 β 90 s. Subsequent calls within the same session reuse loaded models. |
| - **VRAM tier** is auto-detected; override with `LTX23_AIO_VRAM=lowvram|normalvram|highvram`. |
| - **ZeroGPU duration cap.** The per-call estimator clamps to `[60, 900] s`. If a generation aborts with `"GPU task aborted"`, the handler retries once at 2Γ duration. The duration field is the queue-priority signal, not a billing cap. |
| - **Output directory.** Local: `comfyui/output/LTX2.3/`. Spaces: `~/comfyui/output/LTX2.3/`. Both are whitelisted via `allowed_paths=` on launch (Gradio 5 file-access policy). |
| - **Local LAN testing.** Bound to `0.0.0.0:7860`. macOS firewall: allow inbound for `python` if a connection from your phone refuses. |
|
|
| --- |
|
|
| ## License |
|
|
| MIT for the AIO app code (see `LICENSE`). |
|
|
| - [ComfyUI](https://github.com/comfyanonymous/ComfyUI) is GPL-3.0. |
| - LTX-2.3 and Lightricks-published LoRAs / auxiliaries retain Lightricks' open-source licensing β see the individual model cards on Hugging Face. |
| - Gemma 3 weights are subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms). |
| - Each pinned custom node retains its own license; see the linked repositories. |
|
|
| ## Credits |
|
|
| - **LTX-2.3** by [Lightricks](https://github.com/Lightricks) |
| - **ComfyUI** by [comfyanonymous](https://github.com/comfyanonymous) |
| - **Gemma 3** by [Google DeepMind](https://github.com/google-deepmind) |
| - **All-In-One ComfyUI workflow** that this app wraps β by [Danielle Falco](https://www.youtube.com/@FutuTek) (FutuTek) |
| - **Workflow nodes** by Lightricks, [kijai](https://github.com/kijai), [rgthree](https://github.com/rgthree), [Kosinkadink](https://github.com/Kosinkadink), [pythongosssss](https://github.com/pythongosssss), [city96](https://github.com/city96), [Fannovel16](https://github.com/Fannovel16), [evanspearman](https://github.com/evanspearman), [Smirnov75](https://github.com/Smirnov75), [DoctorDiffusion](https://github.com/DoctorDiffusion) |
|
|
| Built by [@techfreakworm](https://huggingface.co/techfreakworm) β drop a β₯ on the [Space](https://huggingface.co/spaces/techfreakworm/LTX2.3-Studio) if it's useful, and follow there for what's next. |
|
|