Future improvements
A backlog of optimizations that aren't blocking but would tighten the deploy. None of these are required for current functionality. Order is rough priority, not commitment.
Spaces / preload
0. Re-enable preload_from_hub via runtime cache mirror — DONE 2026-05-02
Initial preload deployment failed because HF's build pipeline writes
`~/.cache/huggingface/` as the build user, leaving it read-only for runtime
user 1000. Lazy `hf_hub_download` for non-preloaded files (GGUF, camera LoRAs)
failed with `Permission denied (os error 13)`. `chmod` couldn't help; we
don't own the inode.
Fix landed in `_bootstrap()`'s `_mirror_preload_hf_cache()`:
- Walks `~/.cache/huggingface/` into a parallel `~/hf-cache-rw/` we own
- Hardlinks `blobs/<sha>` files (zero-copy, shared inode, instant reads)
- Preserves relative snapshot symlinks (they resolve within the mirror tree)
- Byte-copies `refs/<branch>` files (the HF lib overwrites these on etag checks)
- Sets `HF_HOME` + `HF_HUB_CACHE` to the mirror so the HF lib uses our writable copy
- Falls back to a symlink if `os.link()` returns `EXDEV` (cross-device)
Result: preloaded files are instantly available (cache hit on first generate), non-preloaded files lazy-download into dirs we own (no permission errors).
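A minimal sketch of the mirroring logic, assuming the standard HF cache layout (`blobs/`, `snapshots/`, `refs/` under `hub/`); exact paths and walk details in the real `_mirror_preload_hf_cache()` may differ:

```python
import os
import shutil
from pathlib import Path

def _mirror_preload_hf_cache() -> None:
    """Mirror the read-only build-user HF cache into a tree runtime user 1000 owns."""
    src = Path.home() / ".cache" / "huggingface"
    dst = Path.home() / "hf-cache-rw"
    if not src.is_dir():
        return
    for root, _dirs, files in os.walk(src):
        rel = Path(root).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            s, d = Path(root) / name, dst / rel / name
            if d.is_symlink() or d.exists():
                continue
            if s.is_symlink():
                # Keep relative snapshot symlinks as-is so they resolve inside the mirror.
                os.symlink(os.readlink(s), d)
            elif "refs" in rel.parts:
                # refs/<branch> files are overwritten by the HF lib on etag checks,
                # so they must be real writable copies, not hardlinks.
                shutil.copy2(s, d)
            else:
                try:
                    os.link(s, d)  # hardlink blobs/<sha>: zero-copy, shared inode
                except OSError:
                    os.symlink(s, d)  # EXDEV: cross-device, fall back to a symlink
    # Point the HF lib at the writable mirror.
    os.environ["HF_HOME"] = str(dst)
    os.environ["HF_HUB_CACHE"] = str(dst / "hub")
```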
1. Stop preloading models that aren't referenced by any workflow — DONE 2026-05-02
Audit on 2026-05-02 showed two Lightricks/LTX-2.3 files in `preload_from_hub`
that aren't actually referenced by any workflow JSON we ship:
- `ltx-2.3-22b-dev.safetensors` (~42 GB)
- `ltx-2.3-22b-distilled.safetensors` (~42 GB)

The active path uses Kijai/LTX2.3_comfy `ltx-2.3-22b-dev_transformer_only_bf16.safetensors`.
Removed both, saving ~84 GB. The removal was forced by HF eviction
(`storage limit exceeded (150G)`) when total preload was ~234 GB. Risk: if a
future workflow update reintroduces the Lightricks-side filenames, lazy
download takes over.
2. Drop unsloth/LTX-2.3-GGUF from preload (~39 GB) — DONE 2026-05-02
Removed alongside (1). The GGUF transformer is the low-VRAM alternative; ZeroGPU H200 has 70 GB, so the BF16 transformer always fits. It lazy-loads on first use of any preset that wires the GGUF path.
3. Drop the Lightricks/LTX-2-19b-LoRA-Camera-Control-Static/Jib-Up/Jib-Down preload
Each is ~2 GB. The Power Lora Loader has them all listed but defaults all to
`on: false`, so they only load when the user picks one. Lazy-load is
appropriate. Currently kept in preload because of the 10-entry cap plus
"easier to keep what we had".
4. Auto-generate preload_from_hub from MODEL_REGISTRY
Today the README list and `MODEL_REGISTRY` in models.py can drift. Build a
small tools/sync_preload.py that:
- Reads `MODEL_REGISTRY`
- Walks the workflow JSONs to find which entries are actually referenced
- Sorts referenced entries by size (using `huggingface_hub` `repo_info`)
- Picks the top N entries that fit in the 10-cap
- Writes them back into the README YAML
Run as a pre-commit or CI step.
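A rough sketch of what tools/sync_preload.py could look like. The `MODEL_REGISTRY` shape (`{key: (repo_id, filename)}`), the `workflows/` directory, and the YAML write-back are assumptions to adapt:

```python
"""Sketch of tools/sync_preload.py; adjust shapes/paths to the real repo."""
from pathlib import Path
from huggingface_hub import HfApi

from models import MODEL_REGISTRY  # assumed shape: {key: (repo_id, filename)}

PRELOAD_CAP = 10  # Spaces caps preload_from_hub at 10 entries

def referenced() -> set[tuple[str, str]]:
    """(repo_id, filename) pairs actually named by a shipped workflow JSON."""
    used = set()
    for wf in Path("workflows").glob("*.json"):
        text = wf.read_text()
        used |= {(r, f) for r, f in MODEL_REGISTRY.values() if f in text}
    return used

def size_of(api: HfApi, repo_id: str, filename: str) -> int:
    info = api.repo_info(repo_id, files_metadata=True)
    return next((s.size or 0 for s in info.siblings if s.rfilename == filename), 0)

def main() -> None:
    api = HfApi()
    ranked = sorted(referenced(), key=lambda e: size_of(api, *e), reverse=True)
    for repo_id, filename in ranked[:PRELOAD_CAP]:
        # Stub: emit the entries; the real tool rewrites the README YAML block.
        print(f"- {repo_id} {filename}")

if __name__ == "__main__":
    main()
```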
5. Bake custom-node clones into the build via requirements.txt git installs
We currently git clone 10 custom-node repos in `_bootstrap()` at runtime.
That's ~30 s of cold start. Some custom nodes ship as pip-installable; for
the others, we could write a small tools/install_custom_nodes.py that
runs at build time (via `pip install --no-deps` against git URLs) so the
repos land in the image instead of being fetched at boot (sketched below).
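A possible shape for that build-time script. The repo list is illustrative (the real URLs and pins live in `_bootstrap()` today), and this only works for nodes that ship packaging metadata (setup.py / pyproject.toml):

```python
"""Sketch of tools/install_custom_nodes.py, run at build time."""
import subprocess
import sys

CUSTOM_NODES = [  # (git URL, pinned ref): placeholder entry
    ("https://github.com/example/ComfyUI-SomeNode", "abc1234"),
]

for url, ref in CUSTOM_NODES:
    # --no-deps: the top-level requirements.txt already owns the dependency set
    subprocess.check_call([
        sys.executable, "-m", "pip", "install", "--no-deps", f"git+{url}@{ref}",
    ])
```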
Tradeoff: Spaces' build pipeline runs the gradio SDK Dockerfile which we
don't control directly. The custom-node clone has to happen at runtime
unless we can move it into the standard requirements.txt build step.
6. Persistent storage add-on as the "$25/mo button"
If iteration speed becomes the binding constraint, the persistent storage
add-on (Spaces > Settings) at $25/mo for 150 GB makes everything just work
— /data is writable, models live there forever, no preload dance.
Sketched approach: an `HF_HOME=/data/hf-cache` env var plus a `_bootstrap()` mkdir
fallback. One-line code change.
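A minimal sketch of that change, assuming the add-on is enabled and `/data` is mounted writable:

```python
import os
from pathlib import Path

# Prefer the persistent volume when the add-on is enabled; otherwise keep
# the existing runtime-mirror behavior.
data = Path("/data/hf-cache")
if os.access("/data", os.W_OK):
    data.mkdir(parents=True, exist_ok=True)
    os.environ.setdefault("HF_HOME", str(data))
```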
Workflow / runtime
7. Move ComfyUI custom-node requirements.txt install to build time
Bootstrap currently pip installs each custom node's requirements at
runtime. Most are no-ops (the deps are already in our top-level requirements.txt),
but the `pip install --quiet` calls still take a few seconds each. Could
audit them and merge the union into the top-level requirements.txt; an audit
sketch follows.
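A small audit sketch that could drive the merge; the `ComfyUI/custom_nodes` path is an assumption:

```python
"""Sketch: list custom-node deps missing from the top-level requirements.txt."""
from pathlib import Path
from packaging.requirements import InvalidRequirement, Requirement

def names(path: Path) -> set[str]:
    out = set()
    for line in path.read_text().splitlines():
        line = line.split("#")[0].strip()
        if not line:
            continue
        try:
            out.add(Requirement(line).name.lower())
        except InvalidRequirement:
            pass  # skip -r includes, bare git URLs, etc.
    return out

top = names(Path("requirements.txt"))
for req in Path("ComfyUI/custom_nodes").glob("*/requirements.txt"):
    missing = names(req) - top
    if missing:
        print(f"{req.parent.name}: {sorted(missing)}")
```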
8. Clean up nodes_replacements.py warning
ComfyUI core at our pinned commit (eb0686bb) emits
`'function' object has no attribute 'register'` because the node-replacement
API surface is incomplete at that SHA. Bumping `COMFYUI_COMMIT` to a newer
tag should silence it. Purely cosmetic; no functional impact.
9. Auto-close drawer when user navigates away from header
Currently relies on a document-level click listener. Works, but has a
microsecond race when the click target falls between elements. Could use
`pointerleave` on the drawer instead.
Cost-of-running
10. Trim ZeroGPU duration cap
Currently `@spaces.GPU(duration=300)` reserves 5 min per call. For the Fast preset
(distilled, 8 steps) actual usage is ~30 s. Could shorten to 120 s, which improves
the user's queue priority (per HF docs), or use a dynamic duration based on the
preset (sketch below).
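A sketch of the dynamic variant. HF's ZeroGPU docs allow `duration` to be a callable evaluated per call with the decorated function's arguments; the preset names and seconds table here are assumptions:

```python
import spaces

# Assumed preset names and per-preset budgets; tune from observed run times.
def _gpu_seconds(prompt, preset="fast", *args, **kwargs):
    return {"fast": 60, "balanced": 120, "quality": 300}.get(preset, 300)

@spaces.GPU(duration=_gpu_seconds)  # evaluated per call with generate()'s args
def generate(prompt, preset="fast", *args, **kwargs):
    ...
```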
11. Local-perf "low-VRAM" path for style mode (GGUF Q4 transformer)
Style mode on Apple Silicon runs 37× slower per sampling step than the other
modes (596 s/step on Mac vs ~16 s/step for lipsync). Root cause is
architectural — LTXAddVideoICLoRAGuide concatenates the source video's
DWPose latents into the noisy target latent, doubling the attention sequence
to ~56 k tokens. Combined with MPS having no flash-attn-2 and the 22B BF16
model approaching the working-memory ceiling, perf collapses on Mac.
H200 handles this fine (flash-attn-3 + tensor cores + dedicated VRAM ⇒ ~30–60 s end to end on Spaces). So this is fundamentally a Mac/MPS gap, not a code bug.
A "Low VRAM" preset that swaps the BF16 transformer for the GGUF Q4
quantized one would reduce per-step memory pressure and may bring local
style perf into the workable range (still slow, but maybe ~60–90 s/step
instead of 600). The GGUF file is already declared in `MODEL_REGISTRY`
(`UnetLoaderGGUF` consumer). What's missing:
- A workflow toggle that swaps `UNETLoader` → `UnetLoaderGGUF` for the main transformer in style.json (and other modes that benefit); see the sketch after this list.
- A UI control on the Advanced accordion: "Low VRAM (GGUF Q4)".
- Wire-through in `_style_parameterize` (and friends) to flip the loader class.
- Delete the matching BF16 path nodes when GGUF is selected (or set them to bypass) so we don't load both.
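A minimal sketch of the loader swap, assuming API-format workflow JSON (`class_type` + `inputs` per node); the GGUF filename and input names are illustrative:

```python
def apply_low_vram(workflow: dict, gguf_name: str = "ltx-2.3-22b-Q4_K_M.gguf") -> dict:
    """Swap the BF16 UNETLoader for UnetLoaderGGUF in an API-format workflow."""
    for node in workflow.values():
        if node.get("class_type") == "UNETLoader":
            node["class_type"] = "UnetLoaderGGUF"
            # Assumed input name; drop BF16-only inputs like weight_dtype
            # so stale keys aren't passed to the GGUF loader.
            node["inputs"] = {"unet_name": gguf_name}
    return workflow
```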
Risk: GGUF transformers behave slightly differently from BF16 — output quality drops, especially for IC-LoRA paths where the dynamic range matters. Should be opt-in only, never default. Probably v1.1+ scope (it's listed in "Out of scope for v1" in CLAUDE.md as the GGUF Q4 / Low VRAM preset).