# Future improvements
A backlog of optimizations that aren't blocking but would tighten the deploy.
None of these are required for current functionality. Order is rough priority,
not commitment.
## Spaces / preload
### ~~0. Re-enable `preload_from_hub` via runtime cache mirror~~ — DONE 2026-05-02
Initial preload deployment failed because HF's build pipeline writes
`~/.cache/huggingface/` as the build user, leaving it read-only for runtime
user 1000. Lazy `hf_hub_download` for non-preloaded files (GGUF, camera LoRAs)
failed with `Permission denied (os error 13)`. `chmod` couldn't help — we
don't own the inode.
Fix landed in `_bootstrap()`'s `_mirror_preload_hf_cache()`:
- Mirrors `~/.cache/huggingface/` into a parallel `~/hf-cache-rw/` tree that we own
- Hardlinks `blobs/<sha>` files (zero-copy, shared inode, instant reads)
- Preserves relative snapshot symlinks (resolve within the mirror tree)
- Byte-copies `refs/<branch>` files (HF lib overwrites these on etag check)
- Sets `HF_HOME` + `HF_HUB_CACHE` to the mirror so HF lib uses our writable copy
- Falls back to symlink if `os.link()` returns EXDEV (cross-device)
Result: preloaded files are instantly available (cache hit on first generate),
non-preloaded files lazy-download into dirs we own (no permission errors).
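The mirroring rules above can be sketched roughly as follows. This is a minimal illustration, not the actual `_mirror_preload_hf_cache()` code: the function name `mirror_hf_cache` and the way `refs/` directories are detected are assumptions made for the example.

```python
import errno
import os
import shutil
from pathlib import Path


def mirror_hf_cache(src: Path, dst: Path) -> None:
    """Mirror a read-only HF cache tree into a writable one (illustrative sketch)."""
    for root, dirs, files in os.walk(src, followlinks=False):
        rel = Path(root).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)
        # os.walk reports symlinks to directories under `dirs`, so scan both lists.
        for name in list(dirs) + files:
            s = Path(root) / name
            d = dst / rel / name
            if d.exists() or d.is_symlink():
                continue  # already mirrored on an earlier boot
            if s.is_symlink():
                # Keep the relative target so it resolves inside the mirror tree.
                os.symlink(os.readlink(s), d)
            elif s.is_dir():
                continue  # real directory: created when os.walk descends into it
            elif "refs" in rel.parts:
                # refs/<branch> files get rewritten later, so they need their own inode.
                shutil.copy2(s, d)
            else:
                try:
                    os.link(s, d)  # zero-copy: both names share one inode
                except OSError as exc:
                    if exc.errno != errno.EXDEV:
                        raise
                    os.symlink(s, d)  # cross-device: fall back to an absolute symlink


# Assumed usage at boot, before `huggingface_hub` is imported:
#   mirror = Path.home() / "hf-cache-rw"
#   mirror_hf_cache(Path.home() / ".cache" / "huggingface", mirror)
#   os.environ["HF_HOME"] = str(mirror)
#   os.environ["HF_HUB_CACHE"] = str(mirror / "hub")
```

The function skips anything that already exists in the mirror, so running it on every boot is cheap and does not clobber `refs/` files the HF library has since updated.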
### ~~1. Stop preloading models that aren't referenced by any workflow~~ — DONE 2026-05-02
Audit on 2026-05-02 showed two `Lightricks/LTX-2.3` files in `preload_from_hub`
that aren't actually referenced by any workflow JSON we ship:
- `ltx-2.3-22b-dev.safetensors` (~42 GB)
- `ltx-2.3-22b-distilled.safetensors` (~42 GB)
The active path uses `Kijai/LTX2.3_comfy ltx-2.3-22b-dev_transformer_only_bf16.safetensors`.
Removed both, saving ~84 GB. The removal was forced by an HF eviction with
`storage limit exceeded (150G)` when the total preload was ~234 GB. Risk: if a
future workflow update reintroduces the Lightricks-side filenames, those files
will be lazy-downloaded on first use instead of preloaded.
### ~~2. Drop `unsloth/LTX-2.3-GGUF` from preload (~39 GB)~~ — DONE 2026-05-02
Removed alongside (1). The GGUF transformer is the low-VRAM alternative; the
ZeroGPU H200 has 70 GB of VRAM, so the BF16 transformer always fits. The GGUF
file lazy-loads on first use of any preset that wires the GGUF path.
### 3. Drop the `Lightricks/LTX-2-19b-LoRA-Camera-Control-Static/Jib-Up/Jib-Down` preload
Each is ~2 GB. The Power Lora Loader lists all of them but defaults each to
`on: false`, so a LoRA only loads when the user picks it. Lazy-loading is
appropriate. They are currently kept in preload because of the 10-entry cap and
"easier to keep what we had".
### 4. Auto-generate `preload_from_hub` from `MODEL_REGISTRY`
Today the `preload_from_hub` list in the README and `MODEL_REGISTRY` in
`models.py` can drift apart. Build a small `tools/sync_preload.py` that:
1. Reads `MODEL_REGISTRY`
2. Walks the workflow JSONs to find which entries are actually referenced
3. Sorts referenced entries by size (using `huggingface_hub` `repo_info`)
4. Picks the top N entries that fit within the 10-entry cap
5. Writes them back into the README YAML
Run as a pre-commit or CI step.
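Steps 2 to 4 could look something like the sketch below. The registry shape (a list of dicts with `repo_id` and `filename` keys) and both function names are assumptions for illustration; the real `MODEL_REGISTRY` may be structured differently. Sizes are passed in as a plain mapping so the selection logic stays testable offline; in practice they could come from `huggingface_hub`'s `repo_info(..., files_metadata=True)`. Writing the result back into the README YAML (step 5) is left out.

```python
import json
from pathlib import Path

PRELOAD_CAP = 10  # the 10-entry cap on preload_from_hub noted above


def referenced_filenames(workflow_dir: Path) -> set[str]:
    """Collect the basename of every string value found in the workflow JSONs."""
    found: set[str] = set()

    def visit(node) -> None:
        if isinstance(node, dict):
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for value in node:
                visit(value)
        elif isinstance(node, str):
            # Workflows may store "subdir\\file.safetensors"; keep the basename only.
            found.add(node.replace("\\", "/").rsplit("/", 1)[-1])

    for path in sorted(workflow_dir.glob("*.json")):
        visit(json.loads(path.read_text()))
    return found


def pick_preload(registry, referenced, sizes, cap=PRELOAD_CAP):
    """Return referenced registry entries, largest first, truncated to `cap`.

    `registry` is assumed to be a list of {"repo_id": ..., "filename": ...} dicts;
    `sizes` maps (repo_id, filename) to a size in bytes (missing entries count as 0).
    """
    used = [entry for entry in registry if entry["filename"] in referenced]
    used.sort(
        key=lambda entry: sizes.get((entry["repo_id"], entry["filename"]), 0),
        reverse=True,
    )
    return used[:cap]
```

Matching on basenames is deliberately loose: it may over-include a file whose name happens to appear in a workflow string, which is the safer failure mode for a preload list.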
### 5. Bake custom-node clones into the build via `requirements.txt` git installs
We currently `git clone` 10 custom-node repos in `_bootstrap()` at runtime.
That's ~30 s of cold start. Some custom nodes ship as pip-installable; for
the others, we could write a small `tools/install_custom_nodes.py` that
runs at build time (via `pip install --no-deps` against git URLs) so the
repos land in the image instead of being fetched at boot.
Tradeoff: Spaces' build pipeline runs the Gradio SDK Dockerfile, which we
don't control directly. The custom-node clone has to stay at runtime
unless we can move it into the standard `requirements.txt` build step.
### 6. Persistent storage add-on as the "$25/mo button"
If iteration speed becomes the binding constraint, the persistent storage
add-on (Spaces > Settings) at $25/mo for 150 GB makes everything just work:
`/data` is writable, models live there forever, and there is no preload dance.
Sketched approach: set the `HF_HOME=/data/hf-cache` env var and have
`_bootstrap()` create the directory, falling back to the current location if
it can't. Roughly a one-line code change.
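A minimal sketch of that fallback, assuming a helper named `configure_hf_home` (hypothetical) that `_bootstrap()` would call before `huggingface_hub` is imported, since the library reads `HF_HOME` at import time. The fallback path reuses the `~/hf-cache-rw/` mirror location from item 0.

```python
import os
from pathlib import Path


def configure_hf_home(
    persistent: Path = Path("/data/hf-cache"),
    fallback: Path = Path.home() / "hf-cache-rw",
) -> Path:
    """Point HF_HOME at persistent storage if usable, else at the fallback dir."""
    for candidate in (persistent, fallback):
        try:
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue  # e.g. /data is absent because the add-on is not enabled
        if os.access(candidate, os.W_OK):
            os.environ["HF_HOME"] = str(candidate)
            return candidate
    raise RuntimeError("no writable location available for the HF cache")
```

With the add-on disabled, `/data` cannot be created by the runtime user, so the helper silently lands on the fallback and behaviour matches the current deploy.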
## Workflow / runtime
### 7. Move ComfyUI custom-node `requirements.txt` install to build time
Bootstrap currently `pip install`s each custom node's requirements at
runtime. Most are no-ops (deps already in our top-level `requirements.txt`),
but the `pip install --quiet` calls still take a few seconds each. We could
audit them and merge anything missing into the top-level `requirements.txt`.
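The audit could start from something like this sketch, which lists the packages each custom node asks for that the top-level file does not already name. The function names and the `custom_nodes/<node>/requirements.txt` layout are assumptions. It compares package names only (normalized the way PEP 503 describes) and ignores version pins, so conflicting pins still need a manual look.

```python
import re
from pathlib import Path

_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")
_URL = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def requirement_names(text: str) -> set[str]:
    """Extract normalized package names from requirements.txt content."""
    names: set[str] = set()
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith(("#", "-")):
            continue  # blank line, comment, or pip option such as -r / --extra-index-url
        if _URL.match(line):
            continue  # bare VCS/URL requirement: needs manual review
        match = _NAME.match(line)
        if match:
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def missing_from_top_level(top_level: Path, custom_nodes_dir: Path) -> dict[str, set[str]]:
    """Map each custom node to the packages absent from the top-level file."""
    have = requirement_names(top_level.read_text())
    report: dict[str, set[str]] = {}
    for req in sorted(custom_nodes_dir.glob("*/requirements.txt")):
        extra = requirement_names(req.read_text()) - have
        if extra:
            report[req.parent.name] = extra
    return report
```

An empty report would suggest the runtime `pip install` calls are pure no-ops and can be dropped.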
### 8. Clean up `nodes_replacements.py` warning
ComfyUI core at our pinned commit (`eb0686bb`) emits
`'function' object has no attribute 'register'` because the node-replacement
API surface is incomplete at that SHA. Bumping `COMFYUI_COMMIT` to a newer
tag should silence it. Purely cosmetic; no functional impact.
### 9. Auto-close drawer when user navigates away from header
The drawer currently relies on a document-level click listener. This works but
has a tiny (microsecond-scale) race when the click lands between elements.
Listening for `pointerleave` on the drawer instead could avoid it.
## Cost-of-running
### 10. Trim ZeroGPU duration cap
Currently `@spaces.GPU(duration=300)` reserves 5 min per call. For the Fast
preset (distilled, 8 steps) actual usage is ~30 s. Shortening the cap to 120 s
would improve the user's queue priority (per HF docs). Better still, set the
duration dynamically based on the preset.
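A rough sketch of the dynamic variant, relying on `@spaces.GPU` accepting a callable for `duration` that receives the same arguments as the decorated function. The helper name, the assumption that the preset is the first argument, and the table layout are illustrative; only the 120 s Fast budget and the 300 s default come from the numbers above.

```python
# Per-preset GPU budgets in seconds. Only "fast" is taken from the text above;
# other presets fall through to the current 300 s cap.
PRESET_DURATION_S = {"fast": 120}
DEFAULT_DURATION_S = 300


def gpu_duration(preset: str, *args, **kwargs) -> int:
    """Return the GPU reservation for a call, keyed on the preset name."""
    return PRESET_DURATION_S.get(preset.lower(), DEFAULT_DURATION_S)


# Assumed wiring (the decorated function's real signature may differ):
#
#   import spaces
#
#   @spaces.GPU(duration=gpu_duration)
#   def generate(preset, prompt, ...):
#       ...
```

Keeping the budgets in one table makes it easy to tighten them per preset once real timings are measured.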