Commit History

Run on ZeroGPU: spaces @GPU + CUDA llama.cpp (n_gpu_layers=-1)
5f2fdbe

AlexWortega committed on

Fix CPU perf: pin n_threads=2 (cpu-basic), tighter token budget
f9ff64c

AlexWortega committed on

Rebuild as native Gradio app: server-side LFM2 (GGUF) rollout
b5308c5

AlexWortega committed on

Bake React routes into Gradio app before launch via _app=
f5d3e58

AlexWortega committed on

Disable Spaces hot-reload so React UI serves (fixes 7861 + hijack)
72ec70b

AlexWortega committed on

Serve React UI via Gradio's own launched server (cpu-basic)
7fc36e6

AlexWortega committed on

Restore real app: FastAPI(React+API) + mounted Gradio on cpu-basic
d5a75a3

AlexWortega committed on

diag: vanilla gradio app to discover HF launch mechanism
ea8b907

AlexWortega committed on

Fix gradio Space startup: override demo.launch() to serve the full app
ff8fe36

AlexWortega and Claude Opus 4.7 (1M context) committed on

Fix gradio Space startup: serve at import time on GRADIO_SERVER_PORT
71fc79f

AlexWortega and Claude Opus 4.7 (1M context) committed on

Redeploy as gradio SDK Space (no custom Docker)
c570807

AlexWortega and Claude Opus 4.7 (1M context) committed on

Revert "exp: drop "Predict next frame:" suffix — append-only prompt format"
35d1ba8

Anonumous committed on

exp: drop "Predict next frame:" suffix — append-only prompt format
0329a68

Anonumous committed on

perf: multi-frame v2 — no_repeat_ngram_size kills repetition loops
39725c3

Anonumous committed on

docs: refresh dtype comment — FA now lit, not bypassed
5b1578a

Anonumous committed on

Revert "perf: emit N frames per generate() — amortize prefill across rollout"
04a2fa9

Anonumous committed on

fix: re-export ONNX without gqa_attention_bias subgraph (LFM2.5-VL parity)
aa82f6a

Anonumous committed on

perf: emit N frames per generate() — amortize prefill across rollout
1bd18c5

Anonumous committed on

perf: cascade graphOpt basic→disabled, recover safe fusions
eb078e5

Anonumous committed on

perf: switch WebGPU to q4f16 (fp16 compute, ~2x speedup) with q4 fallback
34b7454

Anonumous committed on

fix: disable ORT graph optimizer to preserve float32 bias types on WebGPU
b61c828

Anonumous committed on

fix: force device=wasm, ORT WebGPU breaks GQA float32 type constraint
65dea4f

Anonumous committed on

deploy: fix WebGPU model loading — use q4 dtype, add debug logging
7b03c17

Anonumous committed on