Run on ZeroGPU: spaces @GPU + CUDA llama.cpp (n_gpu_layers=-1) 5f2fdbe AlexWortega committed about 10 hours ago
Fix CPU perf: pin n_threads=2 (cpu-basic), tighter token budget f9ff64c AlexWortega committed about 10 hours ago
Rebuild as native Gradio app: server-side LFM2 (GGUF) rollout b5308c5 AlexWortega committed about 12 hours ago
Bake React routes into Gradio app before launch via _app= f5d3e58 AlexWortega committed about 13 hours ago
Disable Spaces hot-reload so React UI serves (fixes 7861 + hijack) 72ec70b AlexWortega committed about 13 hours ago
Serve React UI via Gradio's own launched server (cpu-basic) 7fc36e6 AlexWortega committed about 13 hours ago
Restore real app: FastAPI(React+API) + mounted Gradio on cpu-basic d5a75a3 AlexWortega committed about 13 hours ago
diag: vanilla gradio app to discover HF launch mechanism ea8b907 AlexWortega committed about 14 hours ago
Fix gradio Space startup: override demo.launch() to serve the full app ff8fe36 AlexWortega, Claude Opus 4.7 (1M context) committed about 14 hours ago
Fix gradio Space startup: serve at import time on GRADIO_SERVER_PORT 71fc79f AlexWortega, Claude Opus 4.7 (1M context) committed about 14 hours ago
Redeploy as gradio SDK Space (no custom Docker) c570807 AlexWortega, Claude Opus 4.7 (1M context) committed about 14 hours ago
Revert "exp: drop "Predict next frame:" suffix — append-only prompt format" 35d1ba8 Anonumous committed 13 days ago
exp: drop "Predict next frame:" suffix — append-only prompt format 0329a68 Anonumous committed 13 days ago
perf: multi-frame v2 — no_repeat_ngram_size kills repetition loops 39725c3 Anonumous committed 13 days ago
Revert "perf: emit N frames per generate() — amortize prefill across rollout" 04a2fa9 Anonumous committed 14 days ago
fix: re-export ONNX without gqa_attention_bias subgraph (LFM2.5-VL parity) aa82f6a Anonumous committed 14 days ago
perf: emit N frames per generate() — amortize prefill across rollout 1bd18c5 Anonumous committed 14 days ago
perf: cascade graphOpt basic→disabled, recover safe fusions eb078e5 Anonumous committed 14 days ago
perf: switch WebGPU to q4f16 (fp16 compute, ~2x speedup) with q4 fallback 34b7454 Anonumous committed 14 days ago
fix: disable ORT graph optimizer to preserve float32 bias types on WebGPU b61c828 Anonumous committed 15 days ago
fix: force device=wasm, ORT WebGPU breaks GQA float32 type constraint 65dea4f Anonumous committed 15 days ago
deploy: fix WebGPU model loading — use q4 dtype, add debug logging 7b03c17 Anonumous committed 15 days ago