Strip MiniCPM <think>...</think> reasoning tags from generated text 5f5ab47 verified unity4ar committed 17 days ago
Relax huggingface_hub pin to <1.0 to satisfy transformers 4.55-4.57 fdb54cf verified unity4ar committed 17 days ago
Pin transformers<5.0 so KV cache works with MiniCPM bundled code; re-enable use_cache 148a99f verified unity4ar committed 17 days ago
Revert to use_cache=False (eager attn also broken); hide all audio UI df99cf8 verified unity4ar committed 17 days ago
Speed up: eager attn + KV cache; drop chat retries to 1; remove MiniCPM-o voice UI artifact 3326f32 verified unity4ar committed 17 days ago
Cap zerogpu max_new_tokens at 256 (use_cache=False makes long generations O(n^2)) c788ee1 verified unity4ar committed 17 days ago
Disable KV cache: openbmb modeling_minicpm.py has a cache_utils API drift bug e06599a verified unity4ar committed 17 days ago
Log + surface RuntimeErrors from witness chat too (still 503 for those) a5ce744 verified unity4ar committed 17 days ago
Surface witness chat failures with traceback + error class in 500 detail 5b4e454 verified unity4ar committed 17 days ago
Add the 4 map layer PNGs missing from the initial Docker-era ship (ship_space.sh excluded data/*.png) 46e31a7 verified unity4ar committed 17 days ago
Move .to('cuda') inside @spaces.GPU; background thread keeps model on CPU to avoid emulation bypass 5a8fab5 verified unity4ar committed 17 days ago
Shim is_torch_fx_available so MiniCPM trust_remote_code import works on transformers >= 5.0 90e360d verified unity4ar committed 17 days ago
Use canonical .to('cuda') pattern + progress logs so container log shows what loader is doing 031ce2d verified unity4ar committed 17 days ago
Surface zerogpu backend load_error / load detail in setup status 47c194f verified unity4ar committed 17 days ago
Eagerly import zerogpu_backend on Spaces so @spaces.GPU is registered before startup scan 50761af verified unity4ar committed 17 days ago
Force demo.launch to bind 0.0.0.0:$PORT on HF Spaces (CLI hot-reload ignores GRADIO_SERVER_NAME) 4f670ea verified unity4ar committed 17 days ago
Load model in background thread so health/status endpoints don't block on 16GB download fbd952d verified unity4ar committed 17 days ago
Expose `demo` at module scope so Gradio SDK runner can launch the gr.Server app 5cb944e verified unity4ar committed 17 days ago
Default provider to zerogpu_transformers on HF Spaces; drop bogus README env block e9ef2b5 verified unity4ar committed 17 days ago
Gate setup/llama subprocess paths behind provider check; allow zerogpu_transformers 50a467b verified unity4ar committed 17 days ago
Load model on cuda at module level (canonical ZeroGPU pattern) a475083 verified unity4ar committed 17 days ago
Refactor: Docker+llama.cpp -> Gradio SDK + ZeroGPU transformers backend 7036a02 verified unity4ar committed 17 days ago
Fix port collision: scope PORT env to llama.cpp subprocess 16501bb verified unity4ar committed 17 days ago
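
For context on the most recent commit in this history (5f5ab47, stripping MiniCPM <think>...</think> reasoning tags from generated text), the following is a minimal sketch of what such post-processing might look like. It assumes the tags appear literally in the decoded string. The helper name strip_think_tags and the handling of an unterminated block are illustrative assumptions, not the code from this repository.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)

# Matches an unterminated <think> block, e.g. when generation hit the token
# cap before the closing tag was emitted (assumed behaviour, see lead-in).
_THINK_OPEN = re.compile(r"<think>.*\Z", re.DOTALL | re.IGNORECASE)


def strip_think_tags(text: str) -> str:
    """Return `text` with reasoning blocks removed (hypothetical helper)."""
    text = _THINK_BLOCK.sub("", text)  # drop every closed reasoning block
    text = _THINK_OPEN.sub("", text)   # drop a trailing block that never closed
    return text.strip()
```

Handling the unterminated case separately may matter here because the history also caps max_new_tokens at 256, so a reply could plausibly be truncated in the middle of a reasoning block. A stray closing tag with no opening tag is left untouched by this sketch.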