Commit History

Strip MiniCPM <think>...</think> reasoning tags from generated text
5f5ab47
verified

unity4ar committed on

Pin transformers<5.0 so KV cache works with MiniCPM bundled code; re-enable use_cache
148a99f
verified

unity4ar committed on

Revert to use_cache=False (eager attn also broken); hide all audio UI
df99cf8
verified

unity4ar committed on

Speed up: eager attn + KV cache; drop chat retries to 1; remove MiniCPM-o voice UI artifact
3326f32
verified

unity4ar committed on

Cap zerogpu max_new_tokens at 256 (use_cache=False makes long generations O(n^2))
c788ee1
verified

unity4ar committed on

Disable KV cache: openbmb modeling_minicpm.py has a cache_utils API drift bug
e06599a
verified

unity4ar committed on

Move .to('cuda') inside @spaces.GPU; background thread keeps model on CPU to avoid emulation bypass
5a8fab5
verified

unity4ar committed on

Shim is_torch_fx_available so MiniCPM trust_remote_code import works on transformers >= 5.0
90e360d
verified

unity4ar committed on

Use canonical .to('cuda') pattern + progress logs so container log shows what loader is doing
031ce2d
verified

unity4ar committed on

Load model in background thread so health/status endpoints don't block on 16GB download
fbd952d
verified

unity4ar committed on

Load model on cuda at module level (canonical ZeroGPU pattern)
a475083
verified

unity4ar committed on

Refactor: Docker+llama.cpp -> Gradio SDK + ZeroGPU transformers backend
7036a02
verified

unity4ar committed on

Ship Phantom Grid Docker Space
d2e6f94
verified

unity4ar committed on
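
Illustrative sketch for commit 5f5ab47 (strip <think>...</think> reasoning tags from generated text). The commit's actual code is not shown in this history, so the snippet below is only one plausible way to do it, written under stated assumptions: the function name strip_think_tags is hypothetical, and dropping an unterminated <think> block (as can happen when generation stops at a token cap) is an assumed behavior, not something the commit message confirms.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

# Assumption: an opening <think> with no closing tag runs to the end of the
# text (for example when generation stopped early), so drop the remainder.
_THINK_OPEN = re.compile(r"<think>.*\Z", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Return text with reasoning blocks removed (hypothetical helper)."""
    text = _THINK_BLOCK.sub("", text)  # remove every closed block
    text = _THINK_OPEN.sub("", text)   # remove a trailing unclosed block
    return text.strip()
```

Called on a decoded generation such as "<think>plan the reply</think>Hello", this would leave only "Hello". The non-greedy match keeps two separate reasoning blocks from swallowing the visible text between them.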