Persona: cap n_threads=2 (cpu_count over-reports host cores in a container → llama.cpp thrashing/50x slowdown); smaller ctx; /persona/status diag; lower token cap 2f7e532 polats committed 5 days ago
Persona: load model under its own lock + prewarm in background (cold-start download no longer makes requests 'busy') f85d7c3 polats committed 5 days ago
Persona endpoint: stop generation on client disconnect, fail-fast lock, lower token cap (prevents abandoned-gen lock pile-up) 1df0cfb polats committed 5 days ago
Personas + war-diary via llama.cpp (reusing woid's persona SSE protocol) 67f4321 polats Claude Opus 4.8 (1M context) committed 5 days ago