tiny-army / llm.py

Commit History

Persona: cap n_threads=2 (cpu_count over-reports host cores in a container → llama.cpp thrashing/50x slowdown); smaller ctx; /persona/status diag; lower token cap
2f7e532

polats committed on

Persona: load model under its own lock + prewarm in background (cold-start download no longer makes requests 'busy')
f85d7c3

polats committed on

Persona endpoint: stop generation on client disconnect, fail-fast lock, lower token cap (prevents abandoned-gen lock pile-up)
1df0cfb

polats committed on

Personas + war-diary via llama.cpp (reusing woid's persona SSE protocol)
67f4321

polats Claude Opus 4.8 (1M context) committed on