Fix error-format wrapping so it applies to /v1/chat/completions and generation stats
470e737
Dmitry Beresnev committed on
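As a rough illustration of the error-format wrapping this commit extends, a minimal sketch (the `wrap_error` helper and field choices are assumptions, not the repo's actual code): every failure path for /v1/chat/completions and the generation-stats endpoint funnels through one function that emits an OpenAI-style `{"error": {...}}` body instead of a bare string.

```python
import json

def wrap_error(message: str, status: int,
               err_type: str = "invalid_request_error") -> str:
    """Wrap a plain error message in an OpenAI-style error body.

    Clients of /v1/chat/completions expect failures shaped as
    {"error": {"message": ..., "type": ..., "code": ...}}, so routing
    every error through one wrapper keeps the format consistent.
    """
    return json.dumps({
        "error": {
            "message": message,
            "type": err_type,
            "code": status,
        }
    })
```

A handler would then return `wrap_error("model not loaded", 503)` rather than the raw string, so the UI can always parse the body the same way.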
Add token generation speed to UI
e8080f5
Dmitry Beresnev committed on
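The token generation speed shown in the UI can be derived from a simple counter plus a monotonic clock; a minimal sketch (the `GenerationStats` class is hypothetical, not taken from the repo):

```python
import time

class GenerationStats:
    """Track generated-token count and elapsed wall time
    so the UI can display tokens per second."""

    def __init__(self) -> None:
        self.start = time.monotonic()  # monotonic clock is immune to wall-clock jumps
        self.tokens = 0

    def on_token(self) -> None:
        """Call once per generated token."""
        self.tokens += 1

    def tokens_per_second(self) -> float:
        elapsed = time.monotonic() - self.start
        return self.tokens / elapsed if elapsed > 0 else 0.0
```

The server would report `tokens_per_second()` alongside the final response (or in each streamed chunk) for the UI to render.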
Log detailed error bodies for UI failures
7caa6ba
Dmitry Beresnev committed on
Fix 400 for llama.cpp web UI completion requests
677456b
Dmitry Beresnev committed on
Fix web UI chat by adding buffered SSE fallback
6379bd0
Dmitry Beresnev committed on
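A buffered SSE fallback of the kind this commit describes can be sketched as follows: parse the `data:` events of an OpenAI-style SSE stream, and when the client (such as the llama.cpp web UI) expects a single non-streamed body, consume the whole stream and merge the deltas into one message. This is a sketch under assumed function names (`sse_events`, `buffered_completion`) and the standard `delta.content` / `[DONE]` streaming conventions, not the repo's actual implementation.

```python
import json
from typing import Iterable, Iterator

def sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON payloads from 'data: {...}' SSE lines,
    stopping at the '[DONE]' sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alives, other fields
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

def buffered_completion(lines: Iterable[str]) -> dict:
    """Fallback path: drain the entire SSE stream and return one
    merged chat message for clients that cannot consume a stream."""
    parts = []
    for event in sse_events(lines):
        delta = event.get("choices", [{}])[0].get("delta", {})
        parts.append(delta.get("content", ""))
    return {"choices": [{"message": {"role": "assistant",
                                     "content": "".join(parts)}}]}
```

The streaming path stays the default; the buffered path only kicks in when the client signals (or negotiation reveals) that it cannot handle SSE.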
Fix build bugs
acdc6c1
Dmitry Beresnev committed on
Refactor the C++ LLM manager into modular components, move Python modules under python/, and keep the current control-plane behavior intact. The C++ server now has clearer separation between config, model lifecycle, runtime services, request parsing, HTTP helpers, and server routing, while the Docker build/runtime paths were updated to compile multiple C++ files and load Python code from the new package folder.