perf: switch default LLM to SmolLM2-1.7B - 40-50% faster tok/s, better instruction following 18dc770 imtrt004 committed on Feb 27
fix: JSON-encode SSE tokens to preserve newlines in markdown; reduce top_k to 3 ae897ea imtrt004 committed on Feb 27
fix: rewrite loader.py as clean UTF-8 - remove Windows-1252 em-dashes causing SyntaxError 8210d54 imtrt004 committed on Feb 27
feat: self-hosted Qwen2.5-1.5B-Instruct via transformers - no external API, no compilation deea70e imtrt004 committed on Feb 27
feat: replace llama-cpp-python/Groq with free HF InferenceClient (zero compilation) 98e3f05 imtrt004 committed on Feb 27
fix: restore build-essential+cmake, pin llama-cpp-python==0.3.16 for stable layer cache bfaa120 imtrt004 committed on Feb 27
fix: use pre-built llama-cpp-python CPU wheel - eliminates 8-minute C++ compile 6e6147b imtrt004 committed on Feb 27
fix: Dockerfile - pre-install CPU torch, upgrade llama-cpp-python to >=0.3.14 (qwen3 support) 5cfcd30 imtrt004 committed on Feb 27
fix: upgrade llama-cpp-python to >=0.3.14 for qwen3 arch support (was 0.3.8, pre-May 2025) a0250ac imtrt004 committed on Feb 27
fix: double .gguf extension - skip symlink when path already ends in .gguf; add verbose step logging dbce995 imtrt004 committed on Feb 27
feat: LLM readiness tracking - 503 while loading, llm_ready in /health 256f0fc imtrt004 committed on Feb 27
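The SSE fix in ae897ea can be sketched roughly as below. This is a minimal illustration, not the repo's actual code; `format_sse_token` is a hypothetical helper name. The idea: a bare newline inside an SSE `data:` field terminates the field, so markdown line breaks get lost unless each token is JSON-encoded (escaping `\n`) before framing.

```python
import json

def format_sse_token(token: str) -> str:
    # json.dumps escapes newlines as \n inside the string literal,
    # so a multi-line markdown token survives SSE "data:" framing.
    # The client JSON-decodes each event payload to recover the text.
    return f"data: {json.dumps(token)}\n\n"
```

A token like `"line1\nline2"` becomes the single-line event `data: "line1\nline2"` followed by the blank line that ends the event.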
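The double-extension fix in dbce995 amounts to an early-return guard before symlinking. A hedged sketch, assuming a hypothetical `link_model` helper (names and layout are illustrative, not the repo's actual loader):

```python
from pathlib import Path

def link_model(model_path: str, target_dir: str) -> Path:
    src = Path(model_path)
    if src.suffix == ".gguf":
        # Path already ends in .gguf: appending the extension again
        # would produce "model.gguf.gguf", so use the file directly.
        print(f"[loader] using {src} directly")  # verbose step logging
        return src
    dst = Path(target_dir) / (src.name + ".gguf")
    if not dst.exists():
        dst.symlink_to(src)
    print(f"[loader] symlinked {src} -> {dst}")
    return dst
```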
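The readiness tracking in 256f0fc could look like the following framework-agnostic sketch: inference routes answer 503 until the model finishes loading, while `/health` always answers 200 and reports the flag. `AppState`, `health`, and `require_llm` are assumed names for illustration, not the actual implementation.

```python
class AppState:
    """Shared server state; llm_ready flips to True after model load."""
    def __init__(self):
        self.llm_ready = False

def health(state: AppState):
    # /health is always reachable and exposes the readiness flag,
    # so orchestrators can distinguish "up" from "ready to serve".
    return 200, {"status": "ok", "llm_ready": state.llm_ready}

def require_llm(state: AppState):
    # Guard for inference routes: 503 while the model is still loading,
    # None (no error) once it is ready.
    if not state.llm_ready:
        return 503, {"error": "LLM still loading, retry shortly"}
    return None
```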