perf: switch default LLM to SmolLM2-1.7B - 40-50% faster tok/s, better instruction following 18dc770 imtrt004 committed 19 days ago
perf: greedy decoding + dtype fix - 2-3x faster inference on CPU d16b829 imtrt004 committed 19 days ago
fix: rewrite loader.py as clean UTF-8 - remove Windows-1252 em-dashes causing SyntaxError 8210d54 imtrt004 committed 19 days ago
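The encoding cleanup in 8210d54 can be sketched as follows. This is a minimal illustration, not the repo's actual loader.py code: the helper name `clean_file` and the choice to normalize em-dashes to plain hyphens are assumptions.

```python
from pathlib import Path

def clean_file(path: str) -> str:
    """Re-save a Windows-1252 source file as clean UTF-8.

    Stray 0x97 bytes (cp1252 em-dashes) are not valid UTF-8, so they
    make Python's parser raise a SyntaxError; decode them explicitly
    and normalize to ASCII hyphens before writing the file back.
    """
    raw = Path(path).read_bytes()
    text = raw.decode("cp1252").replace("\u2014", "-")
    Path(path).write_text(text, encoding="utf-8")
    return text
```

After this pass the file decodes as UTF-8 everywhere, so the interpreter can import it regardless of locale.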
feat: self-hosted Qwen2.5-1.5B-Instruct via transformers - no external API, no compilation deea70e imtrt004 committed 19 days ago
feat: replace llama-cpp-python/Groq with free HF InferenceClient (zero compilation) 98e3f05 imtrt004 committed 19 days ago
fix: double .gguf extension - skip symlink when path already ends in .gguf; add verbose step logging dbce995 imtrt004 committed 20 days ago
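The guard described in dbce995 (building on the symlink workaround from 915613c) might look like the sketch below. `ensure_gguf` and the log line format are assumed names for illustration, not the repository's actual code.

```python
import os

def ensure_gguf(path: str) -> str:
    """Return a path ending in .gguf that llama.cpp's C loader will accept.

    Skip the symlink when the resolved path already has the extension,
    which avoids creating a broken double "model.gguf.gguf" link.
    """
    real = os.path.realpath(path)
    if real.endswith(".gguf"):
        print(f"[loader] path already ends in .gguf, using {real}")
        return real
    link = real + ".gguf"
    if not os.path.islink(link):
        os.symlink(real, link)
        print(f"[loader] symlinked {real} -> {link}")
    return link
```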
feat: LLM readiness tracking - 503 while loading, llm_ready in /health 256f0fc imtrt004 committed 20 days ago
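The readiness pattern from 256f0fc reduces to a small handler, sketched here framework-free: the global flag `LLM_READY` and the `health()` helper are illustrative assumptions, with the actual app presumably wiring this into its web framework's route.

```python
import json

LLM_READY = False  # flipped to True once model loading finishes

def health():
    """Return (status_code, body) for the /health endpoint.

    Serve 503 while the model is still loading so load balancers and
    health checks hold traffic, and expose the flag as llm_ready.
    """
    status = 200 if LLM_READY else 503
    body = json.dumps({"status": "ok" if LLM_READY else "loading",
                       "llm_ready": LLM_READY})
    return status, body
```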
fix: symlink blob to .gguf extension so llama.cpp C loader accepts it 915613c imtrt004 committed 20 days ago
fix: use hf_hub_download + realpath to avoid snapshot ./path crash fd0d531 imtrt004 committed 20 days ago
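The realpath half of fd0d531 can be sketched with the stdlib alone; `resolve_model_path` is an assumed helper name, and the `hf_hub_download` call it would wrap in the app is left as a comment rather than executed here.

```python
import os

def resolve_model_path(downloaded_path: str) -> str:
    """Resolve a Hugging Face cache path for llama.cpp's C loader.

    In the app, `downloaded_path` would be the value returned by
    huggingface_hub.hf_hub_download(repo_id, filename), which is
    typically a symlink into the cache's blobs/ directory. realpath
    follows the symlink and yields an absolute path, avoiding the
    relative "./" snapshot paths that crashed the loader.
    """
    return os.path.realpath(downloaded_path)
```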