Use python:3.10-slim + compile llama-cpp 0.2.90 (proven working version) bf941fe verified Rofati committed on May 22
Use python:3.11-slim + compile llama-cpp with OpenBLAS for CPU speed a4a6838 verified Rofati committed on May 22
Revert to working config: Q8_0 (was working before), pre-download for fast startup d01a9fa verified Rofati committed on May 22
Switch to ghcr.io/abetlen/llama-cpp-python (has CPU optimizations) + Q4_K_M + pre-download 27585df verified Rofati committed on May 22
Revert to Llama-3.2-1B Q4_K_M (proven working, no thinking issues) with speed optimizations 5d02182 verified Rofati committed on May 22
Speed overhaul: Qwen3-0.6B Q4_K_M (397MB, 3x faster), pre-built wheel, optimized config b76d2ed verified Rofati committed on May 22