Multi-stage Docker build: Stage 1 compiles llama-cpp-python once, Stage 2 reuses compiled wheels - NO TIMEOUT! Build time 8-12 minutes first time, then cached. 9d2777a Sabithulla committed 29 days ago
Multi-stage Docker build: Stage 1 compiles llama-cpp-python to wheel, Stage 2 installs pre-built wheel - NO TIMEOUT! Pre-download fast-chat model at build time. 3274ec4 Sabithulla committed 30 days ago
Switch to Ollama for zero-compilation deployment - pre-downloads models at startup 64f495c Sabithulla committed 30 days ago
Revert to llama-cpp-python with storage optimization - only load fast-chat at startup 1454974 Sabithulla committed 30 days ago
Fix: use mistral type for qwen models (not supported directly) + fallback to llama 47c4481 Sabithulla committed 30 days ago
Fix: only load fast-chat at startup (350MB) - skip other large models to save storage 264847d Sabithulla committed 30 days ago
Fix: map model types correctly for ctransformers (qwen, phi, llama, mistral) fb749c5 Sabithulla committed 30 days ago
Switch to ctransformers (pre-built, no compilation!) - faster HF Spaces deploy cf04577 Sabithulla committed 30 days ago