alpha-core-ai / model_manager.py

Commit History

Multi-stage Docker build: Stage 1 compiles llama-cpp-python once, Stage 2 reuses compiled wheels - NO TIMEOUT! Build time 8-12 minutes first time, then cached.
9d2777a

Sabithulla committed
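The two-stage build described above can be sketched as a Dockerfile along these lines. This is a minimal illustration, not the repo's actual Dockerfile: the base image, package versions, and paths are assumptions.

```dockerfile
# Stage 1: compile llama-cpp-python into a wheel once; this layer is
# cached by Docker, so later builds skip the slow compile entirely.
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential cmake && rm -rf /var/lib/apt/lists/*
RUN pip wheel --wheel-dir /wheels llama-cpp-python

# Stage 2: install the pre-built wheel; no compiler toolchain needed,
# so this stage builds in seconds and stays small.
FROM python:3.11-slim
COPY --from=builder /wheels /wheels
RUN pip install --no-index --find-links=/wheels llama-cpp-python
COPY . /app
WORKDIR /app
CMD ["python", "model_manager.py"]
```

Because Stage 1 only depends on the base image and the pip command, its layer survives cache invalidation from source-code changes, which is what avoids the build timeout on subsequent deploys.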

Multi-stage Docker build: Stage 1 compiles llama-cpp-python to wheel, Stage 2 installs pre-built wheel - NO TIMEOUT! Pre-download fast-chat model at build time.
3274ec4

Sabithulla committed

Switch to Ollama for zero-compilation deployment - pre-downloads models at startup
64f495c

Sabithulla committed

Revert to llama-cpp-python with storage optimization - only load fast-chat at startup
1454974

Sabithulla committed

Fix: use mistral type for qwen models (qwen not supported directly by ctransformers) + fallback to llama
47c4481

Sabithulla committed
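The type-mapping fix above (qwen loaded as mistral, anything unrecognized falling back to llama) could look like the following. This is a hypothetical sketch: the map and function names are illustrative, not the repo's actual code.

```python
# Map GGUF model families to ctransformers `model_type` values.
# qwen has no native ctransformers support, so it is mapped to the
# architecturally similar mistral type in this sketch.
CTRANSFORMERS_TYPE_MAP = {
    "llama": "llama",
    "mistral": "mistral",
    "phi": "phi",
    "qwen": "mistral",  # not supported directly; load as mistral
}

def resolve_model_type(family: str) -> str:
    """Return a ctransformers model_type, falling back to llama."""
    return CTRANSFORMERS_TYPE_MAP.get(family.lower(), "llama")
```

The llama fallback keeps loading from failing outright on an unknown family, at the cost of possibly producing garbage if the architecture is genuinely incompatible.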

Fix: only load fast-chat at startup (350MB) - skip other large models to save storage
264847d

Sabithulla committed
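The storage optimization above (eagerly load only the small fast-chat model, defer the large ones) amounts to lazy loading with a cache. A minimal sketch, assuming a loader callback and the model name "fast-chat"; the class shape is an assumption, not the repo's actual `ModelManager`:

```python
class ModelManager:
    """Load small models at startup; load large ones on first request."""

    def __init__(self, loader, eager=("fast-chat",)):
        self._loader = loader   # callable: model name -> loaded model
        self._models = {}       # cache of already-loaded models
        for name in eager:      # only the ~350MB fast-chat loads up front
            self._models[name] = loader(name)

    def get(self, name):
        # Large models are loaded lazily and cached, so disk/RAM usage
        # only grows for models a user actually requests.
        if name not in self._models:
            self._models[name] = self._loader(name)
        return self._models[name]
```

This keeps startup time and baseline storage bounded by the eager set, while repeated requests for the same large model pay the load cost only once.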

Fix: map model types correctly for ctransformers (qwen, phi, llama, mistral)
fb749c5

Sabithulla committed

Switch to ctransformers (pre-built, no compilation!) - faster HF Spaces deploy
cf04577

Sabithulla committed

Add FastAPI backend with Docker for HuggingFace Spaces
2a72045

Sabithulla committed