Fix CPU thread oversubscription: cap n_threads to 2 (cpu-basic vCPUs) for faster generation 4853f12 verified Bhuvandesai committed on 10 days ago
Re-deploy llama.cpp + GGUF CPU serving (default Q4_K_M); fast CPU inference e44cdab verified Bhuvandesai committed on 10 days ago
Revert "Migrate CPU serving to llama.cpp + Q5_K_M GGUF (was bf16 transformers)" d031aeb Bhuvandesai committed on 17 days ago
Revert "Make torch/transformers/peft lazy imports so CPU Space boots without them" 2f03b0e Bhuvandesai committed on 17 days ago
Make torch/transformers/peft lazy imports so CPU Space boots without them eac26c7 Bhuvandesai committed on 17 days ago
Migrate CPU serving to llama.cpp + Q5_K_M GGUF (was bf16 transformers) 564ad28 Bhuvandesai committed on 17 days ago
Update CPU warning banner with accurate load and query times e4ff54d Bhuvandesai committed on 17 days ago
Fix inference device bug, auto-load model, remove fine-tuning console, UI cleanup f5ce94d Bhuvandesai committed on 17 days ago