Re-deploy llama.cpp + GGUF CPU serving (default Q4_K_M); fast CPU inference | e44cdab (verified) | Bhuvandesai committed 10 days ago
Revert "Migrate CPU serving to llama.cpp + Q5_K_M GGUF (was bf16 transformers)" | d031aeb | Bhuvandesai committed 17 days ago
Migrate CPU serving to llama.cpp + Q5_K_M GGUF (was bf16 transformers) | 564ad28 | Bhuvandesai committed 17 days ago