Commit History

Fix CPU thread oversubscription: cap n_threads to 2 (cpu-basic vCPUs) for faster generation
4853f12
verified

Bhuvandesai committed on

Re-deploy llama.cpp + GGUF CPU serving (default Q4_K_M); fast CPU inference
e44cdab
verified

Bhuvandesai committed on

Revert "Migrate CPU serving to llama.cpp + Q5_K_M GGUF (was bf16 transformers)"
d031aeb

Bhuvandesai committed on

Revert "Make torch/transformers/peft lazy imports so CPU Space boots without them"
2f03b0e

Bhuvandesai committed on

Make torch/transformers/peft lazy imports so CPU Space boots without them
eac26c7

Bhuvandesai committed on

Migrate CPU serving to llama.cpp + Q5_K_M GGUF (was bf16 transformers)
564ad28

Bhuvandesai committed on

Update CPU warning banner with accurate load and query times
e4ff54d

Bhuvandesai committed on

Fix inference device bug, auto-load model, remove fine-tuning console, UI cleanup
f5ce94d

Bhuvandesai committed on

initial deployment
55159b1

Bhuvandesai committed on