Commit History

Apply minimal safe perf wins for free-tier HF Space
cc953bd

Ved Gupta committed on

Revert all perf tuning, restore working pre-tune config
509e3de

Ved Gupta committed on

Revert KV-cache quant + force flash-attn defaults
62928db

Ved Gupta committed on

Fix --flash-attn flag: pass explicit on/off value
eb29104

Ved Gupta committed on

Merge branch 'main' of https://huggingface.co/spaces/innovatorved/llama-server
fa38229

Ved Gupta committed on

Tune llama-server for faster CPU inference on HF free tier
f732b48

Ved Gupta committed on

Initial commit: llama.cpp OpenAI-compatible server for Gemma 4 E2B
82bd3da

vedgupta committed on

Keep upstream /app layout intact so dlopen finds GGML CPU backend plugin
b00df39

innovatorved committed on

Fix UID 1000 collision on ubuntu:24.04 (delete default 'ubuntu' user)
3c6e293

innovatorved committed on

Use ubuntu:24.04 runtime to match upstream :server glibc/libstdc++ ABI
b16d1e7

innovatorved committed on

Use upstream ghcr.io/ggml-org/llama.cpp:server image (no source build)
8d5716a

innovatorved committed on

Fix build: keep LLAMA_BUILD_TOOLS=ON (server target lives under it)
563eb80

innovatorved committed on

Slim down Docker build (server-only, no BLAS/curl, strip + symlinks)
c8b257f

innovatorved committed on

Switch default model to gemma-4-E2B-it (UD-Q4_K_XL, ~3.2 GB)
5c3437f

Ved Gupta committed on

Add llama.cpp OpenAI-compatible server for gemma-4-26B-A4B-it
f00e734

Ved Gupta committed on

initial commit
4bd6985

innovatorved committed on