Spaces: innovatorved/llama-server

Commit History

Apply minimal safe perf wins for free-tier HF Space

cc953bd

Ved Gupta committed on Apr 26

Revert all perf tuning, restore working pre-tune config

509e3de

Ved Gupta committed on Apr 26

Revert KV-cache quant + force flash-attn defaults

62928db

Ved Gupta committed on Apr 26

Fix --flash-attn flag: pass explicit on/off value

eb29104

Ved Gupta committed on Apr 26

Merge branch 'main' of https://huggingface.co/spaces/innovatorved/llama-server

fa38229

Ved Gupta committed on Apr 26

Tune llama-server for faster CPU inference on HF free tier

f732b48

Ved Gupta committed on Apr 26

Initial commit: llama.cpp OpenAI-compatible server for Gemma 4 E2B

82bd3da

vedgupta committed on Apr 26

Keep upstream /app layout intact so dlopen finds GGML CPU backend plugin

b00df39

innovatorved committed on Apr 26

Fix UID 1000 collision on ubuntu:24.04 (delete default 'ubuntu' user)

3c6e293

innovatorved committed on Apr 26

Use ubuntu:24.04 runtime to match upstream :server glibc/libstdc++ ABI

b16d1e7

innovatorved committed on Apr 26

Use upstream ghcr.io/ggml-org/llama.cpp:server image (no source build)

8d5716a

innovatorved committed on Apr 26

Fix build: keep LLAMA_BUILD_TOOLS=ON (server target lives under it)

563eb80

innovatorved committed on Apr 26

Slim down Docker build (server-only, no BLAS/curl, strip + symlinks)

c8b257f

innovatorved committed on Apr 26

Switch default model to gemma-4-E2B-it (UD-Q4_K_XL, ~3.2 GB)

5c3437f

Ved Gupta committed on Apr 26

Add llama.cpp OpenAI-compatible server for gemma-4-26B-A4B-it

f00e734

Ved Gupta committed on Apr 26

initial commit

4bd6985
verified

innovatorved committed on Apr 26