How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf aaditya/gemma-2b-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf aaditya/gemma-2b-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf aaditya/gemma-2b-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf aaditya/gemma-2b-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf aaditya/gemma-2b-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf aaditya/gemma-2b-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf aaditya/gemma-2b-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf aaditya/gemma-2b-GGUF:Q4_K_M
Use Docker
docker model run hf.co/aaditya/gemma-2b-GGUF:Q4_K_M
Quick Links

Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth!

✨ Finetune for Free

All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

Unsloth supports Free Notebooks Performance Memory use
Gemma 7b ▶️ Start on Colab 2.4x faster 58% less
Mistral 7b ▶️ Start on Colab 2.2x faster 62% less
Llama-2 7b ▶️ Start on Colab 2.2x faster 43% less
TinyLlama ▶️ Start on Colab 3.9x faster 74% less
CodeLlama 34b A100 ▶️ Start on Colab 1.9x faster 27% less
Mistral 7b 1xT4 ▶️ Start on Kaggle 5x faster* 62% less
DPO - Zephyr ▶️ Start on Colab 1.9x faster 19% less
Downloads last month
57
GGUF
Model size
3B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support