Instructions to use jgebbeken/gemma-4-coder-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use jgebbeken/gemma-4-coder-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="jgebbeken/gemma-4-coder-gguf",
	filename="gemma-4-E4b-it.BF16-mmproj.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use jgebbeken/gemma-4-coder-gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16
# Run inference directly in the terminal:
llama-cli -hf jgebbeken/gemma-4-coder-gguf:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16
# Run inference directly in the terminal:
llama-cli -hf jgebbeken/gemma-4-coder-gguf:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16
# Run inference directly in the terminal:
./llama-cli -hf jgebbeken/gemma-4-coder-gguf:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf jgebbeken/gemma-4-coder-gguf:BF16

Use Docker

docker model run hf.co/jgebbeken/gemma-4-coder-gguf:BF16

LM Studio
Jan
Ollama
How to use jgebbeken/gemma-4-coder-gguf with Ollama:
```
ollama run hf.co/jgebbeken/gemma-4-coder-gguf:BF16
```

Unsloth Studio

How to use jgebbeken/gemma-4-coder-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jgebbeken/gemma-4-coder-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jgebbeken/gemma-4-coder-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jgebbeken/gemma-4-coder-gguf to start chatting

How to use jgebbeken/gemma-4-coder-gguf with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "jgebbeken/gemma-4-coder-gguf:BF16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use jgebbeken/gemma-4-coder-gguf with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default jgebbeken/gemma-4-coder-gguf:BF16

Run Hermes

hermes

Docker Model Runner
How to use jgebbeken/gemma-4-coder-gguf with Docker Model Runner:
```
docker model run hf.co/jgebbeken/gemma-4-coder-gguf:BF16
```

Lemonade

How to use jgebbeken/gemma-4-coder-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull jgebbeken/gemma-4-coder-gguf:BF16

Run and chat with the model

lemonade run user.gemma-4-coder-gguf-BF16

List all available models

lemonade list

Thanks for making this!

by BingoBird - opened Apr 13

Discussion

BingoBird

Apr 13

What in your readme is supposed to come after
📊 Training Data
?

Curious about how you did it.

jgebbeken

Owner Apr 14

Hello and thank you for enjoying my work on this model. What you are seeing, nothing comes after that. It was this -> Primary Dataset: Magicoder-Evol-Instruct-110K 📊 Training Data. Sorry for the confusion. The model didn't need much training data. Just needed slight correction. I tried Nvidia Open Code datasets but that affected the model greatly on several training sessions.

Fortser

Apr 16

Hi @jgebbeken ! Great work on this model.

I ran your gemma-4-coder through my LLM Reasoning Benchmark v10 — a custom test suite designed to evaluate logical reasoning capabilities of local models. Here are the results.
Benchmark Overview

30 tests across 10 categories (3 difficulty variants each)
Categories: Arithmetic, Logic (constraint satisfaction), Speed/Time, Combinatorics, Age Algebra, Truth/Liars puzzles, Optimization, Probability, Graph pathfinding, Business problems
All answers are validated programmatically against known correct solutions
Models must output structured JSON with both reasoning and final answers
Scoring v2.0 with partial credit and cascade error detection

Results: gemma-4-coder vs 41 other models
Model Score Perfect tests Avg tokens Total time
gemma-4-coder 200/200 30/30 ~835 ~8 min
microsoft/phi-4-reasoning-plus [THINKING] 200/200 30/30 ~1,516 ~43 min
qwen/qwen3-coder-30b ~174/200 22/30 — —
gigachat3.1-10b-a1.8b ~165/200 17/30 — —
qwen/qwen2.5-coder-14b ~132/200 14/30 — —

gemma-4-coder achieved a perfect score — the only non-thinking model to do so.

Compared to the other perfect scorer (phi-4-reasoning-plus):

1.8x fewer tokens per response
5.2x faster total benchmark time
No [THINKING] mode required — solves everything via direct generation

Environment

Server: LM Studio (localhost)
Hardware: local GPU inference
Settings: max_tokens=8192, default sampling parameters
Quantization: Q4_K_M (as provided in this repo)

Key observations

Your model scored perfectly across all categories including the hardest ones (Combinatorics, Graph pathfinding, Truth/Liars) where most other models fail. This is especially impressive given that the Magicoder fine-tune targets code tasks, yet the benchmark tests pure logical/mathematical reasoning.

This suggests the base Gemma 4 E4B architecture is exceptionally strong, and your fine-tune preserved (or slightly enhanced) its reasoning capabilities while adding code specialization.

Full benchmark is still running across all 42 models. I plan to share complete results on r/LocalLLaMA soon.

Thank you for releasing this model — it's a hidden gem that deserves more attention!

jgebbeken

Owner Apr 17

Wow I am actually amazed by this. Thank you. I wasn't completely sure how my model would fare with benchmarks. If it is alright with you. I would like to post these results. Maybe give it the light it deserves.

Fortser

Apr 18

Here at the link you can find the source code for the tests themselves (I wrote them for myself to quickly test a huge collection of local models), the raw responses from all the models, and the summary test results.
https://huggingface.co/Fortser/Flux_Krea/resolve/main/tests.zip
In terms of the speed-to-quality ratio, your model is the clear leader.

Rad1Grad

Apr 26

77t/s on 5060ti 16GB llama.cpp Q4 -c 131K crazy

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment