Instructions to use jgebbeken/gemma-4-coder-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use jgebbeken/gemma-4-coder-gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="jgebbeken/gemma-4-coder-gguf", filename="gemma-4-E4b-it.BF16-mmproj.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use jgebbeken/gemma-4-coder-gguf with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16 # Run inference directly in the terminal: llama-cli -hf jgebbeken/gemma-4-coder-gguf:BF16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16 # Run inference directly in the terminal: llama-cli -hf jgebbeken/gemma-4-coder-gguf:BF16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16 # Run inference directly in the terminal: ./llama-cli -hf jgebbeken/gemma-4-coder-gguf:BF16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf jgebbeken/gemma-4-coder-gguf:BF16
Use Docker
docker model run hf.co/jgebbeken/gemma-4-coder-gguf:BF16
- LM Studio
- Jan
- Ollama
How to use jgebbeken/gemma-4-coder-gguf with Ollama:
ollama run hf.co/jgebbeken/gemma-4-coder-gguf:BF16
- Unsloth Studio
How to use jgebbeken/gemma-4-coder-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jgebbeken/gemma-4-coder-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for jgebbeken/gemma-4-coder-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for jgebbeken/gemma-4-coder-gguf to start chatting
- Pi
How to use jgebbeken/gemma-4-coder-gguf with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "jgebbeken/gemma-4-coder-gguf:BF16" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use jgebbeken/gemma-4-coder-gguf with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf jgebbeken/gemma-4-coder-gguf:BF16
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default jgebbeken/gemma-4-coder-gguf:BF16
Run Hermes
hermes
- Docker Model Runner
How to use jgebbeken/gemma-4-coder-gguf with Docker Model Runner:
docker model run hf.co/jgebbeken/gemma-4-coder-gguf:BF16
- Lemonade
How to use jgebbeken/gemma-4-coder-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull jgebbeken/gemma-4-coder-gguf:BF16
Run and chat with the model
lemonade run user.gemma-4-coder-gguf-BF16
List all available models
lemonade list
Thanks for making this!
What in your readme is supposed to come after
π Training Data
?
Curious about how you did it.
Hello and thank you for enjoying my work on this model. What you are seeing, nothing comes after that. It was this -> Primary Dataset: Magicoder-Evol-Instruct-110K π Training Data. Sorry for the confusion. The model didn't need much training data. Just needed slight correction. I tried Nvidia Open Code datasets but that affected the model greatly on several training sessions.
Hi @jgebbeken ! Great work on this model.
I ran your gemma-4-coder through my LLM Reasoning Benchmark v10 β a custom test suite designed to evaluate logical reasoning capabilities of local models. Here are the results.
Benchmark Overview
30 tests across 10 categories (3 difficulty variants each)
Categories: Arithmetic, Logic (constraint satisfaction), Speed/Time, Combinatorics, Age Algebra, Truth/Liars puzzles, Optimization, Probability, Graph pathfinding, Business problems
All answers are validated programmatically against known correct solutions
Models must output structured JSON with both reasoning and final answers
Scoring v2.0 with partial credit and cascade error detection
Results: gemma-4-coder vs 41 other models
Model Score Perfect tests Avg tokens Total time
gemma-4-coder 200/200 30/30 ~835 ~8 min
microsoft/phi-4-reasoning-plus [THINKING] 200/200 30/30 ~1,516 ~43 min
qwen/qwen3-coder-30b ~174/200 22/30 β β
gigachat3.1-10b-a1.8b ~165/200 17/30 β β
qwen/qwen2.5-coder-14b ~132/200 14/30 β β
gemma-4-coder achieved a perfect score β the only non-thinking model to do so.
Compared to the other perfect scorer (phi-4-reasoning-plus):
1.8x fewer tokens per response
5.2x faster total benchmark time
No [THINKING] mode required β solves everything via direct generation
Environment
Server: LM Studio (localhost)
Hardware: local GPU inference
Settings: max_tokens=8192, default sampling parameters
Quantization: Q4_K_M (as provided in this repo)
Key observations
Your model scored perfectly across all categories including the hardest ones (Combinatorics, Graph pathfinding, Truth/Liars) where most other models fail. This is especially impressive given that the Magicoder fine-tune targets code tasks, yet the benchmark tests pure logical/mathematical reasoning.
This suggests the base Gemma 4 E4B architecture is exceptionally strong, and your fine-tune preserved (or slightly enhanced) its reasoning capabilities while adding code specialization.
Full benchmark is still running across all 42 models. I plan to share complete results on r/LocalLLaMA soon.
Thank you for releasing this model β it's a hidden gem that deserves more attention!
Wow I am actually amazed by this. Thank you. I wasn't completely sure how my model would fare with benchmarks. If it is alright with you. I would like to post these results. Maybe give it the light it deserves.
Here at the link you can find the source code for the tests themselves (I wrote them for myself to quickly test a huge collection of local models), the raw responses from all the models, and the summary test results.
https://huggingface.co/Fortser/Flux_Krea/resolve/main/tests.zip
In terms of the speed-to-quality ratio, your model is the clear leader.
77t/s on 5060ti 16GB llama.cpp Q4 -c 131K crazy