How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="macwhisperer/Gemma4-4B-SuperDense",
	filename="Gemma4-4B-Dense-Imatrix-Q4_K_M.gguf",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

📟 Gemma4-4B-Dense-Imatrix-Q4_K_M.gguf (2026 Edition)

"Local intelligence... to the max."

This is a custom-quantized version of Gemma4-4B, specifically optimized to obtain the highest possible local byte-intelligence ratio with 8GB+ RAM consumer laptops or computers.

🧠 Why this model is different

Unlike a standard quant, this model was processed using a custom Importance Matrix (imatrix). The training data for the imatrix was hand-curated to preserve:

  • Incredible reasoning: Inclusion of custom coding examples built with frontier models provides high retention of very specific and sharp architectural reasoning skills
  • Logical Flow: Inclusion of llama.cpp source code, logic puzzles, and historical writing in the imatrix training to ensure the model stays coherent at low bitrates.
  • High Speed: Built using llama.cpp specifically for local-first AI and edge computing setups like apple silicon with minimum 24GB RAM

🛠 Quantization Details

  • Base Model: Gemma4-4B
  • Quantization: Q4_K_M
  • Format: GGUF
  • Size: ~5.34 GB
  • Context Length: 262144 tokens

📈 Perplexity Benchmarks

The following results were generated using llama-perplexity on the wikitext-2-raw/wiki.test.raw dataset.

Model Precision Perplexity (PPL) Δ PPL
Gemma4-4B (Baseline) BF16 62.3100 -
Gemma4-4B (Quant) Q4_K_M 60.5833 -1.7267

⚖️ Evaluation Verdict

coming soon

🚀 Hardware Performance (Apple M2)

coming soon

🌐 Links

Check out my other models!


24GB+ (RAM)

Qwen3.6-35B-SuperMoE.

Qwen3.6-27B-SuperDense.

Gemma4-31B-SuperDense.


8GB+ (RAM)

Qwen3.5-9B-SuperDense.

Qwen3.5-4B-SuperDense.

Gemma4-2B-SuperDense.


4GB+ (RAM)

Smartchild.


All make excellent companions to this model!


Downloads last month
-
GGUF
Model size
8B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for macwhisperer/Gemma4-4B-SuperDense

Quantized
(194)
this model