igorls committed on
Commit cf0525d · verified · 1 Parent(s): d339099

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ gemma4-e4b-classifier-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
37
+ gemma4-e4b-classifier-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
38
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,148 @@
1
+ ---
2
+ license: gemma
3
+ base_model: google/gemma-4-E4B-it
4
+ tags:
5
+ - gemma
6
+ - gemma-4
7
+ - classification
8
+ - text-only
9
+ - vram-optimized
10
+ - ollama
11
+ language:
12
+ - en
13
+ - multilingual
14
+ library_name: transformers
15
+ pipeline_tag: text-generation
16
+ ---
17
+
18
+ # Gemma 4 E4B Classifier (vision/audio-stripped)
19
+
20
+ A modality-stripped variant of [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) for **text-only classification, entity extraction, and structured-memory extraction**. The vision encoder (~150M params) and audio encoder (~300M params) are removed; the text path is unchanged.
21
+
22
+ **Headline:** Same instruction-tuned text behavior as the official Gemma 4 E4B-it, but at **6.5 GB resident VRAM instead of 10.6 GB** (Ollama Q4_K_M, RTX 3090, Linux). All safety alignment is preserved — this is **not** an abliterated or uncensored variant.
23
+
24
+ ## Why this exists
25
+
26
+ Gemma 4 E4B is the local leader on small-model classification tasks (room classification, entity/memory extraction), but the official Q4_K_M locks out users with 12 GB GPUs: it sits at 10.6 GB resident because the vision and audio encoders occupy VRAM whether you use them or not. For text-only workloads, those modality encoders are dead weight.
27
+
28
+ This variant strips them via clean re-instantiation: load the multimodal checkpoint, copy text-path tensors into a fresh `Gemma4ForCausalLM(text_config)`, save. No safety-alignment changes. No retraining. No surgery on safetensors files.
29
+
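The re-instantiation step can be sketched roughly as follows. This is illustrative, not the actual conversion script: `Gemma4ForCausalLM` follows this repo's `config.json`, and the real script may handle tensor-name prefixes differently.

```python
import torch
from transformers import AutoModel, Gemma4ForCausalLM  # class name per this repo's config.json


def text_path_tensors(src_state: dict, dst_state: dict) -> dict:
    """Keep only tensors that also exist (with matching shape) in the
    text-only architecture; this silently drops model.vision_tower.*,
    model.audio_tower.*, model.embed_vision.* and model.embed_audio.*."""
    return {
        k: v
        for k, v in src_state.items()
        if k in dst_state and v.shape == dst_state[k].shape
    }


# Load the multimodal checkpoint, build a fresh text-only model from its
# text_config, copy the shared text-path tensors, and save.
src = AutoModel.from_pretrained("google/gemma-4-E4B-it", torch_dtype=torch.bfloat16)
dst = Gemma4ForCausalLM(src.config.text_config)
dst.load_state_dict(text_path_tensors(src.state_dict(), dst.state_dict()), strict=False)
dst.save_pretrained("gemma4-e4b-classifier")
```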
30
+ ## How it compares
31
+
32
+ Measured on RTX 3090, Ollama 0.x, against the MemPalace small-model benchmark harness (n=100 per task):
33
+
34
+ | Task | Official `gemma4:e4b-it-q4_K_M` | This model (Q4_K_M) | Δ |
35
+ |---|---:|---:|---:|
36
+ | Calibration | 1.0000 | **1.0000** | 0.0000 |
37
+ | Room classification (closed-set) | 0.6200 | **0.6200** | 0.0000 (exact tie) |
38
+ | Room classification (open-set) | 0.6556 | 0.6526 | -0.0030 |
39
+ | Entity extraction (F1) | 0.7519 | 0.7318 | -0.0201 |
40
+ | Memory coverage | 0.9125 | **0.9375** | +0.0250 (higher) |
41
+ | **VRAM resident** | **10626 MB** | **6517 MB** | **-4109 MB** |
42
+ | e2e p50 (closed-set room) | 230.9 ms | 232.4 ms | +1.5 ms (noise) |
43
+
44
+ All accuracy deltas are within statistical noise at n=100. The 4.1 GB VRAM win is real and reproducible.
45
+
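A quick way to see the scale of that noise: treat each per-task score as a proportion measured over n=100 samples. A minimal sketch (the 0.65 operating point is taken from the open-set row above):

```python
import math


def proportion_se(p: float, n: int) -> float:
    """Standard error of a proportion p measured over n independent samples."""
    return math.sqrt(p * (1 - p) / n)


# Near a score of 0.65 at n=100, one standard error is about 0.048,
# so per-task deltas of 0.003-0.025 cannot be distinguished from a tie.
se = proportion_se(0.65, 100)
```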
46
+ ## What was actually dropped
47
+
48
+ From the 7996.2M-parameter multimodal checkpoint:
49
+
50
+ | Module | Params dropped |
51
+ |---|---:|
52
+ | `model.audio_tower.*` (USM-style conformer) | 304.8M |
53
+ | `model.vision_tower.*` (MobileNet-v5 lineage) | 167.4M |
54
+ | `model.embed_audio.*` (audio→text soft-token projector) | 3.9M |
55
+ | `model.embed_vision.*` (vision→text soft-token projector) | 2.0M |
56
+ | **Total dropped** | **478.1M (6.0%)** |
57
+ | **Total kept** (text path) | **7518.1M (94.0%)** |
58
+
59
+ The VRAM saving (4.1 GB) is significantly larger than the dropped weights account for (~250 MB at Q4_K_M). The remainder comes from: modality encoders kept at higher precision than Q4 inside the GGUF, activation buffers sized for image-token sequences (up to 1120 tokens/image), and the multimodal embedders' vocab-offset tables.
60
+
61
+ ## Quantization variants
62
+
63
+ - **`Q4_K_M`** (5.3 GB on disk, 6517 MB resident) — recommended default.
64
+ - **`Q8_0`** (8.0 GB on disk) — precision comparator; minimal accuracy lift on classification.
65
+ - **Source safetensors** (13.92 GB, bf16) — full-precision weights in this repo.
66
+
67
+ ## Usage
68
+
69
+ ### Hugging Face Transformers
70
+
71
+ ```python
72
+ from transformers import AutoTokenizer, Gemma4ForCausalLM
73
+ import torch
74
+
75
+ tok = AutoTokenizer.from_pretrained("igorls/gemma4-e4b-classifier")
76
+ model = Gemma4ForCausalLM.from_pretrained(
77
+ "igorls/gemma4-e4b-classifier",
78
+ torch_dtype=torch.bfloat16,
79
+ device_map="cuda",
80
+ )
81
+
82
+ messages = [{"role": "user", "content": "What is the capital of France? One word."}]
83
+ chat = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
84
+ ids = tok(chat, return_tensors="pt").input_ids.to("cuda")
85
+ out = model.generate(ids, max_new_tokens=10, do_sample=False)
86
+ print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
87
+ ```
88
+
89
+ ### Ollama
90
+
91
+ ```bash
92
+ ollama pull igorls/gemma4-e4b-classifier:Q4_K_M
93
+ ollama run igorls/gemma4-e4b-classifier:Q4_K_M "What is the capital of France?"
94
+ ```
95
+
96
+ For classification workloads, pass `"think": false` at the top level of the `/api/generate` request to disable Gemma 4's CoT mode (which otherwise consumes the `num_predict` budget):
97
+
98
+ ```bash
99
+ curl http://localhost:11434/api/generate -d '{
100
+ "model": "igorls/gemma4-e4b-classifier:Q4_K_M",
101
+ "prompt": "Classify into one word (indoor, outdoor): The kids are playing in the backyard.",
102
+ "think": false,
103
+ "stream": false,
104
+ "options": {"temperature": 0, "num_predict": 16}
105
+ }'
106
+ ```
107
+
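For batch workloads, the same request can be issued from Python's standard library. A sketch using only the fields shown in the curl call above; `build_payload` and `classify` are hypothetical helper names:

```python
import json
import urllib.request


def build_payload(prompt: str, model: str = "igorls/gemma4-e4b-classifier:Q4_K_M") -> dict:
    """Non-streaming, non-thinking classification request body for /api/generate."""
    return {
        "model": model,
        "prompt": prompt,
        "think": False,   # top-level field, not inside "options"
        "stream": False,
        "options": {"temperature": 0, "num_predict": 16},
    }


def classify(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST one classification prompt to a local Ollama server, return the text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```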
108
+ ## Safety surface
109
+
110
+ This variant is **safety-aligned identically to the official `gemma-4-E4B-it`**. The strip does not touch the text-path weights where alignment lives; it only removes the unused modality encoders.
111
+
112
+ Validated on 18 raw NSFW classification samples (closed-set room, open-set slug invention, entity extraction with named entities, structured memory extraction with decisions/preferences/facts/commitments):
113
+
114
+ - **Zero refusals** on any sample.
115
+ - **JSON validity 100%** on the structured extraction tasks.
116
+ - **Open-set slugs are functional** rather than euphemistic.
117
+
118
+ This confirms the architectural insight from prior research: safety alignment does not interfere with classification-style outputs, so there is no reason to ship an uncensored variant for these workloads.
119
+
120
+ ## Limitations
121
+
122
+ - **Text-only.** No vision input. No audio input. The encoders are gone. Passing image or audio tokens will produce undefined behavior.
123
+ - **Same context window as base** (128k tokens).
124
+ - **Same tokenizer.** The vocab includes vision/audio special tokens (`<image>`, `<audio>`, etc.) for compatibility with the official tokenizer; these tokens won't activate any modality processing in this variant.
125
+ - **No MTP drafter support on Ollama yet.** The official `google/gemma-4-E4B-it-assistant` MTP drafter works with Transformers and vLLM but not with Ollama on Linux/CUDA as of May 2026 (upstream llama.cpp doesn't recognize the `Gemma4AssistantForCausalLM` architecture). For MTP-accelerated inference, use Transformers or vLLM directly with this model as the target.
126
+
127
+ ## License
128
+
129
+ Inherited from the base model: [Gemma Terms of Use](https://ai.google.dev/gemma/terms). By using this model you agree to those terms.
130
+
131
+ ## Citation
132
+
133
+ This is a derivative work of Google's Gemma 4 E4B. If you use it, please also credit:
134
+
135
+ ```
136
+ @misc{gemma_2026,
137
+ title={Gemma 4 Technical Report},
138
+ author={Google DeepMind},
139
+ year={2026},
140
+ url={https://huggingface.co/google/gemma-4-E4B-it},
141
+ }
142
+ ```
143
+
144
+ ## Acknowledgments
145
+
146
+ - **Google DeepMind** for Gemma 4 and the open-weight release.
147
+ - The **MemPalace small-model benchmark research** (PR #1447) that surfaced the VRAM gap and motivated this work.
148
+ - The **`igorls/gemma-4-E4B-it-heretic-GGUF`** (author's prior abliteration experiment) for accidentally demonstrating the architectural VRAM win that this artifact reproduces through a clean, safety-aligned path.
chat_template.jinja ADDED
@@ -0,0 +1,351 @@
1
+ {%- macro format_parameters(properties, required, filter_keys=false) -%}
2
+ {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
3
+ {%- set ns = namespace(found_first=false) -%}
4
+ {%- for key, value in properties | dictsort -%}
5
+ {%- set add_comma = false -%}
6
+ {%- if not filter_keys or key not in standard_keys -%}
7
+ {%- if ns.found_first %},{% endif -%}
8
+ {%- set ns.found_first = true -%}
9
+ {{ key }}:{
10
+ {%- if value['description'] -%}
11
+ description:<|"|>{{ value['description'] }}<|"|>
12
+ {%- set add_comma = true -%}
13
+ {%- endif -%}
14
+ {%- if value['type'] | upper == 'STRING' -%}
15
+ {%- if value['enum'] -%}
16
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
17
+ enum:{{ format_argument(value['enum']) }}
18
+ {%- endif -%}
19
+ {%- elif value['type'] | upper == 'ARRAY' -%}
20
+ {%- if value['items'] is mapping and value['items'] -%}
21
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
22
+ items:{
23
+ {%- set ns_items = namespace(found_first=false) -%}
24
+ {%- for item_key, item_value in value['items'] | dictsort -%}
25
+ {%- if item_value is not none -%}
26
+ {%- if ns_items.found_first %},{% endif -%}
27
+ {%- set ns_items.found_first = true -%}
28
+ {%- if item_key == 'properties' -%}
29
+ properties:{
30
+ {%- if item_value is mapping -%}
31
+ {{- format_parameters(item_value, value['items']['required'] | default([])) -}}
32
+ {%- endif -%}
33
+ }
34
+ {%- elif item_key == 'required' -%}
35
+ required:[
36
+ {%- for req_item in item_value -%}
37
+ <|"|>{{- req_item -}}<|"|>
38
+ {%- if not loop.last %},{% endif -%}
39
+ {%- endfor -%}
40
+ ]
41
+ {%- elif item_key == 'type' -%}
42
+ {%- if item_value is string -%}
43
+ type:{{ format_argument(item_value | upper) }}
44
+ {%- else -%}
45
+ type:{{ format_argument(item_value | map('upper') | list) }}
46
+ {%- endif -%}
47
+ {%- else -%}
48
+ {{ item_key }}:{{ format_argument(item_value) }}
49
+ {%- endif -%}
50
+ {%- endif -%}
51
+ {%- endfor -%}
52
+ }
53
+ {%- endif -%}
54
+ {%- endif -%}
55
+ {%- if value['nullable'] %}
56
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
57
+ nullable:true
58
+ {%- endif -%}
59
+ {%- if value['type'] | upper == 'OBJECT' -%}
60
+ {%- if value['properties'] is defined and value['properties'] is mapping -%}
61
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
62
+ properties:{
63
+ {{- format_parameters(value['properties'], value['required'] | default([])) -}}
64
+ }
65
+ {%- elif value is mapping -%}
66
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
67
+ properties:{
68
+ {{- format_parameters(value, value['required'] | default([]), filter_keys=true) -}}
69
+ }
70
+ {%- endif -%}
71
+ {%- if value['required'] -%}
72
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
73
+ required:[
74
+ {%- for item in value['required'] | default([]) -%}
75
+ <|"|>{{- item -}}<|"|>
76
+ {%- if not loop.last %},{% endif -%}
77
+ {%- endfor -%}
78
+ ]
79
+ {%- endif -%}
80
+ {%- endif -%}
81
+ {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
82
+ type:<|"|>{{ value['type'] | upper }}<|"|>}
83
+ {%- endif -%}
84
+ {%- endfor -%}
85
+ {%- endmacro -%}
86
+ {%- macro format_function_declaration(tool_data) -%}
87
+ declaration:{{- tool_data['function']['name'] -}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|>
88
+ {%- set params = tool_data['function']['parameters'] -%}
89
+ {%- if params -%}
90
+ ,parameters:{
91
+ {%- if params['properties'] -%}
92
+ properties:{ {{- format_parameters(params['properties'], params['required']) -}} },
93
+ {%- endif -%}
94
+ {%- if params['required'] -%}
95
+ required:[
96
+ {%- for item in params['required'] -%}
97
+ <|"|>{{- item -}}<|"|>
98
+ {{- ',' if not loop.last -}}
99
+ {%- endfor -%}
100
+ ],
101
+ {%- endif -%}
102
+ {%- if params['type'] -%}
103
+ type:<|"|>{{- params['type'] | upper -}}<|"|>}
104
+ {%- endif -%}
105
+ {%- endif -%}
106
+ {%- if 'response' in tool_data['function'] -%}
107
+ {%- set response_declaration = tool_data['function']['response'] -%}
108
+ ,response:{
109
+ {%- if response_declaration['description'] -%}
110
+ description:<|"|>{{- response_declaration['description'] -}}<|"|>,
111
+ {%- endif -%}
112
+ {%- if response_declaration['type'] | upper == 'OBJECT' -%}
113
+ type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>}
114
+ {%- endif -%}
115
+ {%- endif -%}
116
+ }
117
+ {%- endmacro -%}
118
+ {%- macro format_argument(argument, escape_keys=True) -%}
119
+ {%- if argument is string -%}
120
+ {{- '<|"|>' + argument + '<|"|>' -}}
121
+ {%- elif argument is boolean -%}
122
+ {{- 'true' if argument else 'false' -}}
123
+ {%- elif argument is mapping -%}
124
+ {{- '{' -}}
125
+ {%- set ns = namespace(found_first=false) -%}
126
+ {%- for key, value in argument | dictsort -%}
127
+ {%- if ns.found_first %},{% endif -%}
128
+ {%- set ns.found_first = true -%}
129
+ {%- if escape_keys -%}
130
+ {{- '<|"|>' + key + '<|"|>' -}}
131
+ {%- else -%}
132
+ {{- key -}}
133
+ {%- endif -%}
134
+ :{{- format_argument(value, escape_keys=escape_keys) -}}
135
+ {%- endfor -%}
136
+ {{- '}' -}}
137
+ {%- elif argument is sequence -%}
138
+ {{- '[' -}}
139
+ {%- for item in argument -%}
140
+ {{- format_argument(item, escape_keys=escape_keys) -}}
141
+ {%- if not loop.last %},{% endif -%}
142
+ {%- endfor -%}
143
+ {{- ']' -}}
144
+ {%- else -%}
145
+ {{- argument -}}
146
+ {%- endif -%}
147
+ {%- endmacro -%}
148
+ {%- macro strip_thinking(text) -%}
149
+ {%- set ns = namespace(result='') -%}
150
+ {%- for part in text.split('<channel|>') -%}
151
+ {%- if '<|channel>' in part -%}
152
+ {%- set ns.result = ns.result + part.split('<|channel>')[0] -%}
153
+ {%- else -%}
154
+ {%- set ns.result = ns.result + part -%}
155
+ {%- endif -%}
156
+ {%- endfor -%}
157
+ {{- ns.result | trim -}}
158
+ {%- endmacro -%}
159
+
160
+ {%- macro format_tool_response_block(tool_name, response) -%}
161
+ {{- '<|tool_response>' -}}
162
+ {%- if response is mapping -%}
163
+ {{- 'response:' + tool_name + '{' -}}
164
+ {%- for key, value in response | dictsort -%}
165
+ {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
166
+ {%- if not loop.last %},{% endif -%}
167
+ {%- endfor -%}
168
+ {{- '}' -}}
169
+ {%- else -%}
170
+ {{- 'response:' + tool_name + '{value:' + format_argument(response, escape_keys=False) + '}' -}}
171
+ {%- endif -%}
172
+ {{- '<tool_response|>' -}}
173
+ {%- endmacro -%}
174
+
175
+ {%- set ns = namespace(prev_message_type=None) -%}
176
+ {%- set loop_messages = messages -%}
177
+ {{- bos_token -}}
178
+ {#- Handle System/Tool Definitions Block -#}
179
+ {%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
180
+ {{- '<|turn>system\n' -}}
181
+ {#- Inject Thinking token at the very top of the FIRST system turn -#}
182
+ {%- if enable_thinking is defined and enable_thinking -%}
183
+ {{- '<|think|>\n' -}}
184
+ {%- set ns.prev_message_type = 'think' -%}
185
+ {%- endif -%}
186
+ {%- if messages[0]['role'] in ['system', 'developer'] -%}
187
+ {%- if messages[0]['content'] is string -%}
188
+ {{- messages[0]['content'] | trim -}}
189
+ {%- elif messages[0]['content'] is sequence -%}
190
+ {%- for item in messages[0]['content'] -%}
191
+ {{- item['text'] | trim + ' '-}}
192
+ {%- endfor -%}
193
+ {%- endif -%}
194
+ {%- set loop_messages = messages[1:] -%}
195
+ {%- endif -%}
196
+ {%- if tools -%}
197
+ {%- for tool in tools %}
198
+ {{- '<|tool>' -}}
199
+ {{- format_function_declaration(tool) | trim -}}
200
+ {{- '<tool|>' -}}
201
+ {%- endfor %}
202
+ {%- set ns.prev_message_type = 'tool' -%}
203
+ {%- endif -%}
204
+ {{- '<turn|>\n' -}}
205
+ {%- endif %}
206
+
207
+ {#- Pre-scan: find last user message index for reasoning guard -#}
208
+ {%- set ns_turn = namespace(last_user_idx=-1) -%}
209
+ {%- for i in range(loop_messages | length) -%}
210
+ {%- if loop_messages[i]['role'] == 'user' -%}
211
+ {%- set ns_turn.last_user_idx = i -%}
212
+ {%- endif -%}
213
+ {%- endfor -%}
214
+
215
+ {#- Loop through messages -#}
216
+ {%- for message in loop_messages -%}
217
+ {%- if message['role'] != 'tool' -%}
218
+ {%- set ns.prev_message_type = None -%}
219
+ {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
220
+ {#- Detect continuation: suppress duplicate <|turn>model when previous non-tool message was also assistant -#}
221
+ {%- set prev_nt = namespace(role=None, found=false) -%}
222
+ {%- if loop.index0 > 0 -%}
223
+ {%- for j in range(loop.index0 - 1, -1, -1) -%}
224
+ {%- if not prev_nt.found -%}
225
+ {%- if loop_messages[j]['role'] != 'tool' -%}
226
+ {%- set prev_nt.role = loop_messages[j]['role'] -%}
227
+ {%- set prev_nt.found = true -%}
228
+ {%- endif -%}
229
+ {%- endif -%}
230
+ {%- endfor -%}
231
+ {%- endif -%}
232
+ {%- set continue_same_model_turn = (role == 'model' and prev_nt.role == 'assistant') -%}
233
+ {%- if not continue_same_model_turn -%}
234
+ {{- '<|turn>' + role + '\n' }}
235
+ {%- endif -%}
236
+
237
+ {#- Render reasoning/reasoning_content as thinking channel -#}
238
+ {%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%}
239
+ {%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%}
240
+ {{- '<|channel>thought\n' + thinking_text + '\n<channel|>' -}}
241
+ {%- endif -%}
242
+
243
+ {%- if message['tool_calls'] -%}
244
+ {%- for tool_call in message['tool_calls'] -%}
245
+ {%- set function = tool_call['function'] -%}
246
+ {{- '<|tool_call>call:' + function['name'] + '{' -}}
247
+ {%- if function['arguments'] is mapping -%}
248
+ {%- set ns_args = namespace(found_first=false) -%}
249
+ {%- for key, value in function['arguments'] | dictsort -%}
250
+ {%- if ns_args.found_first %},{% endif -%}
251
+ {%- set ns_args.found_first = true -%}
252
+ {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
253
+ {%- endfor -%}
254
+ {%- elif function['arguments'] is string -%}
255
+ {{- function['arguments'] -}}
256
+ {%- endif -%}
257
+ {{- '}<tool_call|>' -}}
258
+ {%- endfor -%}
259
+ {%- set ns.prev_message_type = 'tool_call' -%}
260
+ {%- endif -%}
261
+
262
+ {%- set ns_tr_out = namespace(flag=false) -%}
263
+ {%- if message.get('tool_responses') -%}
264
+ {#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#}
265
+ {%- for tool_response in message['tool_responses'] -%}
266
+ {{- format_tool_response_block(tool_response['name'] | default('unknown'), tool_response['response']) -}}
267
+ {%- set ns_tr_out.flag = true -%}
268
+ {%- set ns.prev_message_type = 'tool_response' -%}
269
+ {%- endfor -%}
270
+ {%- elif message.get('tool_calls') -%}
271
+ {#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#}
272
+ {%- set ns_tool_scan = namespace(stopped=false) -%}
273
+ {%- for k in range(loop.index0 + 1, loop_messages | length) -%}
274
+ {%- if ns_tool_scan.stopped -%}
275
+ {%- elif loop_messages[k]['role'] != 'tool' -%}
276
+ {%- set ns_tool_scan.stopped = true -%}
277
+ {%- else -%}
278
+ {%- set follow = loop_messages[k] -%}
279
+ {#- Resolve tool_call_id to function name -#}
280
+ {%- set ns_tname = namespace(name=follow.get('name') | default('unknown')) -%}
281
+ {%- for tc in message['tool_calls'] -%}
282
+ {%- if tc.get('id') == follow.get('tool_call_id') -%}
283
+ {%- set ns_tname.name = tc['function']['name'] -%}
284
+ {%- endif -%}
285
+ {%- endfor -%}
286
+ {#- Handle content as string or content-parts array -#}
287
+ {%- set tool_body = follow.get('content') -%}
288
+ {%- if tool_body is string -%}
289
+ {{- format_tool_response_block(ns_tname.name, tool_body) -}}
290
+ {%- elif tool_body is sequence and tool_body is not string -%}
291
+ {%- set ns_txt = namespace(s='') -%}
292
+ {%- for part in tool_body -%}
293
+ {%- if part.get('type') == 'text' -%}
294
+ {%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%}
295
+ {%- endif -%}
296
+ {%- endfor -%}
297
+ {{- format_tool_response_block(ns_tname.name, ns_txt.s) -}}
298
+ {%- else -%}
299
+ {{- format_tool_response_block(ns_tname.name, tool_body) -}}
300
+ {%- endif -%}
301
+ {%- set ns_tr_out.flag = true -%}
302
+ {%- set ns.prev_message_type = 'tool_response' -%}
303
+ {%- endif -%}
304
+ {%- endfor -%}
305
+ {%- endif -%}
306
+
307
+ {%- set captured_content -%}
308
+ {%- if message['content'] is string -%}
309
+ {%- if role == 'model' -%}
310
+ {{- strip_thinking(message['content']) -}}
311
+ {%- else -%}
312
+ {{- message['content'] | trim -}}
313
+ {%- endif -%}
314
+ {%- elif message['content'] is sequence -%}
315
+ {%- for item in message['content'] -%}
316
+ {%- if item['type'] == 'text' -%}
317
+ {%- if role == 'model' -%}
318
+ {{- strip_thinking(item['text']) -}}
319
+ {%- else -%}
320
+ {{- item['text'] | trim -}}
321
+ {%- endif -%}
322
+ {%- elif item['type'] == 'image' -%}
323
+ {{- '<|image|>' -}}
324
+ {%- set ns.prev_message_type = 'image' -%}
325
+ {%- elif item['type'] == 'audio' -%}
326
+ {{- '<|audio|>' -}}
327
+ {%- set ns.prev_message_type = 'audio' -%}
328
+ {%- elif item['type'] == 'video' -%}
329
+ {{- '<|video|>' -}}
330
+ {%- set ns.prev_message_type = 'video' -%}
331
+ {%- endif -%}
332
+ {%- endfor -%}
333
+ {%- endif -%}
334
+ {%- endset -%}
335
+
336
+ {{- captured_content -}}
337
+ {%- set has_content = captured_content | trim | length > 0 -%}
338
+
339
+ {%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%}
340
+ {{- '<|tool_response>' -}}
341
+ {%- elif not (ns_tr_out.flag and not has_content) -%}
342
+ {{- '<turn|>\n' -}}
343
+ {%- endif -%}
344
+ {%- endif -%}
345
+ {%- endfor -%}
346
+
347
+ {%- if add_generation_prompt -%}
348
+ {%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
349
+ {{- '<|turn>model\n' -}}
350
+ {%- endif -%}
351
+ {%- endif -%}
config.json ADDED
@@ -0,0 +1,96 @@
1
+ {
2
+ "architectures": [
3
+ "Gemma4ForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "attention_k_eq_v": false,
8
+ "bos_token_id": 2,
9
+ "dtype": "bfloat16",
10
+ "enable_moe_block": false,
11
+ "eos_token_id": 1,
12
+ "expert_intermediate_size": null,
13
+ "final_logit_softcapping": 30.0,
14
+ "global_head_dim": 512,
15
+ "head_dim": 256,
16
+ "hidden_activation": "gelu_pytorch_tanh",
17
+ "hidden_size": 2560,
18
+ "hidden_size_per_layer_input": 256,
19
+ "initializer_range": 0.02,
20
+ "intermediate_size": 10240,
21
+ "layer_types": [
22
+ "sliding_attention",
23
+ "sliding_attention",
24
+ "sliding_attention",
25
+ "sliding_attention",
26
+ "sliding_attention",
27
+ "full_attention",
28
+ "sliding_attention",
29
+ "sliding_attention",
30
+ "sliding_attention",
31
+ "sliding_attention",
32
+ "sliding_attention",
33
+ "full_attention",
34
+ "sliding_attention",
35
+ "sliding_attention",
36
+ "sliding_attention",
37
+ "sliding_attention",
38
+ "sliding_attention",
39
+ "full_attention",
40
+ "sliding_attention",
41
+ "sliding_attention",
42
+ "sliding_attention",
43
+ "sliding_attention",
44
+ "sliding_attention",
45
+ "full_attention",
46
+ "sliding_attention",
47
+ "sliding_attention",
48
+ "sliding_attention",
49
+ "sliding_attention",
50
+ "sliding_attention",
51
+ "full_attention",
52
+ "sliding_attention",
53
+ "sliding_attention",
54
+ "sliding_attention",
55
+ "sliding_attention",
56
+ "sliding_attention",
57
+ "full_attention",
58
+ "sliding_attention",
59
+ "sliding_attention",
60
+ "sliding_attention",
61
+ "sliding_attention",
62
+ "sliding_attention",
63
+ "full_attention"
64
+ ],
65
+ "max_position_embeddings": 131072,
66
+ "model_type": "gemma4_text",
67
+ "moe_intermediate_size": null,
68
+ "num_attention_heads": 8,
69
+ "num_experts": null,
70
+ "num_global_key_value_heads": null,
71
+ "num_hidden_layers": 42,
72
+ "num_key_value_heads": 2,
73
+ "num_kv_shared_layers": 18,
74
+ "pad_token_id": 0,
75
+ "rms_norm_eps": 1e-06,
76
+ "rope_parameters": {
77
+ "full_attention": {
78
+ "partial_rotary_factor": 0.25,
79
+ "rope_theta": 1000000.0,
80
+ "rope_type": "proportional"
81
+ },
82
+ "sliding_attention": {
83
+ "rope_theta": 10000.0,
84
+ "rope_type": "default"
85
+ }
86
+ },
87
+ "sliding_window": 512,
88
+ "tie_word_embeddings": true,
89
+ "top_k_experts": null,
90
+ "transformers_version": "5.8.0",
91
+ "use_bidirectional_attention": null,
92
+ "use_cache": true,
93
+ "use_double_wide_mlp": false,
94
+ "vocab_size": 262144,
95
+ "vocab_size_per_layer_input": 262144
96
+ }
gemma4-e4b-classifier-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:88b84866558e4262bdd535367a0ebedfacfb77b617191399613880698ff90d54
3
+ size 5302272096
gemma4-e4b-classifier-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fedf71c9d08adf69f59616899ddf61db9a7bbcba3726a45a2ba6e0c8660575fc
3
+ size 7972724832
generation_config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 2,
4
+ "eos_token_id": 1,
5
+ "output_attentions": false,
6
+ "output_hidden_states": false,
7
+ "pad_token_id": 0,
8
+ "transformers_version": "5.8.0",
9
+ "use_cache": true
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c7c3265378bccff5205ada572db8100580e490a54b420e80c9aecc559f4e1db5
3
+ size 14926105012
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f
3
+ size 32169626
tokenizer_config.json ADDED
@@ -0,0 +1,96 @@
1
+ {
2
+ "audio_token": "<|audio|>",
3
+ "backend": "tokenizers",
4
+ "boa_token": "<|audio>",
5
+ "boi_token": "<|image>",
6
+ "bos_token": "<bos>",
7
+ "eoa_token": "<audio|>",
8
+ "eoc_token": "<channel|>",
9
+ "eoi_token": "<image|>",
10
+ "eos_token": "<eos>",
11
+ "eot_token": "<turn|>",
12
+ "escape_token": "<|\"|>",
13
+ "etc_token": "<tool_call|>",
14
+ "etd_token": "<tool|>",
15
+ "etr_token": "<tool_response|>",
16
+ "extra_special_tokens": [
17
+ "<|video|>"
18
+ ],
19
+ "image_token": "<|image|>",
20
+ "is_local": true,
21
+ "local_files_only": false,
22
+ "mask_token": "<mask>",
23
+ "model_max_length": 1000000000000000019884624838656,
24
+ "model_specific_special_tokens": {
25
+ "audio_token": "<|audio|>",
26
+ "boa_token": "<|audio>",
27
+ "boi_token": "<|image>",
28
+ "eoa_token": "<audio|>",
29
+ "eoc_token": "<channel|>",
30
+ "eoi_token": "<image|>",
31
+ "eot_token": "<turn|>",
32
+ "escape_token": "<|\"|>",
33
+ "etc_token": "<tool_call|>",
34
+ "etd_token": "<tool|>",
35
+ "etr_token": "<tool_response|>",
36
+ "image_token": "<|image|>",
37
+ "soc_token": "<|channel>",
38
+ "sot_token": "<|turn>",
39
+ "stc_token": "<|tool_call>",
40
+ "std_token": "<|tool>",
41
+ "str_token": "<|tool_response>",
42
+ "think_token": "<|think|>"
43
+ },
44
+ "pad_token": "<pad>",
45
+ "padding_side": "left",
46
+ "processor_class": "Gemma4Processor",
47
+ "response_schema": {
48
+ "properties": {
49
+ "content": {
50
+ "type": "string"
51
+ },
52
+ "role": {
53
+ "const": "assistant"
54
+ },
55
+ "thinking": {
56
+ "type": "string"
57
+ },
58
+ "tool_calls": {
59
+ "items": {
60
+ "properties": {
61
+ "function": {
62
+ "properties": {
63
+ "arguments": {
64
+ "additionalProperties": {},
65
+ "type": "object",
66
+ "x-parser": "gemma4-tool-call"
67
+ },
68
+ "name": {
69
+ "type": "string"
70
+ }
71
+ },
72
+ "type": "object",
73
+ "x-regex": "call\\:(?P<name>\\w+)(?P<arguments>\\{.*\\})"
74
+ },
75
+ "type": {
76
+ "const": "function"
77
+ }
78
+ },
79
+ "type": "object"
80
+ },
81
+ "type": "array",
82
+ "x-regex-iterator": "<\\|tool_call>(.*?)<tool_call\\|>"
83
+ }
84
+ },
85
+ "type": "object",
86
+ "x-regex": "(\\<\\|channel\\>thought\\n(?P<thinking>.*?)\\<channel\\|\\>)?(?P<tool_calls>\\<\\|tool_call\\>.*\\<tool_call\\|\\>)?(?P<content>(?:(?!\\<turn\\|\\>)(?!\\<\\|tool_response\\>).)+)?(?:\\<turn\\|\\>|\\<\\|tool_response\\>)?"
87
+ },
88
+ "soc_token": "<|channel>",
89
+ "sot_token": "<|turn>",
90
+ "stc_token": "<|tool_call>",
91
+ "std_token": "<|tool>",
92
+ "str_token": "<|tool_response>",
93
+ "think_token": "<|think|>",
94
+ "tokenizer_class": "GemmaTokenizer",
95
+ "unk_token": "<unk>"
96
+ }