Text Generation
Transformers
Safetensors
GGUF
English
multilingual
gemma4_text
gemma
gemma-4
classification
text-only
vram-optimized
ollama
conversational
Instructions for using igorls/gemma4-e4b-classifier with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use igorls/gemma4-e4b-classifier with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="igorls/gemma4-e4b-classifier")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("igorls/gemma4-e4b-classifier")
model = AutoModelForCausalLM.from_pretrained("igorls/gemma4-e4b-classifier")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- llama-cpp-python
How to use igorls/gemma4-e4b-classifier with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="igorls/gemma4-e4b-classifier",
    filename="gemma4-e4b-classifier-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use igorls/gemma4-e4b-classifier with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf igorls/gemma4-e4b-classifier:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf igorls/gemma4-e4b-classifier:Q4_K_M
```
Install from WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf igorls/gemma4-e4b-classifier:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf igorls/gemma4-e4b-classifier:Q4_K_M
```
Use pre-built binary
```bash
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf igorls/gemma4-e4b-classifier:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf igorls/gemma4-e4b-classifier:Q4_K_M
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf igorls/gemma4-e4b-classifier:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf igorls/gemma4-e4b-classifier:Q4_K_M
```
Use Docker
```bash
docker model run hf.co/igorls/gemma4-e4b-classifier:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use igorls/gemma4-e4b-classifier with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "igorls/gemma4-e4b-classifier"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "igorls/gemma4-e4b-classifier",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```bash
docker model run hf.co/igorls/gemma4-e4b-classifier:Q4_K_M
```
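Because the vLLM endpoint above is OpenAI-compatible, you can also script it from Python. A minimal sketch using the `openai` client package (the package install and the `api_key="none"` placeholder are assumptions of this sketch, not part of the page's instructions):

```python
# Minimal sketch: call the local vLLM server through its OpenAI-compatible API.
# Assumes `pip install openai` and the `vllm serve` command above already running.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

resp = client.chat.completions.create(
    model="igorls/gemma4-e4b-classifier",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(resp.choices[0].message.content)
```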
- SGLang
How to use igorls/gemma4-e4b-classifier with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "igorls/gemma4-e4b-classifier" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "igorls/gemma4-e4b-classifier",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "igorls/gemma4-e4b-classifier" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "igorls/gemma4-e4b-classifier",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Ollama
How to use igorls/gemma4-e4b-classifier with Ollama:
```bash
ollama run hf.co/igorls/gemma4-e4b-classifier:Q4_K_M
```
- Unsloth Studio
How to use igorls/gemma4-e4b-classifier with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for igorls/gemma4-e4b-classifier to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for igorls/gemma4-e4b-classifier to start chatting
```
Using HuggingFace Spaces for Unsloth
```bash
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for igorls/gemma4-e4b-classifier to start chatting
```
- Pi
How to use igorls/gemma4-e4b-classifier with Pi:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf igorls/gemma4-e4b-classifier:Q4_K_M
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "igorls/gemma4-e4b-classifier:Q4_K_M" }
      ]
    }
  }
}
```

Run Pi

```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use igorls/gemma4-e4b-classifier with Hermes Agent:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf igorls/gemma4-e4b-classifier:Q4_K_M
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default igorls/gemma4-e4b-classifier:Q4_K_M
```
Run Hermes
```bash
hermes
```
- Docker Model Runner
How to use igorls/gemma4-e4b-classifier with Docker Model Runner:
```bash
docker model run hf.co/igorls/gemma4-e4b-classifier:Q4_K_M
```
- Lemonade
How to use igorls/gemma4-e4b-classifier with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull igorls/gemma4-e4b-classifier:Q4_K_M
```
Run and chat with the model
```bash
lemonade run user.gemma4-e4b-classifier-Q4_K_M
```
List all available models
```bash
lemonade list
```
Upload folder using huggingface_hub
- .gitattributes +3 -0
- README.md +148 -0
- chat_template.jinja +351 -0
- config.json +96 -0
- gemma4-e4b-classifier-Q4_K_M.gguf +3 -0
- gemma4-e4b-classifier-Q8_0.gguf +3 -0
- generation_config.json +10 -0
- model.safetensors +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +96 -0
.gitattributes
CHANGED
```diff
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+gemma4-e4b-classifier-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+gemma4-e4b-classifier-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
```
README.md
ADDED
---
license: gemma
base_model: google/gemma-4-E4B-it
tags:
- gemma
- gemma-4
- classification
- text-only
- vram-optimized
- ollama
language:
- en
- multilingual
library_name: transformers
pipeline_tag: text-generation
---

# Gemma 4 E4B Classifier (vision/audio-stripped)

A modality-stripped variant of [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) for **text-only classification, entity extraction, and structured-memory extraction**. The vision encoder (~150M params) and audio encoder (~300M params) are removed; the text path is unchanged.

**Headline:** Same instruction-tuned text behavior as the official Gemma 4 E4B-it, but at **6.5 GB resident VRAM instead of 10.6 GB** (Ollama Q4_K_M, RTX 3090, Linux). All safety alignment is preserved — this is **not** an abliterated or uncensored variant.

## Why this exists

Gemma 4 E4B is the local leader on small-model classification tasks (room classification, entity/memory extraction). It locks out users with 12 GB GPUs because the official Q4_K_M is 10.6 GB resident — the vision + audio encoders sit in VRAM whether you use them or not. For text-only workloads, those modality encoders are dead weight.

This variant strips them via clean re-instantiation: load the multimodal checkpoint, copy text-path tensors into a fresh `Gemma4ForCausalLM(text_config)`, save. No safety-alignment changes. No retraining. No surgery on safetensors files.
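The re-instantiation is mechanical. A minimal sketch of the idea, assuming the text-path tensor names in the multimodal checkpoint differ from the causal-LM layout only by a `language_model.` prefix (the class names follow this card; the prefix handling is an assumption, and this is not the exact script used for this repo):

```python
# Hedged sketch of the modality strip -- not the exact script behind this repo.
# Assumes a Transformers build that ships the Gemma 4 classes named in this card.
import torch
from transformers import AutoConfig, AutoModel, Gemma4ForCausalLM

SRC = "google/gemma-4-E4B-it"

# 1. Load the full multimodal checkpoint on CPU at bf16.
mm = AutoModel.from_pretrained(SRC, torch_dtype=torch.bfloat16)

# 2. Re-instantiate a fresh text-only model from the text sub-config.
text_model = Gemma4ForCausalLM(AutoConfig.from_pretrained(SRC).text_config)

# 3. Copy text-path tensors across by name. Vision/audio tensors
#    (model.vision_tower.*, model.audio_tower.*, model.embed_vision.*,
#    model.embed_audio.*) have no destination and are simply dropped.
dst_sd = text_model.state_dict()
copied = {}
for name, tensor in mm.state_dict().items():
    name = name.replace("language_model.", "", 1)  # assumed prefix difference
    if name in dst_sd and dst_sd[name].shape == tensor.shape:
        copied[name] = tensor
text_model.load_state_dict(copied, strict=False)

# 4. Save. Weights are copied verbatim, so safety alignment is untouched.
text_model.save_pretrained("gemma4-e4b-classifier")
```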
## How it compares

Measured on RTX 3090, Ollama 0.x, against the MemPalace small-model benchmark harness (n=100 per task):

| Task | Official `gemma4:e4b-it-q4_K_M` | This model (Q4_K_M) | Δ |
|---|---:|---:|---:|
| Calibration | 1.0000 | **1.0000** | 0.0000 |
| Room classification (closed-set) | 0.6200 | **0.6200** | 0.0000 (exact tie) |
| Room classification (open-set) | 0.6556 | 0.6526 | -0.0030 |
| Entity extraction (F1) | 0.7519 | 0.7318 | -0.0201 |
| Memory coverage | 0.9125 | **0.9375** | +0.0250 (higher) |
| **VRAM resident** | **10626 MB** | **6517 MB** | **-4109 MB** |
| e2e p50 (closed-set room) | 230.9 ms | 232.4 ms | +1.5 ms (noise) |

All accuracy deltas are within statistical noise at n=100. The 4.1 GB VRAM win is real and reproducible.
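To put "statistical noise" in numbers: for an accuracy estimated from n=100 samples, one standard error is about sqrt(p(1-p)/n) ≈ 0.05 near p = 0.62, so every per-task delta in the table sits well inside one standard error. A quick check of that arithmetic (standard binomial approximation, not part of the benchmark harness):

```python
# Standard error of a proportion at n=100 (binomial approximation).
# Added as a sanity check; not part of the MemPalace harness.
import math

def stderr(p: float, n: int = 100) -> float:
    return math.sqrt(p * (1 - p) / n)

print(round(stderr(0.62), 4))    # 0.0485 -- closed-set room accuracy
print(round(stderr(0.9125), 4))  # 0.0283 -- memory coverage; delta +0.0250 is < 1 SE
```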
## What was actually dropped

From the 7996.2M-parameter multimodal checkpoint:

| Module | Params dropped |
|---|---:|
| `model.audio_tower.*` (USM-style conformer) | 304.8M |
| `model.vision_tower.*` (MobileNet-v5 lineage) | 167.4M |
| `model.embed_audio.*` (audio→text soft-token projector) | 3.9M |
| `model.embed_vision.*` (vision→text soft-token projector) | 2.0M |
| **Total dropped** | **478.1M (6.0%)** |
| **Total kept** (text path) | **7518.1M (94.0%)** |

The VRAM saving (4.1 GB) is significantly larger than the dropped weights account for (~250 MB at Q4_K_M). The remainder comes from: modality encoders kept at higher precision than Q4 inside the GGUF, activation buffers sized for image-token sequences (up to 1120 tokens/image), and the multimodal embedders' vocab-offset tables.
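For the weight side of that accounting, the ~250 MB figure follows from Q4_K_M's typical density of roughly 4.5 bits per parameter (a back-of-envelope check, not from the benchmark):

```python
# Back-of-envelope: dropped weights at ~4.5 bits/param (typical Q4_K_M density).
dropped_params = 478.1e6
print(dropped_params * 4.5 / 8 / 2**20)  # ~256 MiB, vs the observed 4109 MB saving
```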
## Quantization variants

- **`Q4_K_M`** (5.3 GB on disk, 6517 MB resident) — recommended default.
- **`Q8_0`** (8.0 GB on disk) — precision comparator; minimal accuracy lift on classification.
- Source safetensors (this repo at bf16, 13.92 GB).

## Usage

### Hugging Face Transformers

```python
from transformers import AutoTokenizer, Gemma4ForCausalLM
import torch

tok = AutoTokenizer.from_pretrained("igorls/gemma4-e4b-classifier")
model = Gemma4ForCausalLM.from_pretrained(
    "igorls/gemma4-e4b-classifier",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

messages = [{"role": "user", "content": "What is the capital of France? One word."}]
chat = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
ids = tok(chat, return_tensors="pt").input_ids.to("cuda")
out = model.generate(ids, max_new_tokens=10, do_sample=False)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```
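For the classification workloads this model targets, the same setup works with a constrained prompt and greedy decoding. A small variation on the snippet above (the label set and prompt wording are illustrative, not from the benchmark harness):

```python
# Illustrative closed-set classification call, reusing `tok` and `model`
# from the snippet above. Label set and prompt wording are examples only.
labels = ["kitchen", "bedroom", "bathroom", "living-room", "garage"]
prompt = (
    "Classify the room into exactly one of "
    f"{', '.join(labels)}. Answer with the label only.\n\n"
    "Text: The kettle is next to the toaster on the counter."
)
messages = [{"role": "user", "content": prompt}]
chat = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
ids = tok(chat, return_tensors="pt").input_ids.to("cuda")
out = model.generate(ids, max_new_tokens=8, do_sample=False)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))  # expect a bare label, e.g. "kitchen"
```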
### Ollama

```bash
ollama pull igorls/gemma4-e4b-classifier:Q4_K_M
ollama run igorls/gemma4-e4b-classifier:Q4_K_M "What is the capital of France?"
```

For classification workloads, pass `"think": false` at the top level of the `/api/generate` request to disable Gemma 4's CoT mode (which otherwise consumes the `num_predict` budget):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "igorls/gemma4-e4b-classifier:Q4_K_M",
  "prompt": "Classify into one word (indoor, outdoor): The kids are playing in the backyard.",
  "think": false,
  "stream": false,
  "options": {"temperature": 0, "num_predict": 16}
}'
```
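The same call from Python, if you prefer scripting a benchmark loop; a minimal sketch using `requests` against the standard Ollama `/api/generate` endpoint (the `requests` install is assumed):

```python
# Minimal sketch: same /api/generate call as the curl example, from Python.
# Assumes `pip install requests` and a local Ollama server on the default port.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "igorls/gemma4-e4b-classifier:Q4_K_M",
        "prompt": "Classify into one word (indoor, outdoor): "
                  "The kids are playing in the backyard.",
        "think": False,          # disable CoT so the label fits in num_predict
        "stream": False,
        "options": {"temperature": 0, "num_predict": 16},
    },
    timeout=60,
)
print(resp.json()["response"].strip())  # the bare label, e.g. "outdoor"
```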
## Safety surface

This variant is **safety-aligned identically to the official `gemma-4-E4B-it`**. The strip does not touch the text-path weights where alignment lives; it only removes the unused modality encoders.

Validated on 18 raw NSFW classification samples (closed-set room, open-set slug invention, entity extraction with named entities, structured memory extraction with decisions/preferences/facts/commitments):

- **Zero refusals** on any sample.
- **JSON validity 100%** on the structured extraction tasks.
- **Open-set slugs are functional** rather than euphemistic.

This confirms the architectural insight from prior research: safety alignment doesn't surface on classification-style outputs, so there's no reason to ship an uncensored variant for these workloads.

## Limitations

- **Text-only.** No vision input. No audio input. The encoders are gone. Passing image or audio tokens will produce undefined behavior.
- **Same context window as base** (128k tokens).
- **Same tokenizer.** The vocab includes vision/audio special tokens (`<image>`, `<audio>`, etc.) for compatibility with the official tokenizer; these tokens won't activate any modality processing in this variant.
- **No MTP drafter support on Ollama yet.** The official `google/gemma-4-E4B-it-assistant` MTP drafter works with Transformers and vLLM but not with Ollama on Linux/CUDA as of May 2026 (upstream llama.cpp doesn't recognize the `Gemma4AssistantForCausalLM` architecture). For MTP-accelerated inference, use Transformers or vLLM directly with this model as the target.

## License

Inherited from the base model: [Gemma Terms of Use](https://ai.google.dev/gemma/terms). By using this model you agree to those terms.

## Citation

This is a derivative work of Google's Gemma 4 E4B. If you use it, please also credit:

```
@misc{gemma_2025,
  title={Gemma 4 Technical Report},
  author={Google DeepMind},
  year={2026},
  url={https://huggingface.co/google/gemma-4-E4B-it},
}
```

## Acknowledgments

- **Google DeepMind** for Gemma 4 and the open-weight release.
- The **MemPalace small-model benchmark research** (PR #1447) that surfaced the VRAM gap and motivated this work.
- The **`igorls/gemma-4-E4B-it-heretic-GGUF`** (author's prior abliteration experiment) for accidentally demonstrating the architectural VRAM win that this artifact reproduces through a clean, safety-aligned path.
chat_template.jinja
ADDED
```jinja
{%- macro format_parameters(properties, required, filter_keys=false) -%}
{%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
{%- set ns = namespace(found_first=false) -%}
{%- for key, value in properties | dictsort -%}
{%- set add_comma = false -%}
{%- if not filter_keys or key not in standard_keys -%}
{%- if ns.found_first %},{% endif -%}
{%- set ns.found_first = true -%}
{{ key }}:{
{%- if value['description'] -%}
description:<|"|>{{ value['description'] }}<|"|>
{%- set add_comma = true -%}
{%- endif -%}
{%- if value['type'] | upper == 'STRING' -%}
{%- if value['enum'] -%}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
enum:{{ format_argument(value['enum']) }}
{%- endif -%}
{%- elif value['type'] | upper == 'ARRAY' -%}
{%- if value['items'] is mapping and value['items'] -%}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
items:{
{%- set ns_items = namespace(found_first=false) -%}
{%- for item_key, item_value in value['items'] | dictsort -%}
{%- if item_value is not none -%}
{%- if ns_items.found_first %},{% endif -%}
{%- set ns_items.found_first = true -%}
{%- if item_key == 'properties' -%}
properties:{
{%- if item_value is mapping -%}
{{- format_parameters(item_value, value['items']['required'] | default([])) -}}
{%- endif -%}
}
{%- elif item_key == 'required' -%}
required:[
{%- for req_item in item_value -%}
<|"|>{{- req_item -}}<|"|>
{%- if not loop.last %},{% endif -%}
{%- endfor -%}
]
{%- elif item_key == 'type' -%}
{%- if item_value is string -%}
type:{{ format_argument(item_value | upper) }}
{%- else -%}
type:{{ format_argument(item_value | map('upper') | list) }}
{%- endif -%}
{%- else -%}
{{ item_key }}:{{ format_argument(item_value) }}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
}
{%- endif -%}
{%- endif -%}
{%- if value['nullable'] %}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
nullable:true
{%- endif -%}
{%- if value['type'] | upper == 'OBJECT' -%}
{%- if value['properties'] is defined and value['properties'] is mapping -%}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
properties:{
{{- format_parameters(value['properties'], value['required'] | default([])) -}}
}
{%- elif value is mapping -%}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
properties:{
{{- format_parameters(value, value['required'] | default([]), filter_keys=true) -}}
}
{%- endif -%}
{%- if value['required'] -%}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
required:[
{%- for item in value['required'] | default([]) -%}
<|"|>{{- item -}}<|"|>
{%- if not loop.last %},{% endif -%}
{%- endfor -%}
]
{%- endif -%}
{%- endif -%}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
type:<|"|>{{ value['type'] | upper }}<|"|>}
{%- endif -%}
{%- endfor -%}
{%- endmacro -%}
{%- macro format_function_declaration(tool_data) -%}
declaration:{{- tool_data['function']['name'] -}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|>
{%- set params = tool_data['function']['parameters'] -%}
{%- if params -%}
,parameters:{
{%- if params['properties'] -%}
properties:{ {{- format_parameters(params['properties'], params['required']) -}} },
{%- endif -%}
{%- if params['required'] -%}
required:[
{%- for item in params['required'] -%}
<|"|>{{- item -}}<|"|>
{{- ',' if not loop.last -}}
{%- endfor -%}
],
{%- endif -%}
{%- if params['type'] -%}
type:<|"|>{{- params['type'] | upper -}}<|"|>}
{%- endif -%}
{%- endif -%}
{%- if 'response' in tool_data['function'] -%}
{%- set response_declaration = tool_data['function']['response'] -%}
,response:{
{%- if response_declaration['description'] -%}
description:<|"|>{{- response_declaration['description'] -}}<|"|>,
{%- endif -%}
{%- if response_declaration['type'] | upper == 'OBJECT' -%}
type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>}
{%- endif -%}
{%- endif -%}
}
{%- endmacro -%}
{%- macro format_argument(argument, escape_keys=True) -%}
{%- if argument is string -%}
{{- '<|"|>' + argument + '<|"|>' -}}
{%- elif argument is boolean -%}
{{- 'true' if argument else 'false' -}}
{%- elif argument is mapping -%}
{{- '{' -}}
{%- set ns = namespace(found_first=false) -%}
{%- for key, value in argument | dictsort -%}
{%- if ns.found_first %},{% endif -%}
{%- set ns.found_first = true -%}
{%- if escape_keys -%}
{{- '<|"|>' + key + '<|"|>' -}}
{%- else -%}
{{- key -}}
{%- endif -%}
:{{- format_argument(value, escape_keys=escape_keys) -}}
{%- endfor -%}
{{- '}' -}}
{%- elif argument is sequence -%}
{{- '[' -}}
{%- for item in argument -%}
{{- format_argument(item, escape_keys=escape_keys) -}}
{%- if not loop.last %},{% endif -%}
{%- endfor -%}
{{- ']' -}}
{%- else -%}
{{- argument -}}
{%- endif -%}
{%- endmacro -%}
{%- macro strip_thinking(text) -%}
{%- set ns = namespace(result='') -%}
{%- for part in text.split('<channel|>') -%}
{%- if '<|channel>' in part -%}
{%- set ns.result = ns.result + part.split('<|channel>')[0] -%}
{%- else -%}
{%- set ns.result = ns.result + part -%}
{%- endif -%}
{%- endfor -%}
{{- ns.result | trim -}}
{%- endmacro -%}

{%- macro format_tool_response_block(tool_name, response) -%}
{{- '<|tool_response>' -}}
{%- if response is mapping -%}
{{- 'response:' + tool_name + '{' -}}
{%- for key, value in response | dictsort -%}
{{- key -}}:{{- format_argument(value, escape_keys=False) -}}
{%- if not loop.last %},{% endif -%}
{%- endfor -%}
{{- '}' -}}
{%- else -%}
{{- 'response:' + tool_name + '{value:' + format_argument(response, escape_keys=False) + '}' -}}
{%- endif -%}
{{- '<tool_response|>' -}}
{%- endmacro -%}

{%- set ns = namespace(prev_message_type=None) -%}
{%- set loop_messages = messages -%}
{{- bos_token -}}
{#- Handle System/Tool Definitions Block -#}
{%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
{{- '<|turn>system\n' -}}
{#- Inject Thinking token at the very top of the FIRST system turn -#}
{%- if enable_thinking is defined and enable_thinking -%}
{{- '<|think|>\n' -}}
{%- set ns.prev_message_type = 'think' -%}
{%- endif -%}
{%- if messages[0]['role'] in ['system', 'developer'] -%}
{%- if messages[0]['content'] is string -%}
{{- messages[0]['content'] | trim -}}
{%- elif messages[0]['content'] is sequence -%}
{%- for item in messages[0]['content'] -%}
{{- item['text'] | trim + ' '-}}
{%- endfor -%}
{%- endif -%}
{%- set loop_messages = messages[1:] -%}
{%- endif -%}
{%- if tools -%}
{%- for tool in tools %}
{{- '<|tool>' -}}
{{- format_function_declaration(tool) | trim -}}
{{- '<tool|>' -}}
{%- endfor %}
{%- set ns.prev_message_type = 'tool' -%}
{%- endif -%}
{{- '<turn|>\n' -}}
{%- endif %}

{#- Pre-scan: find last user message index for reasoning guard -#}
{%- set ns_turn = namespace(last_user_idx=-1) -%}
{%- for i in range(loop_messages | length) -%}
{%- if loop_messages[i]['role'] == 'user' -%}
{%- set ns_turn.last_user_idx = i -%}
{%- endif -%}
{%- endfor -%}

{#- Loop through messages -#}
{%- for message in loop_messages -%}
{%- if message['role'] != 'tool' -%}
{%- set ns.prev_message_type = None -%}
{%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
{#- Detect continuation: suppress duplicate <|turn>model when previous non-tool message was also assistant -#}
{%- set prev_nt = namespace(role=None, found=false) -%}
{%- if loop.index0 > 0 -%}
{%- for j in range(loop.index0 - 1, -1, -1) -%}
{%- if not prev_nt.found -%}
{%- if loop_messages[j]['role'] != 'tool' -%}
{%- set prev_nt.role = loop_messages[j]['role'] -%}
{%- set prev_nt.found = true -%}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- set continue_same_model_turn = (role == 'model' and prev_nt.role == 'assistant') -%}
{%- if not continue_same_model_turn -%}
{{- '<|turn>' + role + '\n' }}
{%- endif -%}

{#- Render reasoning/reasoning_content as thinking channel -#}
{%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%}
{%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%}
{{- '<|channel>thought\n' + thinking_text + '\n<channel|>' -}}
{%- endif -%}

{%- if message['tool_calls'] -%}
{%- for tool_call in message['tool_calls'] -%}
{%- set function = tool_call['function'] -%}
{{- '<|tool_call>call:' + function['name'] + '{' -}}
{%- if function['arguments'] is mapping -%}
{%- set ns_args = namespace(found_first=false) -%}
{%- for key, value in function['arguments'] | dictsort -%}
{%- if ns_args.found_first %},{% endif -%}
{%- set ns_args.found_first = true -%}
{{- key -}}:{{- format_argument(value, escape_keys=False) -}}
{%- endfor -%}
{%- elif function['arguments'] is string -%}
{{- function['arguments'] -}}
{%- endif -%}
{{- '}<tool_call|>' -}}
{%- endfor -%}
{%- set ns.prev_message_type = 'tool_call' -%}
{%- endif -%}

{%- set ns_tr_out = namespace(flag=false) -%}
{%- if message.get('tool_responses') -%}
{#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#}
{%- for tool_response in message['tool_responses'] -%}
{{- format_tool_response_block(tool_response['name'] | default('unknown'), tool_response['response']) -}}
{%- set ns_tr_out.flag = true -%}
{%- set ns.prev_message_type = 'tool_response' -%}
{%- endfor -%}
{%- elif message.get('tool_calls') -%}
{#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#}
{%- set ns_tool_scan = namespace(stopped=false) -%}
{%- for k in range(loop.index0 + 1, loop_messages | length) -%}
{%- if ns_tool_scan.stopped -%}
{%- elif loop_messages[k]['role'] != 'tool' -%}
{%- set ns_tool_scan.stopped = true -%}
{%- else -%}
{%- set follow = loop_messages[k] -%}
{#- Resolve tool_call_id to function name -#}
{%- set ns_tname = namespace(name=follow.get('name') | default('unknown')) -%}
{%- for tc in message['tool_calls'] -%}
{%- if tc.get('id') == follow.get('tool_call_id') -%}
{%- set ns_tname.name = tc['function']['name'] -%}
{%- endif -%}
{%- endfor -%}
{#- Handle content as string or content-parts array -#}
{%- set tool_body = follow.get('content') -%}
{%- if tool_body is string -%}
{{- format_tool_response_block(ns_tname.name, tool_body) -}}
{%- elif tool_body is sequence and tool_body is not string -%}
{%- set ns_txt = namespace(s='') -%}
{%- for part in tool_body -%}
{%- if part.get('type') == 'text' -%}
{%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%}
{%- endif -%}
{%- endfor -%}
{{- format_tool_response_block(ns_tname.name, ns_txt.s) -}}
{%- else -%}
{{- format_tool_response_block(ns_tname.name, tool_body) -}}
{%- endif -%}
{%- set ns_tr_out.flag = true -%}
{%- set ns.prev_message_type = 'tool_response' -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}

{%- set captured_content -%}
{%- if message['content'] is string -%}
{%- if role == 'model' -%}
{{- strip_thinking(message['content']) -}}
{%- else -%}
{{- message['content'] | trim -}}
{%- endif -%}
{%- elif message['content'] is sequence -%}
{%- for item in message['content'] -%}
{%- if item['type'] == 'text' -%}
{%- if role == 'model' -%}
{{- strip_thinking(item['text']) -}}
{%- else -%}
{{- item['text'] | trim -}}
{%- endif -%}
{%- elif item['type'] == 'image' -%}
{{- '<|image|>' -}}
{%- set ns.prev_message_type = 'image' -%}
{%- elif item['type'] == 'audio' -%}
{{- '<|audio|>' -}}
{%- set ns.prev_message_type = 'audio' -%}
{%- elif item['type'] == 'video' -%}
{{- '<|video|>' -}}
{%- set ns.prev_message_type = 'video' -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- endset -%}

{{- captured_content -}}
{%- set has_content = captured_content | trim | length > 0 -%}

{%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%}
{{- '<|tool_response>' -}}
{%- elif not (ns_tr_out.flag and not has_content) -%}
{{- '<turn|>\n' -}}
{%- endif -%}
{%- endif -%}
{%- endfor -%}

{%- if add_generation_prompt -%}
{%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
{{- '<|turn>model\n' -}}
{%- endif -%}
{%- endif -%}
```
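To see what this template actually emits, you can render it offline through the tokenizer without running the model; a small sketch (the weather tool definition is illustrative):

```python
# Render the chat template to inspect the <|turn>/<|tool> wire format it
# produces. The get_weather tool below is an example, not from this repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("igorls/gemma4-e4b-classifier")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "Weather in Lisbon?"}]

text = tok.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
print(text)  # expect <bos><|turn>system ... <|tool>declaration:get_weather{...}<tool|> ...
```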
config.json
ADDED
```json
{
  "architectures": [
    "Gemma4ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attention_k_eq_v": false,
  "bos_token_id": 2,
  "dtype": "bfloat16",
  "enable_moe_block": false,
  "eos_token_id": 1,
  "expert_intermediate_size": null,
  "final_logit_softcapping": 30.0,
  "global_head_dim": 512,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 2560,
  "hidden_size_per_layer_input": 256,
  "initializer_range": 0.02,
  "intermediate_size": 10240,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention"
  ],
  "max_position_embeddings": 131072,
  "model_type": "gemma4_text",
  "moe_intermediate_size": null,
  "num_attention_heads": 8,
  "num_experts": null,
  "num_global_key_value_heads": null,
  "num_hidden_layers": 42,
  "num_key_value_heads": 2,
  "num_kv_shared_layers": 18,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "rope_parameters": {
    "full_attention": {
      "partial_rotary_factor": 0.25,
      "rope_theta": 1000000.0,
      "rope_type": "proportional"
    },
    "sliding_attention": {
      "rope_theta": 10000.0,
      "rope_type": "default"
    }
  },
  "sliding_window": 512,
  "tie_word_embeddings": true,
  "top_k_experts": null,
  "transformers_version": "5.8.0",
  "use_bidirectional_attention": null,
  "use_cache": true,
  "use_double_wide_mlp": false,
  "vocab_size": 262144,
  "vocab_size_per_layer_input": 262144
}
```
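The `layer_types` schedule above places one full-attention layer after every five sliding-attention layers (7 of the 42 layers). A quick standalone check of that pattern against the file (not shipped with the repo):

```python
# Quick check of the attention schedule in config.json: full attention every
# 6th layer, sliding attention (512-token window) elsewhere.
import json

cfg = json.load(open("config.json"))
layers = cfg["layer_types"]
assert len(layers) == cfg["num_hidden_layers"] == 42
assert all((t == "full_attention") == ((i + 1) % 6 == 0) for i, t in enumerate(layers))
print("full-attention layers:", [i for i, t in enumerate(layers) if t == "full_attention"])
```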
gemma4-e4b-classifier-Q4_K_M.gguf
ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:88b84866558e4262bdd535367a0ebedfacfb77b617191399613880698ff90d54
size 5302272096
```
gemma4-e4b-classifier-Q8_0.gguf
ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:fedf71c9d08adf69f59616899ddf61db9a7bbcba3726a45a2ba6e0c8660575fc
size 7972724832
```
generation_config.json
ADDED
```json
{
  "_from_model_config": true,
  "bos_token_id": 2,
  "eos_token_id": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "pad_token_id": 0,
  "transformers_version": "5.8.0",
  "use_cache": true
}
```
model.safetensors
ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:c7c3265378bccff5205ada572db8100580e490a54b420e80c9aecc559f4e1db5
size 14926105012
```
tokenizer.json
ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f
size 32169626
```
tokenizer_config.json
ADDED
```json
{
  "audio_token": "<|audio|>",
  "backend": "tokenizers",
  "boa_token": "<|audio>",
  "boi_token": "<|image>",
  "bos_token": "<bos>",
  "eoa_token": "<audio|>",
  "eoc_token": "<channel|>",
  "eoi_token": "<image|>",
  "eos_token": "<eos>",
  "eot_token": "<turn|>",
  "escape_token": "<|\"|>",
  "etc_token": "<tool_call|>",
  "etd_token": "<tool|>",
  "etr_token": "<tool_response|>",
  "extra_special_tokens": [
    "<|video|>"
  ],
  "image_token": "<|image|>",
  "is_local": true,
  "local_files_only": false,
  "mask_token": "<mask>",
  "model_max_length": 1000000000000000019884624838656,
  "model_specific_special_tokens": {
    "audio_token": "<|audio|>",
    "boa_token": "<|audio>",
    "boi_token": "<|image>",
    "eoa_token": "<audio|>",
    "eoc_token": "<channel|>",
    "eoi_token": "<image|>",
    "eot_token": "<turn|>",
    "escape_token": "<|\"|>",
    "etc_token": "<tool_call|>",
    "etd_token": "<tool|>",
    "etr_token": "<tool_response|>",
    "image_token": "<|image|>",
    "soc_token": "<|channel>",
    "sot_token": "<|turn>",
    "stc_token": "<|tool_call>",
    "std_token": "<|tool>",
    "str_token": "<|tool_response>",
    "think_token": "<|think|>"
  },
  "pad_token": "<pad>",
  "padding_side": "left",
  "processor_class": "Gemma4Processor",
  "response_schema": {
    "properties": {
      "content": {
        "type": "string"
      },
      "role": {
        "const": "assistant"
      },
      "thinking": {
        "type": "string"
      },
      "tool_calls": {
        "items": {
          "properties": {
            "function": {
              "properties": {
                "arguments": {
                  "additionalProperties": {},
                  "type": "object",
                  "x-parser": "gemma4-tool-call"
                },
                "name": {
                  "type": "string"
                }
              },
              "type": "object",
              "x-regex": "call\\:(?P<name>\\w+)(?P<arguments>\\{.*\\})"
            },
            "type": {
              "const": "function"
            }
          },
          "type": "object"
        },
        "type": "array",
        "x-regex-iterator": "<\\|tool_call>(.*?)<tool_call\\|>"
      }
    },
    "type": "object",
    "x-regex": "(\\<\\|channel\\>thought\\n(?P<thinking>.*?)\\<channel\\|\\>)?(?P<tool_calls>\\<\\|tool_call\\>.*\\<tool_call\\|\\>)?(?P<content>(?:(?!\\<turn\\|\\>)(?!\\<\\|tool_response\\>).)+)?(?:\\<turn\\|\\>|\\<\\|tool_response\\>)?"
  },
  "soc_token": "<|channel>",
  "sot_token": "<|turn>",
  "stc_token": "<|tool_call>",
  "std_token": "<|tool>",
  "str_token": "<|tool_response>",
  "think_token": "<|think|>",
  "tokenizer_class": "GemmaTokenizer",
  "unk_token": "<unk>"
}
```
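The `response_schema` block is what turns a raw completion back into structured fields: the outer `x-regex` splits thinking, tool calls, and content, and `x-regex-iterator` walks the individual `<|tool_call>` spans. A standalone sketch of that parsing on a synthetic completion (an illustration, not an official parser):

```python
# Sketch: apply the response_schema regexes from tokenizer_config.json to a
# synthetic model completion, to show how tool calls would be recovered.
import json
import re

schema = json.load(open("tokenizer_config.json"))["response_schema"]
call_re = re.compile(schema["properties"]["tool_calls"]["x-regex-iterator"], re.S)
fn_re = re.compile(
    schema["properties"]["tool_calls"]["items"]["properties"]["function"]["x-regex"]
)

completion = '<|tool_call>call:get_weather{city:<|"|>Lisbon<|"|>}<tool_call|>'
for body in call_re.findall(completion):
    m = fn_re.search(body)
    print(m.group("name"), m.group("arguments"))  # get_weather {city:<|"|>Lisbon<|"|>}
```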