How to use roshangrewal/gemma4-e4b-toolcall-v01 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="roshangrewal/gemma4-e4b-toolcall-v01")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("roshangrewal/gemma4-e4b-toolcall-v01")
model = AutoModelForMultimodalLM.from_pretrained("roshangrewal/gemma4-e4b-toolcall-v01", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use roshangrewal/gemma4-e4b-toolcall-v01 with vLLM:

Install from pip and serve model:

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "roshangrewal/gemma4-e4b-toolcall-v01"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "roshangrewal/gemma4-e4b-toolcall-v01",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
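The card's prompts embed the tool list in the system message rather than using the OpenAI `tools` request field, so a client for the server above would likely need to send that same system prompt to get compact tool calls back. The sketch below builds such a request with the standard library only. `build_chat_request` and `send` are hypothetical helper names; `temperature: 0` is an assumed stand-in for the Quick Start's `do_sample=False`; and it is an assumption that the served chat template treats the system role the same way `apply_chat_template` does in the Quick Start. Pass `base_url="http://localhost:30000"` to target the SGLang server instead.

```python
import json
import urllib.request

MODEL_ID = "roshangrewal/gemma4-e4b-toolcall-v01"
# Same instruction text as the card's prompt examples.
INSTRUCTION = "Call the appropriate function when needed. When no tool is needed, respond directly."


def build_chat_request(query, tools, base_url="http://localhost:8000"):
    """Build (but do not send) a chat request with the tools embedded in the system message."""
    system = f"You have access to these tools:\n{json.dumps(tools)}\n{INSTRUCTION}"
    payload = {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": query},
        ],
        "temperature": 0,  # assumed equivalent of greedy decoding (do_sample=False)
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (needs the server to be running):
# tools = [{"type": "function", "function": {"name": "get_weather", ...}}]
# print(send(build_chat_request("What's the weather in Tokyo?", tools)))
```

The reply is plain text, so it still has to go through the compact-format parser shown later on this card.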

Use Docker:

```shell
docker model run hf.co/roshangrewal/gemma4-e4b-toolcall-v01
```

SGLang

How to use roshangrewal/gemma4-e4b-toolcall-v01 with SGLang:

Install from pip and serve model:

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "roshangrewal/gemma4-e4b-toolcall-v01" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "roshangrewal/gemma4-e4b-toolcall-v01",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images:

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "roshangrewal/gemma4-e4b-toolcall-v01" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "roshangrewal/gemma4-e4b-toolcall-v01",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


# Gemma 4 E4B — Tool-Calling Fine-Tune (v0.1)

An experimental fine-tune of Google's Gemma 4 E4B for function calling and tool use, trained for 10,000 steps on a 174K-example dataset using a single Tesla T4 GPU.

## 🎯 What This Model Does

This model generates structured tool calls in a compact format when given a user query and available tool definitions.

Output format:

```
call:function_name{param1:value1,param2:value2}
```

Example:

```
Input:  "What's the weather in Tokyo?"
Output: call:get_weather{city:Tokyo}
```
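For building test fixtures or few-shot examples, the compact format can be produced from a function name and a flat argument dict. This is a minimal sketch, assuming values are written unquoted and never contain `,`, `{`, or `}` (the card does not document any escaping rule); `format_tool_call` is a hypothetical helper name.

```python
def format_tool_call(name: str, arguments: dict) -> str:
    """Render a call in the card's compact format: call:name{k1:v1,k2:v2}."""
    for key, value in arguments.items():
        # No escaping is documented for this format, so reject characters that would break it.
        if any(ch in str(value) for ch in ",{}"):
            raise ValueError(f"value for {key!r} contains a reserved character: {value!r}")
    body = ",".join(f"{key}:{value}" for key, value in arguments.items())
    # Doubled braces in the f-string emit literal { and }.
    return f"call:{name}{{{body}}}"


print(format_tool_call("get_weather", {"city": "Tokyo"}))
```

Rejecting reserved characters, rather than silently emitting them, keeps generated fixtures parseable by a simple regex-based parser.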

## 📊 Evaluation Results (v0.1)

Evaluated on a held-out validation set (200 samples):

| Metric | Score |
|--------|-------|
| Tool Selection Accuracy | 64.2% |
| Full Match (name + args) | 28.4% |
| No-Call Accuracy (avoids hallucination) | 69.9% |
| Missed Tool Call Rate | 35.8% |
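The card does not define these metrics precisely. The sketch below shows one plausible set of definitions over (expected, predicted) pairs, where `None` means "no tool call"; the definitions and the `score` name are assumptions for illustration, not the author's evaluation code. Note that under these definitions, tool selection accuracy plus missed-call rate equals 100% only when no emitted call names the wrong tool, so the card's numbers may rest on different definitions.

```python
from typing import Optional

Call = dict  # assumed shape: {"name": str, "arguments": dict[str, str]}


def score(pairs: list[tuple[Optional[Call], Optional[Call]]]) -> dict[str, float]:
    """Score (expected, predicted) pairs; None means 'no tool call'. Definitions are assumptions."""
    needs_tool = [(e, p) for e, p in pairs if e is not None]
    no_tool = [p for e, p in pairs if e is None]

    def rate(hits: int, total: int) -> float:
        return hits / total if total else 0.0

    return {
        # Right function name among queries that need a tool (a missed call counts as wrong).
        "tool_selection_accuracy": rate(
            sum(p is not None and p["name"] == e["name"] for e, p in needs_tool), len(needs_tool)
        ),
        # Name and every argument identical.
        "full_match": rate(sum(p == e for e, p in needs_tool), len(needs_tool)),
        # Queries that need no tool and received no call.
        "no_call_accuracy": rate(sum(p is None for p in no_tool), len(no_tool)),
        # Queries that need a tool but received plain text.
        "missed_call_rate": rate(sum(p is None for _, p in needs_tool), len(needs_tool)),
    }
```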

Strengths:

- ✅ Learned when NOT to call tools (69.9% no-call accuracy: it answers directly on about 7 in 10 queries that need no tool)
- ✅ Generates structured tool calls rather than free-form text
- ✅ Picks the correct tool from several options (64.2% tool selection accuracy)

Known Limitations:

- ⚠️ Uses a compact format (`call:name{args}`) rather than standard JSON
- ⚠️ Misses tool calls 35.8% of the time (responds with text instead)
- ⚠️ Argument extraction needs improvement (28.4% full match on name and arguments)
- ⚠️ v0.1 is an experimental release and is not production-ready
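Because the compact format is not JSON, clients that expect OpenAI-style `tool_calls` need a conversion step. This is a minimal sketch, assuming the call has already been parsed into `{"name": ..., "arguments": {...}}` (the shape returned by the card's `parse_tool_call`); `to_openai_tool_call` and the default `call_0` id are hypothetical. All argument values remain strings, since the compact format carries no type information.

```python
import json


def to_openai_tool_call(call: dict, call_id: str = "call_0") -> dict:
    """Wrap a parsed compact call in the OpenAI chat-completions tool_call shape."""
    return {
        "id": call_id,
        "type": "function",
        "function": {
            "name": call["name"],
            # OpenAI-style APIs carry arguments as a JSON-encoded string, not an object.
            "arguments": json.dumps(call["arguments"]),
        },
    }


print(to_openai_tool_call({"name": "get_weather", "arguments": {"city": "Tokyo"}}))
```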

## 🚀 Quick Start

### Option 1: Merged Model (recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
import json

# Load merged model (no adapter needed)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "roshangrewal/gemma4-e4b-toolcall-v01",
    quantization_config=bnb,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("roshangrewal/gemma4-e4b-toolcall-v01")

# Define tools
tools = [{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}]

messages = [
    {"role": "system", "content": f"You have access to these tools:\n{json.dumps(tools)}\nCall the appropriate function when needed."},
    {"role": "user", "content": "What's the weather in Mumbai?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=200, do_sample=False, pad_token_id=tokenizer.pad_token_id)
# Decode only the newly generated tokens
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# Output: call:get_weather{city:Mumbai}
```

### Option 2: Adapter (lightweight, 51MB)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch
import json

# Load base + adapter
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B-it", quantization_config=bnb, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")
model = PeftModel.from_pretrained(base, "roshangrewal/gemma4-e4b-toolcall-v01-lora")
model.eval()

# From here, build `messages` and call model.generate exactly as in Option 1.
```

### Deployment Options

| Method | VRAM Required | Notes |
|--------|---------------|-------|
| 4-bit quantized (above) | ~10 GB | Fits a T4 or RTX 4090 |
| fp16 (full precision) | ~16 GB | Best quality, needs an A10 or larger |
| GGUF via llama.cpp/Ollama | ~6 GB | CPU + GPU hybrid |
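A rough weights-only estimate helps sanity-check these figures: parameters × bits per parameter ÷ 8 gives bytes. The sketch below uses the card's stated 8B parameters. The fp16 result matches the table's 16 GB; the 4-bit result is well below the table's ~10 GB, which is plausible because this arithmetic ignores activations, the KV cache, the small per-block quantization constants, and layers that bitsandbytes likely leaves in 16-bit (it replaces linear layers, so embeddings are not quantized). Treat it as a lower bound, not a measurement.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 8e9  # "8B params" as stated on the card

for label, bits in [("fp16", 16), ("4-bit NF4", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.0f} GB of weights")
# fp16: ~16 GB of weights
# 4-bit NF4: ~4 GB of weights
```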

## 💬 Prompt Examples

### Single tool, simple query

```python
tools = [
    {"type": "function", "function": {"name": "get_weather", "description": "Get current weather",
     "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
]
messages = [
    {"role": "system", "content": f"You have access to these tools:\n{json.dumps(tools)}\nCall the appropriate function when needed. When no tool is needed, respond directly."},
    {"role": "user", "content": "What's the weather in Tokyo?"}
]
# Output: call:get_weather{city:Tokyo}
```

### Multiple tools: model selects the right one

```python
tools = [
    {"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city",
     "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}},
    {"type": "function", "function": {"name": "search_web", "description": "Search the web",
     "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}},
    {"type": "function", "function": {"name": "send_email", "description": "Send an email",
     "parameters": {"type": "object", "properties": {"to": {"type": "string"}, "subject": {"type": "string"}, "body": {"type": "string"}}, "required": ["to", "subject", "body"]}}}
]
messages = [
    {"role": "system", "content": f"You have access to these tools:\n{json.dumps(tools)}\nCall the appropriate function when needed. When no tool is needed, respond directly."},
    {"role": "user", "content": "Search for latest news about AI startups in India"}
]
# Output: call:search_web{query:latest news AI startups India}
```

### No tool needed: model responds directly

```python
messages = [
    {"role": "system", "content": f"You have access to these tools:\n{json.dumps(tools)}\nCall the appropriate function when needed. When no tool is needed, respond directly."},
    {"role": "user", "content": "What is 2 + 2?"}
]
# Output: 4 (no tool call generated)
```
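With a full-match rate of 28.4%, it may be worth checking a parsed call against the tool's JSON schema before executing it. This is a minimal sketch, assuming the call is already parsed into `{"name": ..., "arguments": {...}}` and the tools use the same OpenAI-style definitions as the examples above; `validate_call` is a hypothetical helper. It checks only the tool name, `required` keys, unexpected keys, and `enum` values, not types, since every compact-format value arrives as a string.

```python
def validate_call(call: dict, tools: list) -> list:
    """Return a list of problems with `call` (empty if it looks valid)."""
    specs = {t["function"]["name"]: t["function"] for t in tools if t.get("type") == "function"}
    spec = specs.get(call["name"])
    if spec is None:
        return [f"unknown tool: {call['name']}"]
    schema = spec.get("parameters", {})
    properties = schema.get("properties", {})
    args = call["arguments"]
    # Required keys that the model left out.
    problems = [f"missing required argument: {name}" for name in schema.get("required", []) if name not in args]
    for key, value in args.items():
        if key not in properties:
            problems.append(f"unexpected argument: {key}")
        elif "enum" in properties[key] and value not in properties[key]["enum"]:
            problems.append(f"{key}={value!r} not in {properties[key]['enum']}")
    return problems


# Usage: only dispatch the call when validate_call(call, tools) returns an empty list.
```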

### Prompt structure (for custom integrations)

```
System: You have access to these tools:
[tool definitions as JSON array]
Call the appropriate function when needed. When no tool is needed, respond directly.

User: <query>
```
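Since every example on this card repeats the same system prompt, a small helper keeps custom integrations from drifting away from the wording the model was fine-tuned on. This is a sketch; `build_messages` and `SYSTEM_TEMPLATE` are hypothetical names, and the template text is copied from the prompt structure above.

```python
import json

# Wording copied from the card's prompt examples.
SYSTEM_TEMPLATE = (
    "You have access to these tools:\n{tools}\n"
    "Call the appropriate function when needed. When no tool is needed, respond directly."
)


def build_messages(tools: list, query: str) -> list:
    """Return chat messages in the prompt structure used by this fine-tune."""
    return [
        # json.dumps output is passed as a format argument, so its braces are not interpreted.
        {"role": "system", "content": SYSTEM_TEMPLATE.format(tools=json.dumps(tools))},
        {"role": "user", "content": query},
    ]
```

The result can be passed straight to `tokenizer.apply_chat_template(...)` as in the Quick Start.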

## 🏗️ Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | google/gemma-4-E4B-it (8B params, 4.5B effective) |
| Method | QLoRA (4-bit NF4, double quantization) |
| LoRA Rank | 16 |
| LoRA Alpha | 16 |
| Target Modules | q_proj.linear, k_proj.linear, v_proj.linear, o_proj.linear, up_proj.linear, down_proj.linear |
| Learning Rate | 1e-4 (linear decay, 10% warmup) |
| Effective Batch Size | 16 |
| Max Length | 1024 |
| Steps | 10,000 (~84% of 1 epoch) |
| Training Time | 56 hours |
| GPU | NVIDIA Tesla T4 (16GB) |
| Cost | ~$0 (own hardware) |
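A few quantities follow directly from the table. The sketch below computes the number of training sequences seen, the warmup length, the LoRA scaling factor (assuming standard LoRA scaling of alpha / r, not rsLoRA), and the learning rate at a given step. The schedule assumes the linear decay reaches zero at step 10,000, as Transformers' `get_linear_schedule_with_warmup` does; the card does not say which implementation was used. The table does not state how the 174,853 source examples map to training sequences, so the "~84% of 1 epoch" figure is taken as given.

```python
# Values copied from the Training Details table.
STEPS = 10_000
EFFECTIVE_BATCH = 16
PEAK_LR = 1e-4
WARMUP_STEPS = STEPS // 10          # "10% warmup" -> 1,000 steps
LORA_R, LORA_ALPHA = 16, 16

sequences_seen = STEPS * EFFECTIVE_BATCH   # 160,000 training sequences
lora_scaling = LORA_ALPHA / LORA_R         # LoRA update scaled by alpha / r = 1.0


def lr_at(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at STEPS (assumed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (STEPS - step) / max(1, STEPS - WARMUP_STEPS))


print(sequences_seen, WARMUP_STEPS, lora_scaling)  # 160000 1000 1.0
```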

## 📚 Training Data

174,853 function-calling examples from:

| Dataset | Examples |
|---------|----------|
| [Glaive Function Calling v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | 112,960 |
| [Salesforce xLAM 60K](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | 60,000 |
| [Hermes Function Calling v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | 1,893 |

## 🗺️ Roadmap

- **v0.1** (current): Initial fine-tune, compact call format
- **v0.2** (planned): Align with Gemma 4's native tool-calling template, target 80%+ accuracy
- **v1.0** (planned): Production-ready with BFCL leaderboard submission

## 💡 Parsing the Output

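```python
import re

# Matches the first compact call, e.g. call:get_weather{city:Tokyo} (empty braces allowed).
CALL_RE = re.compile(r'call:(\w+)\{(.*?)\}')
# key:value pairs; a value runs until the next comma, so values that contain commas get split.
ARG_RE = re.compile(r'(\w+):([^,}]*)')

def parse_tool_call(text):
    """Return {"name", "arguments"} for the first tool call in `text`, or None for a plain-text reply."""
    m = CALL_RE.search(text)
    if m is None:
        return None
    name, body = m.groups()
    args = {key: value.strip() for key, value in ARG_RE.findall(body)}
    return {"name": name, "arguments": args}

result = parse_tool_call("call:get_weather{city:Tokyo}")
# {'name': 'get_weather', 'arguments': {'city': 'Tokyo'}}
```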

## 🔗 Links

- Training Code: [github.com/roshangrewal/f-for-finetuning](https://github.com/roshangrewal/f-for-finetuning)
- Base Model: [google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it)

## 📝 Citation

```bibtex
@misc{grewal2026gemma4toolcall,
  title={Gemma 4 E4B Tool-Calling Fine-Tune v0.1},
  author={Roshan Grewal},
  year={2026},
  url={https://huggingface.co/roshangrewal/gemma4-e4b-toolcall-v01}
}
```


**Tips:**
- Always include tool definitions in the system message as a JSON array
- The system message must contain the instruction "Call the appropriate function when needed"
- The model outputs `call:function_name{param:value}` when it decides to use a tool
- The model responds with plain text when no tool is appropriate


Model size: 8B params · Tensor type: F16 (Safetensors)

Model tree: google/gemma-4-E4B (base) → google/gemma-4-E4B-it (fine-tuned) → roshangrewal/gemma4-e4b-toolcall-v01 (this model)
