gemma4-e4b-openclaw-agent-gguf

This repository contains the merged GGUF version of the model, optimized for efficient inference on CPU and GPU using llama.cpp.

Model Description

The model is stored in GGUF format and loads with llama-cpp-python or any other GGUF-compatible loader. The file contains the merged weights and is intended for local, low-resource deployment.
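
Before loading a multi-gigabyte download, it can help to confirm that the file really is GGUF. The sketch below relies only on the public GGUF layout (a 4-byte `GGUF` magic followed by a little-endian 32-bit version number); the helper name `read_gguf_version` is hypothetical and is not part of llama-cpp-python.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_version(path):
    """Return the GGUF format version, or raise ValueError if the header is wrong."""
    with open(path, "rb") as f:
        header = f.read(8)  # 4-byte magic + 4-byte little-endian version
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version("merged_model.gguf")` on a truncated or mislabeled download should raise `ValueError` instead of failing later inside the loader.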

Usage with llama-cpp-python

from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="merged_model.gguf",
    n_ctx=2048,      # Context window size in tokens
    n_gpu_layers=0,  # CPU only; increase to offload layers to GPU (-1 offloads all)
)

# Generate completion
output = llm(
    prompt="### Human: Hello!\n### Assistant:",
    max_tokens=256,
    stop=["### Human:"],
    temperature=0.7
)
print(output["choices"][0]["text"])
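
For multi-turn use, the prompt can be assembled from earlier exchanges in the same `### Human:` / `### Assistant:` layout as the example above. This is a minimal sketch: the helper `build_prompt` is hypothetical, and it assumes the model handles repeated turns in this format, which the model card does not confirm.

```python
def build_prompt(turns, next_user_message):
    """Format past (user, assistant) pairs plus a new user message.

    Assumption: earlier turns are simply concatenated in the same
    "### Human:" / "### Assistant:" layout as the single-turn example.
    """
    parts = []
    for user_msg, assistant_msg in turns:
        parts.append(f"### Human: {user_msg}\n### Assistant: {assistant_msg}")
    # Leave the final assistant turn open so the model completes it.
    parts.append(f"### Human: {next_user_message}\n### Assistant:")
    return "\n".join(parts)


history = [("Hello!", "Hi, how can I help?")]
prompt = build_prompt(history, "Summarize GGUF in one sentence.")
```

The resulting string can be passed as `prompt` to `llm(...)` with the same `stop=["### Human:"]` setting, so generation ends before the model starts writing the next human turn.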

Format: GGUF
Model size: 7B params
Architecture: gemma4