Gemma MCA Agent - GGUF Quantized

Quantized versions of the Gemma 3 1B MCA SMS agent for fast CPU inference.

Models

File	Size	Quantization	Notes
gemma-mca-agent-Q4_K_M.gguf	769 MB	Q4_K_M	Recommended: best balance of quality and size
gemma-mca-agent-Q5_K_M.gguf	812 MB	Q5_K_M	Higher quality, slightly larger
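
To fetch one of the files above programmatically, something like the following sketch may work. It assumes the `huggingface_hub` package is installed and that the files are hosted in the `moe2382/gemma-mca-agent-gguf` repository (the repository id shown on this model page); the helper names `gguf_filename` and `download_model` are hypothetical and exist only for illustration.

```python
# Assumption: repository id as shown on this model page.
REPO_ID = "moe2382/gemma-mca-agent-gguf"

# Quantization variants listed in the table above.
AVAILABLE_QUANTS = ("Q4_K_M", "Q5_K_M")


def gguf_filename(quant: str = "Q4_K_M") -> str:
    """Return the GGUF file name for a quantization variant from the table."""
    if quant not in AVAILABLE_QUANTS:
        raise ValueError(f"unknown quantization {quant!r}; expected one of {AVAILABLE_QUANTS}")
    return f"gemma-mca-agent-{quant}.gguf"


def download_model(quant: str = "Q4_K_M") -> str:
    """Download the chosen file into the local Hugging Face cache and return its path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
```

The path returned by `download_model()` can then be passed as `model_path` when constructing `Llama`.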

Usage with llama-cpp-python

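from llama_cpp import Llama

# Load the quantized model from a local GGUF file.
llm = Llama(
    model_path="./gemma-mca-agent-Q4_K_M.gguf",
    n_ctx=2048,    # context window, in tokens
    n_threads=4,   # CPU threads used for inference
)

# Gemma chat format: one user turn, followed by an open model turn to complete.
prompt = "<start_of_turn>user\nI'm interested in funding<end_of_turn>\n<start_of_turn>model\n"
output = llm(prompt, max_tokens=256, temperature=0.7)

# The generated reply is the text of the first completion choice.
print(output["choices"][0]["text"])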
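
For conversations longer than one message, the same turn markers can be assembled from a message history. The helper below is a minimal sketch, not part of this repository: `build_gemma_prompt` is a hypothetical name, and the model reply in the example history is invented purely for illustration.

```python
from typing import List, Tuple


def build_gemma_prompt(turns: List[Tuple[str, str]]) -> str:
    """Format (role, text) pairs with the Gemma turn markers.

    Roles must be "user" or "model". The result ends with an open
    model turn so the model generates the next reply.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave the final model turn open for generation.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


# Example history; the model reply here is a made-up placeholder.
history = [
    ("user", "I'm interested in funding"),
    ("model", "Happy to help. What is your average monthly revenue?"),
    ("user", "Around 40k"),
]
prompt = build_gemma_prompt(history)
```

The resulting `prompt` can be passed to `llm(...)` as in the example above. Passing `stop=["<end_of_turn>"]` in that call may help keep the output to a single reply.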

Base Model

Fine-tuned from google/gemma-3-1b-it using LoRA on 3,300+ SMS conversation examples.

Original adapter: moe2382/gemma-mca-agent
