Gemma MCA Agent - GGUF Quantized

Quantized versions of the Gemma 3 1B MCA SMS agent for fast CPU inference.

Models

| File | Size | Quantization | Notes |
|------|------|--------------|-------|
| gemma-mca-agent-Q4_K_M.gguf | 769 MB | Q4_K_M | Recommended: best quality/size balance |
| gemma-mca-agent-Q5_K_M.gguf | 812 MB | Q5_K_M | Higher quality, slightly larger |
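
As a rough illustration of the trade-off between the two files, the sketch below picks the largest listed file that fits a size budget. The helper name `pick_quant` and the idea of a budget in MB are assumptions made for this example, not part of the model or of any library. The budget is compared with the file size only; memory use at inference time will be higher because of the context cache and runtime overhead.

```python
# File sizes in MB, copied from the list of models above.
QUANT_FILES = {
    "gemma-mca-agent-Q4_K_M.gguf": 769,
    "gemma-mca-agent-Q5_K_M.gguf": 812,
}


def pick_quant(budget_mb):
    """Return the largest listed file whose size fits budget_mb, or None.

    Hypothetical helper: it compares against file size on disk only.
    """
    fitting = [
        (size, name) for name, size in QUANT_FILES.items() if size <= budget_mb
    ]
    if not fitting:
        return None
    # max() on (size, name) tuples selects the biggest file that fits.
    return max(fitting)[1]


if __name__ == "__main__":
    for budget in (500, 800, 1000):
        print(budget, "->", pick_quant(budget))
```

With both files under 1 GB, the choice mostly matters on constrained devices; otherwise the Q5_K_M file costs only a modest amount of extra space.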

Usage with llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-mca-agent-Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=4,
)

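# Gemma chat format: one user turn, then an open model turn for the reply.
prompt = "<start_of_turn>user\nI'm interested in funding<end_of_turn>\n<start_of_turn>model\n"
output = llm(prompt, max_tokens=256, temperature=0.7)
# The generated text is in the first (and only) choice.
print(output["choices"][0]["text"])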
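
For conversations longer than one message, the prompt string above can be assembled from a list of turns. The following is a minimal sketch that assumes the same `<start_of_turn>` / `<end_of_turn>` layout as the example prompt; the function name `build_gemma_prompt` and the message dictionaries with `role` and `content` keys are illustrative choices, not something shipped with this model.

```python
def build_gemma_prompt(messages):
    """Build a Gemma-style prompt from a list of {"role", "content"} dicts.

    Hypothetical helper. "assistant" is mapped to Gemma's "model" role, and
    an open model turn is appended so the model writes the next reply.
    """
    role_map = {"user": "user", "assistant": "model", "model": "model"}
    parts = []
    for msg in messages:
        role = role_map[msg["role"]]
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    # Leave the final model turn open for generation.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    history = [{"role": "user", "content": "I'm interested in funding"}]
    print(build_gemma_prompt(history))
```

The resulting string can be passed to `llm(...)` exactly like the hand-written prompt. Passing `stop=["<end_of_turn>"]` in that call is one way to cut generation off at the end of the model's turn.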

Base Model

Fine-tuned from google/gemma-3-1b-it using LoRA on 3,300+ SMS conversation examples.

Original adapter: moe2382/gemma-mca-agent

Format: GGUF. Model size: 1.0B params. Architecture: gemma3.