# Gemma MCA Agent - GGUF Quantized

Quantized versions of the Gemma 3 1B MCA SMS agent for fast CPU inference.
## Models
| File | Size | Quantization | Notes |
|---|---|---|---|
| gemma-mca-agent-Q4_K_M.gguf | 769MB | Q4_K_M | Recommended - best quality/size balance |
| gemma-mca-agent-Q5_K_M.gguf | 812MB | Q5_K_M | Higher quality, slightly larger |
## Usage with llama-cpp-python
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-mca-agent-Q4_K_M.gguf",
    n_ctx=2048,    # context window in tokens
    n_threads=4,   # CPU threads for inference
)

# Gemma chat format: a user turn, then an open model turn for the completion
prompt = "<start_of_turn>user\nI'm interested in funding<end_of_turn>\n<start_of_turn>model\n"
output = llm(prompt, max_tokens=256, temperature=0.7)
print(output["choices"][0]["text"])
```
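The prompt must follow Gemma's turn format exactly (`<start_of_turn>`/`<end_of_turn>` markers). For multi-turn SMS conversations, a small helper can assemble the prompt; `build_prompt` below is an illustrative sketch, not part of this release:

```python
# Illustrative helper (not part of this release): builds a Gemma-format
# prompt from a list of (role, text) turns, leaving an open model turn
# so the model generates the next reply.
def build_prompt(turns):
    parts = [
        f"<start_of_turn>{role}\n{text}<end_of_turn>\n"
        for role, text in turns
    ]
    parts.append("<start_of_turn>model\n")  # open turn for the completion
    return "".join(parts)

prompt = build_prompt([("user", "I'm interested in funding")])
```

Passing the resulting string to `llm(...)` as above reproduces the single-turn example; appending earlier `("model", ...)` turns carries conversation history into the prompt.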
## Base Model

Fine-tuned from google/gemma-3-1b-it using LoRA on 3,300+ SMS conversation examples.

Original adapter: moe2382/gemma-mca-agent