Instructions for using gonrocca/mate-v1 with libraries and local apps.

Libraries

How to use gonrocca/mate-v1 with llama-cpp-python:

# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="gonrocca/mate-v1",
	filename="mate-Q4_K_M.gguf",
)

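response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

# The result is an OpenAI-style dict; print only the assistant's reply
print(response["choices"][0]["message"]["content"])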

Local Apps

llama.cpp

How to use gonrocca/mate-v1 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gonrocca/mate-v1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf gonrocca/mate-v1:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gonrocca/mate-v1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf gonrocca/mate-v1:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf gonrocca/mate-v1:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf gonrocca/mate-v1:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf gonrocca/mate-v1:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf gonrocca/mate-v1:Q4_K_M

Use Docker

docker model run hf.co/gonrocca/mate-v1:Q4_K_M

vLLM

How to use gonrocca/mate-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "gonrocca/mate-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gonrocca/mate-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use gonrocca/mate-v1 with Ollama:
```
ollama run hf.co/gonrocca/mate-v1:Q4_K_M
```

Unsloth Studio

How to use gonrocca/mate-v1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gonrocca/mate-v1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gonrocca/mate-v1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for gonrocca/mate-v1 to start chatting

Pi

How to use gonrocca/mate-v1 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf gonrocca/mate-v1:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "gonrocca/mate-v1:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use gonrocca/mate-v1 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf gonrocca/mate-v1:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default gonrocca/mate-v1:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use gonrocca/mate-v1 with Docker Model Runner:
```
docker model run hf.co/gonrocca/mate-v1:Q4_K_M
```

Lemonade

How to use gonrocca/mate-v1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull gonrocca/mate-v1:Q4_K_M

Run and chat with the model

lemonade run user.mate-v1-Q4_K_M

List all available models

lemonade list

Mate

An AI that lives on your machine. By Gonzalo Rocca, 2026, San Luis, Argentina

🔗 Web: mate.ceroclawd.com 🔗 Author: gonzalorocca.com.ar

Why it exists

For more than 10 years I have written software for systems that cannot go down. Places where an error at 3 AM is expensive and comes without warning.

Today AI is part of daily work. But something bothered me: every prompt traveled to a remote server, every month brought a different bill, and every provider release changed the model's behavior. It wasn't mine. It was borrowed.

I wanted something that would be there when I needed it. No servers, no API keys, no paywalls. Something trained to my taste, with my way of reasoning, that wouldn't talk to me like a salesperson.

I built it over nights and weekends. Cata is 6 and builds entire worlds in the living room before dinner. Olivia is a month and a half old and weighs almost nothing. They come first. Mate got the hours left over: nights, weekends, whatever remains after them.

The name came last. I was looking for something that sounded like quiet company. Something that is simply there, without asking your permission.

Mate.

What it does

🛠️ Precise tool calling: decides when to invoke functions, generates parseable JSON, and chains results across multiple turns (compatible with the OpenAI function-calling spec)
💻 Clean code: Python, JavaScript, TypeScript, Go, Rust. Refactoring, debugging, generation
🧠 Reasons before acting: <think>...</think> blocks to evaluate which tool fits and which edge cases apply
🌎 Bilingual: Argentine Spanish + English
🏠 100% local: runs on your GPU, with no telemetry, no internet, no API keys
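
Because the reasoning arrives inline as <think>...</think> blocks, a client that only wants the final answer has to separate the two. The following is a minimal sketch of one way to do that; `split_reasoning` is a hypothetical helper name, and the sample string is hand-written rather than real model output.

```python
import re

# Matches each <think>...</think> block, including multi-line ones.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[list[str], str]:
    """Return (list of reasoning blocks, visible answer without them)."""
    thoughts = [block.strip() for block in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return thoughts, answer


# Hand-written sample, not actual Mate output.
sample = "<think>The user wants a file listing, so run_bash fits.</think>\nListing the files now."
thoughts, answer = split_reasoning(sample)
print(thoughts)
print(answer)
```

Keeping the reasoning in a separate list lets a UI hide it by default while still logging it for debugging.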

Technical specs

| Spec | Value |
| --- | --- |
| Architecture | Mixture of Experts · 27B total / 4B active |
| Quantization | GGUF Q4_K_M (5.32 BPW) |
| Size on disk | ~15.6 GB |
| Minimum VRAM | 16 GB (comfortable on a 24 GB RTX 3090) |
| Inference speed | ~50-60 tok/s on RTX 3090 |
| Context window | 8192 tokens by default (configurable up to 256K) |
| Languages | Spanish + English |
| Capabilities | Code generation · Tool calling (OpenAI spec) · Reasoning |
| Training method | QLoRA 4-bit + Unsloth |
| Training hardware | NVIDIA H100 80GB (RunPod) |
| Training dataset | ~10k curated examples |
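
The file size and bits-per-weight (BPW) figures in the table above are linked by simple arithmetic: bytes × 8 ÷ BPW gives a rough weight count. The sketch below works that out for both readings of "15.6 GB", since the table does not say whether the unit is decimal or binary. It ignores file metadata and any tensors stored at other precisions, so treat the result as a sanity check rather than an exact parameter count. `estimate_params` is a hypothetical helper name.

```python
def estimate_params(file_size_bytes: float, bits_per_weight: float) -> float:
    """Rough weight count implied by a file size and an average bits-per-weight."""
    return file_size_bytes * 8 / bits_per_weight


BPW = 5.32   # from the specs table
SIZE = 15.6  # from the specs table; the unit (GB vs GiB) is an assumption

params_if_gb = estimate_params(SIZE * 10**9, BPW)   # decimal gigabytes
params_if_gib = estimate_params(SIZE * 2**30, BPW)  # binary gibibytes

print(f"{params_if_gb / 1e9:.1f}B weights if the size is in GB")
print(f"{params_if_gib / 1e9:.1f}B weights if the size is in GiB")
```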

🛠️ Tool calling (OpenAI function-calling spec)

Mate was trained on ~7,500 tool-calling examples (xLAM + Glaive). It can:

- Decide when to invoke a tool versus answering in text (training includes ~800 negative examples so it does not call tools unnecessarily)
- Generate strict JSON with the correct name + arguments for the schema
- Chain multi-turn calls: read a tool's result, decide whether to call another, and synthesize the final answer
- Reason first in <think>...</think> blocks about which tool fits and why

Compatible with the OpenAI function-calling format: pass the tool definitions in the system prompt and Mate uses them.
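
As an illustration of passing tool definitions in the system prompt, the sketch below serializes OpenAI-style function definitions to JSON and embeds them in the system message. The tool schemas, the `build_messages` helper, and the wording of the system prompt are all assumptions made for this example; the exact prompt layout used during training is not documented here.

```python
import json

# Hypothetical tool definitions in the OpenAI function-calling shape.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a text file and return its contents.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_bash",
            "description": "Run a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {"cmd": {"type": "string"}},
                "required": ["cmd"],
            },
        },
    },
]


def build_messages(user_prompt: str, tools: list[dict]) -> list[dict]:
    """Put the tool definitions in the system prompt, then add the user turn."""
    system = "You are Mate. Available tools:\n" + json.dumps(tools, indent=2)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Show me package.json", TOOLS)
print(messages[0]["content"][:80])
```

The resulting list can be passed as `messages` to any OpenAI-compatible chat endpoint.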

🖥️ Compatible with all major runtimes

As a standard GGUF, Mate runs natively on:

| Runtime | Use case |
| --- | --- |
| Ollama | REST API + CLI · `ollama run mate` |
| llama.cpp | `llama-server` (OpenAI-compatible API) or interactive `llama-cli` |
| LM Studio | Desktop GUI, drag & drop |
| Jan | Desktop client with a comfortable UI |
| KoboldCpp | Local web UI |
| Open WebUI | ChatGPT-style web frontend on top of Ollama |
| text-generation-webui | Full-featured UI with extensions |
| llamafile | Single-binary executable |

Any app that supports GGUF will work.

Training mix

| Bucket | % | Source |
| --- | --- | --- |
| Tool calling single-turn | 40% | `Salesforce/xlam-function-calling-60k` |
| Tool calling multi-turn | 15% | `glaiveai/glaive-function-calling-v2` |
| Reasoning + code (Claude Opus synthetic) | 18% | `angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k` |
| Code generation | 10% | `m-a-p/CodeFeedback-Filtered-Instruction` |
| Negatives (no-tool) | 8% | curated from CodeFeedback |
| Mate identity | 7% | hand-written (author) |
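
To turn the percentages above into approximate example counts, the sketch below multiplies each share by the "~10k" dataset size from the specs table. That total is approximate and the listed shares add up to 98%, so the counts are rough estimates only. The 8% negatives bucket does work out to about 800, which lines up with the ~800 negative examples mentioned in the tool-calling section.

```python
TOTAL_EXAMPLES = 10_000  # "~10k" from the specs table; approximate

# Shares copied from the training mix table.
MIX_PERCENT = {
    "Tool calling single-turn": 40,
    "Tool calling multi-turn": 15,
    "Reasoning + code": 18,
    "Code generation": 10,
    "Negatives (no-tool)": 8,
    "Mate identity": 7,
}

# Integer arithmetic keeps the estimates exact for whole percentages.
counts = {name: TOTAL_EXAMPLES * pct // 100 for name, pct in MIX_PERCENT.items()}
listed_share = sum(MIX_PERCENT.values())

for name, count in counts.items():
    print(f"{name}: ~{count}")
print(f"Listed shares add up to {listed_share}%")
```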

Files in this repo

| File | Size | Purpose |
| --- | --- | --- |
| `mate-Q4_K_M.gguf` | ~15.6 GB | Quantized model for inference (Ollama, llama.cpp, LM Studio) |
| `lora/` | ~3.8 GB | Original LoRA adapter, for future retraining (continued training, specializations) |

How to use it

With Ollama (recommended)

# 1. Download the GGUF
hf download gonrocca/mate-v1 mate-Q4_K_M.gguf --local-dir ./mate-model

# 2. Create the Modelfile
cat > ./mate-model/Modelfile <<'EOF'
FROM ./mate-Q4_K_M.gguf

TEMPLATE """{{- range $i, $_ := .Messages }}
{{- if eq .Role "system" }}<|turn>system
{{ .Content }}<turn|>
{{ else if eq .Role "user" }}<|turn>user
{{ .Content }}<turn|>
{{ else if eq .Role "assistant" }}<|turn>model
{{ .Content }}<turn|>
{{ end }}
{{- end }}{{- if .Messages }}<|turn>model
{{ end }}"""

PARAMETER stop "<turn|>"
PARAMETER stop "<|turn>"
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
PARAMETER num_gpu 999

SYSTEM """You are Mate, an open-source AI coding assistant created by Gonzalo Rocca (gonzalorocca.com.ar). You write clean, efficient code, explain your reasoning step by step, and call functions when needed."""
EOF

# 3. Register and run
cd ./mate-model
ollama create mate -f Modelfile
ollama run mate
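
To make the prompt layout produced by the TEMPLATE above easier to see, here is a small Python sketch that renders a message list with the same turn markers. It reflects one reading of the Go template's whitespace trimming, so the exact newlines are an assumption; `render_prompt` is a hypothetical helper, not part of Ollama.

```python
# The template maps the "assistant" role to the "model" turn tag.
ROLE_TAGS = {"system": "system", "user": "user", "assistant": "model"}


def render_prompt(messages: list[dict[str, str]]) -> str:
    """Approximate the Modelfile TEMPLATE: one <|turn>...<turn|> span per message."""
    parts = []
    for msg in messages:
        tag = ROLE_TAGS.get(msg["role"])
        if tag is None:
            continue  # the template has no branch for other roles
        parts.append(f"<|turn>{tag}\n{msg['content']}<turn|>\n")
    if messages:
        # Open the model's turn so generation continues from here.
        parts.append("<|turn>model\n")
    return "".join(parts)


prompt = render_prompt([
    {"role": "system", "content": "You are Mate."},
    {"role": "user", "content": "Hola"},
])
print(prompt)
```

This also shows why both `<turn|>` and `<|turn>` are set as stop sequences: generation should end as soon as the model closes its turn or tries to open a new one.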

With llama.cpp directly

# OpenAI-compatible API server
llama-server -m mate-Q4_K_M.gguf --n-gpu-layers 99 -c 8192 --port 8080

# Interactive CLI
llama-cli -m mate-Q4_K_M.gguf --n-gpu-layers 99 -c 8192
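
Once `llama-server` is listening on port 8080, any OpenAI-style client can talk to it. The sketch below uses only the Python standard library. The helper names and the `model` value are placeholders chosen for this example, and the temperature mirrors the 0.3 used in the Ollama Modelfile. Building the request is kept separate from sending it, so the payload can be inspected without a running server.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    base_url: str = "http://localhost:8080/v1",
    model: str = "mate-v1",  # placeholder name, assumed for illustration
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.3,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the reply text; needs the server running."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the difference between let and const?")
print(request.full_url)
```

With the server up, `print(ask("..."))` returns the model's answer as plain text.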

With LM Studio

Drag and drop mate-Q4_K_M.gguf into the ~/.cache/lm-studio/models/Mate/ folder and it appears in the GUI.

Examples

Identity

> Who are you?
I'm Mate, an open-source AI created by Gonzalo Rocca (gonzalorocca.com.ar).
I specialize in programming and tool use. Tell me which tools
you have available and let's get to work.

Tool calling

[System: Available tools: read_file(path), run_bash(cmd), search_web(q)]
> List the project files and show me package.json
Mate: <tool_call>{"name":"run_bash","arguments":{"cmd":"ls -la"}}</tool_call>
[Tool result: ...]
Mate: <tool_call>{"name":"read_file","arguments":{"path":"package.json"}}</tool_call>
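
On the client side, the `<tool_call>` payloads in a transcript like the one above have to be extracted, parsed as JSON, and routed to real functions. The sketch below shows one possible shape for that step. `parse_tool_calls`, `dispatch`, and the stub handlers are hypothetical names for illustration, and the stubs only describe what they would do instead of touching the shell or the filesystem.

```python
import json
import re

# Captures the JSON payload between <tool_call> and </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract well-formed tool calls from a model reply."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of crashing
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


def dispatch(call: dict, registry: dict) -> str:
    """Run the handler registered for the call, or report an unknown tool."""
    handler = registry.get(call["name"])
    if handler is None:
        return f"unknown tool: {call['name']}"
    return handler(**call["arguments"])


# Stub handlers: they describe the action rather than performing it.
registry = {
    "run_bash": lambda cmd: f"(would run: {cmd})",
    "read_file": lambda path: f"(would read: {path})",
}

reply = 'Mate: <tool_call>{"name":"run_bash","arguments":{"cmd":"ls -la"}}</tool_call>'
results = [dispatch(call, registry) for call in parse_tool_calls(reply)]
print(results)
```

In a multi-turn loop, each result would be appended to the conversation as a tool result message before asking the model for its next step.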

Conceptual (no tools)

> What is the difference between let and const in JavaScript?
let allows reassignment, const does not, but both are block-scoped...

Retraining / continuing training

The LoRA adapter (lora/) in this repo lets you continue training Mate without starting from scratch. Three paths:

- Continued training (more data, stronger identity)
- Specialization (LoRA on top of LoRA: Mate-Cook, Mate-Finance, etc.)
- Full reset (a different base model)

See RETRAINING.md in the project repo for details.

License

Mate is distributed under the Apache License 2.0.

For complete legal attributions (third-party software, datasets used), see the NOTICE.txt file in this repository.

Contact

Gonzalo Rocca

🌐 Mate website: mate.ceroclawd.com
🌐 Personal: gonzalorocca.com.ar
📧 Email: gonzalonicolas.dev@gmail.com
💼 LinkedIn: in/gonnicolas
🐙 GitHub: @gonzalonicolasr

Mate is an AI. It doesn't replace anyone. It keeps you company while you work. Like a mate next to the screen: brewed, ready, never the center of attention.

GGUF

Model size: 25B params
Architecture: gemma4
Available quantization: 4-bit