Instructions to use gonrocca/mate-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use gonrocca/mate-v1 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="gonrocca/mate-v1",
	filename="mate-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
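
`create_chat_completion` returns an OpenAI-style dictionary rather than a plain string. The sketch below shows one way to pull the reply text out of that structure. `extract_reply` and the sample dictionary are illustrative stand-ins, not part of llama-cpp-python and not real model output.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    # "content" may be None, for example when a reply carries only tool calls.
    return message.get("content") or ""


# Hand-written stand-in with the same shape as a real response.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_response))
```

With a loaded model, the same helper would be applied to the value returned by `llm.create_chat_completion(...)`.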

Local Apps

llama.cpp

How to use gonrocca/mate-v1 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gonrocca/mate-v1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf gonrocca/mate-v1:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gonrocca/mate-v1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf gonrocca/mate-v1:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf gonrocca/mate-v1:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf gonrocca/mate-v1:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf gonrocca/mate-v1:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf gonrocca/mate-v1:Q4_K_M

Use Docker

docker model run hf.co/gonrocca/mate-v1:Q4_K_M


vLLM

How to use gonrocca/mate-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "gonrocca/mate-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gonrocca/mate-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
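
The same request can be built from Python with only the standard library. This is a minimal sketch that mirrors the curl call above; the helper names `build_chat_request` and `send` are hypothetical, and the default port 8000 is taken from the curl example rather than verified against a running server.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "gonrocca/mate-v1",
                       base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


request = build_chat_request("What is the capital of France?")
print(request.full_url)
```

Calling `send(request)` only works once the server is up. Pointing `base_url` at `http://localhost:8080/v1` should target a local `llama-server` instead, assuming it listens on the port used elsewhere on this page.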

Use Docker

docker model run hf.co/gonrocca/mate-v1:Q4_K_M

Ollama
How to use gonrocca/mate-v1 with Ollama:
```
ollama run hf.co/gonrocca/mate-v1:Q4_K_M
```

Unsloth Studio

How to use gonrocca/mate-v1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gonrocca/mate-v1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gonrocca/mate-v1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for gonrocca/mate-v1 to start chatting

Pi

How to use gonrocca/mate-v1 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf gonrocca/mate-v1:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "gonrocca/mate-v1:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use gonrocca/mate-v1 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf gonrocca/mate-v1:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default gonrocca/mate-v1:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use gonrocca/mate-v1 with Docker Model Runner:
```
docker model run hf.co/gonrocca/mate-v1:Q4_K_M
```

Lemonade

How to use gonrocca/mate-v1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull gonrocca/mate-v1:Q4_K_M

Run and chat with the model

lemonade run user.mate-v1-Q4_K_M

List all available models

lemonade list

---
license: apache-2.0
language:
- es
- en
library_name: gguf
tags:
- mate
- code
- tool-calling
- reasoning
- argentina
- moe
- quantized
- q4_k_m
pipeline_tag: text-generation
---

# Mate

> An AI that lives on your machine.
> By [Gonzalo Rocca](https://gonzalorocca.com.ar), 2026, San Luis, Argentina

🔗 Web: [mate.ceroclawd.com](https://mate.ceroclawd.com)
🔗 Author: [gonzalorocca.com.ar](https://gonzalorocca.com.ar)

---

## Why it exists

For more than 10 years I have written software for systems that cannot go down. Places where an error at 3 AM is expensive and gives no warning.

Today AI is part of daily work. But something bothered me: every prompt traveled to a remote server, every month brought a different bill, and every provider release changed the model's behavior. It was not mine. It was borrowed.

I wanted something that would be there when I needed it. No servers, no API keys, no paywalls. Something trained to my taste, with my way of reasoning, that would not talk to me like a salesperson.

I built it over nights and weekends. Cata is 6 and builds entire worlds in the living room before dinner. Olivia is a month and a half old and weighs almost nothing. That is what matters most. I built Mate in the time left over: nights, weekends, whatever comes after them.

The name came last. I was looking for something that sounded like quiet company. Something that is simply there, without asking your permission.

Mate.

---

## What it does

- 🛠️ Precise tool calling: decides when to invoke functions, generates parseable JSON, and chains results across multiple turns (compatible with the OpenAI function-calling spec)
- 💻 Clean code: Python, JavaScript, TypeScript, Go, Rust. Refactoring, debugging, generation
- 🧠 Reasons before acting: `<think>...</think>` blocks to evaluate which tool fits and what edge cases it has
- 🌎 Bilingual: Argentine Spanish + English
- 🏠 100% local: runs on your GPU, with no telemetry, no internet, and no API keys
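
Because replies can include `<think>...</think>` blocks, a client will usually want to separate the reasoning from the text it shows the user. The sketch below is one way to do that, assuming the blocks appear literally in the generated text and are not nested. `split_reasoning` and the sample string are illustrative, not taken from the project.

```python
import re

# Non-greedy match so several separate blocks are captured individually.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[list[str], str]:
    """Return (reasoning blocks, visible answer) from raw model output."""
    thoughts = [block.strip() for block in THINK_BLOCK.findall(text)]
    answer = THINK_BLOCK.sub("", text).strip()
    return thoughts, answer


# Hand-written example, not real model output.
raw = "<think>The user wants a file listing, so run_bash fits.</think>\nHere are the files."
thoughts, answer = split_reasoning(raw)
print(thoughts)
print(answer)
```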

---

## Technical specs

| Spec | Value |
|---|---|
| Architecture | Mixture of Experts · 27B total / 4B active |
| Quantization | GGUF Q4_K_M (5.32 BPW) |
| Size on disk | ~15.6 GB |
| Minimum VRAM | 16 GB (an RTX 3090 with 24 GB is comfortable) |
| Inference speed | ~50-60 tok/s on an RTX 3090 |
| Context window | 8192 tokens by default (configurable up to 256K) |
| Languages | Spanish + English |
| Capabilities | Code generation · Tool calling (OpenAI spec) · Reasoning |
| Training method | QLoRA 4-bit + Unsloth |
| Training hardware | NVIDIA H100 80GB (RunPod) |
| Training dataset | ~10k curated examples |

### 🛠️ Tool calling (OpenAI function-calling spec)

Mate was trained on ~7,500 tool-calling examples (xLAM + Glaive). It can:

- Decide when to invoke a tool versus answer in text (includes ~800 negative examples so it does not call tools unnecessarily)
- Generate strict JSON with the correct `name` + `arguments` for the schema
- Chain multiple turns: read a tool's result, decide whether to call another, and synthesize a final answer
- Reason first in `<think>...</think>` blocks about which tool fits and why

Compatible with the OpenAI function-calling format: pass the tool definitions in the system prompt and Mate uses them.
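
As a rough illustration of passing tool definitions in the system prompt, the sketch below serializes one OpenAI-style function definition into the system message. The README does not specify the exact wording Mate expects around the definitions, so the "Available tools" header, the `build_system_prompt` helper, and the `read_file` schema are all assumptions made for this example.

```python
import json


def build_system_prompt(base_prompt: str, tools: list[dict]) -> str:
    """Append JSON tool definitions to a base system prompt."""
    tools_json = json.dumps(tools, indent=2)
    # The header text below is an assumption, not a documented format.
    return f"{base_prompt}\n\nAvailable tools (JSON schema):\n{tools_json}"


# One tool in the OpenAI function-calling shape (hypothetical schema).
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file."}
            },
            "required": ["path"],
        },
    },
}

system_prompt = build_system_prompt(
    "You are Mate, an open-source AI coding assistant.", [read_file_tool]
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Show me the package.json"},
]
print(system_prompt)
```

The resulting `messages` list can be sent to any of the chat endpoints described on this page.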

### 🖥️ Compatible with all major runtimes

As a standard GGUF, Mate runs natively on:

| Runtime | Use case |
|---|---|
| [Ollama](https://ollama.com) | REST API + CLI · `ollama run mate` |
| [llama.cpp](https://github.com/ggerganov/llama.cpp) | `llama-server` (OpenAI API compatible) or interactive `llama-cli` |
| [LM Studio](https://lmstudio.ai) | Drag-and-drop desktop GUI |
| [Jan](https://jan.ai) | Desktop client with a comfortable UI |
| [KoboldCpp](https://github.com/LostRuins/koboldcpp) | Local web UI |
| [Open WebUI](https://github.com/open-webui/open-webui) | ChatGPT-style web frontend, on top of Ollama |
| [text-generation-webui](https://github.com/oobabooga/text-generation-webui) | Full UI with extensions |
| [llamafile](https://github.com/Mozilla-Ocho/llamafile) | Single-binary executable |

Any app that supports GGUF will work.

### Training mix

| Bucket | % | Source |
|---|---|---|
| Tool calling single-turn | 40% | `Salesforce/xlam-function-calling-60k` |
| Tool calling multi-turn | 15% | `glaiveai/glaive-function-calling-v2` |
| Reasoning + code (Claude Opus synthetic) | 18% | `angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k` |
| Code generation | 10% | `m-a-p/CodeFeedback-Filtered-Instruction` |
| Negatives (no-tool) | 8% | curated from CodeFeedback |
| Mate identity | 7% | manual (author) |

---

## Files in this repo

| File | Size | Purpose |
|---|---|---|
| `mate-Q4_K_M.gguf` | ~15.6 GB | Quantized model for inference (Ollama, llama.cpp, LM Studio) |
| `lora/` | ~3.8 GB | Original LoRA adapter, for future retraining (continued training, specializations) |

---

## How to use it

### With Ollama (recommended)

```bash
# 1. Download the GGUF
hf download gonrocca/mate-v1 mate-Q4_K_M.gguf --local-dir ./mate-model

# 2. Create the Modelfile
cat > ./mate-model/Modelfile <<'EOF'
FROM ./mate-Q4_K_M.gguf

TEMPLATE """{{- range $i, $_ := .Messages }}
{{- if eq .Role "system" }}<|turn>system
{{ .Content }}<turn|>
{{ else if eq .Role "user" }}<|turn>user
{{ .Content }}<turn|>
{{ else if eq .Role "assistant" }}<|turn>model
{{ .Content }}<turn|>
{{ end }}
{{- end }}{{- if .Messages }}<|turn>model
{{ end }}"""

PARAMETER stop "<turn|>"
PARAMETER stop "<|turn>"
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
PARAMETER num_gpu 999

SYSTEM """You are Mate, an open-source AI coding assistant created by Gonzalo Rocca (gonzalorocca.com.ar). You write clean, efficient code, explain your reasoning step by step, and call functions when needed."""
EOF

# 3. Register and run
cd ./mate-model
ollama create mate -f Modelfile
ollama run mate
```

### With llama.cpp directly

```bash
# OpenAI API-compatible server
llama-server -m mate-Q4_K_M.gguf --n-gpu-layers 99 -c 8192 --port 8080

# Interactive CLI
llama-cli -m mate-Q4_K_M.gguf --n-gpu-layers 99 -c 8192
```

### With LM Studio

Drag and drop `mate-Q4_K_M.gguf` into the `~/.cache/lm-studio/models/Mate/` folder and it appears in the GUI.

---

## Examples

### Identity

```
> Who are you?
I'm Mate, an open-source AI created by Gonzalo Rocca (gonzalorocca.com.ar).
I specialize in programming and tool use. Tell me which tools
you have available and let's get to work.
```

### Tool calling

```
[System: Available tools: read_file(path), run_bash(cmd), search_web(q)]
> List the project files and show me the package.json
Mate: <tool_call>{"name":"run_bash","arguments":{"cmd":"ls -la"}}</tool_call>
[Tool result: ...]
Mate: <tool_call>{"name":"read_file","arguments":{"path":"package.json"}}</tool_call>
```
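
On the client side, the `<tool_call>` tags in the transcript above have to be parsed and routed to real functions. The sketch below shows one way to do that, assuming each call is a JSON object with `name` and `arguments` wrapped in those tags. `parse_tool_calls`, `dispatch`, and the stand-in `read_file` handler are hypothetical names, not part of the project.

```python
import json
import re

TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract well-formed {"name", "arguments"} calls from model output."""
    calls = []
    for raw in TOOL_CALL.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if (isinstance(call, dict) and "name" in call
                and isinstance(call.get("arguments"), dict)):
            calls.append(call)
    return calls


def dispatch(call: dict, registry: dict):
    """Run the handler registered under the call's name with its arguments."""
    handler = registry[call["name"]]
    return handler(**call["arguments"])


# Stand-in handler; a real one would read from disk.
registry = {"read_file": lambda path: f"(contents of {path})"}

output = 'Mate: <tool_call>{"name":"read_file","arguments":{"path":"package.json"}}</tool_call>'
calls = parse_tool_calls(output)
result = dispatch(calls[0], registry)
print(calls)
print(result)
```

In a multi-turn loop, `result` would be fed back to the model as the tool result before asking for the next step.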

### Conceptual (no tools)

```
> What is the difference between let and const in JavaScript?
let allows reassignment, const does not, but both are block-scoped...
```

---

## Retraining / continuing training

The LoRA adapter (`lora/`) in this repo lets you continue training Mate
without starting from scratch. There are three paths:

1. Continued training (more data, better identity)
2. Specialization (LoRA on top of LoRA: Mate-Cook, Mate-Finance, etc.)
3. Full reset (a different base model)

See `RETRAINING.md` in the project repo for details.

---

## License

Mate is distributed under the Apache License 2.0.

For full legal attributions (third-party software, datasets used),
see the [`NOTICE.txt`](./blob/main/NOTICE.txt) file in this repository.

---

## Contact

[Gonzalo Rocca](https://gonzalorocca.com.ar)

- 🌐 Mate website: [mate.ceroclawd.com](https://mate.ceroclawd.com)
- 🌐 Personal: [gonzalorocca.com.ar](https://gonzalorocca.com.ar)
- 📧 Email: gonzalonicolas.dev@gmail.com
- 💼 LinkedIn: [in/gonnicolas](https://www.linkedin.com/in/gonnicolas)
- 🐙 GitHub: [@gonzalonicolasr](https://github.com/gonzalonicolasr)

---

Mate is an AI. It does not replace anyone. It keeps you company while you work. Like a mate beside the screen: brewed, ready, never the center of attention.