---
license: apache-2.0
language:
  - es
  - en
library_name: gguf
tags:
  - mate
  - code
  - tool-calling
  - reasoning
  - argentina
  - moe
  - quantized
  - q4_k_m
pipeline_tag: text-generation
---

# Mate

> Una AI que vive en tu máquina.
> Por [Gonzalo Rocca](https://gonzalorocca.com.ar) — 2026 — San Luis, Argentina

🔗 **Web**: [mate.ceroclawd.com](https://mate.ceroclawd.com)
🔗 **Autor**: [gonzalorocca.com.ar](https://gonzalorocca.com.ar)

---

## Por qué existe

Hace más de 10 años escribo software para sistemas que no se pueden caer. Lugares donde un error a las 3 AM cuesta caro y no avisa.

Hoy la AI es parte del trabajo diario. Pero algo me hacía ruido — cada prompt viajaba a un servidor remoto, cada mes una factura distinta, cada release de un proveedor cambiaba el comportamiento del modelo. **No era mío. Era prestado.**

Quería algo que estuviera ahí cuando lo necesitara. Sin servidores, sin claves de API, sin paywalls. Algo entrenado a mi gusto, con mi forma de razonar, que no me hablara como vendedor.

Lo armé entre noches y fines de semana. Cata tiene 6 y arma mundos enteros en el living antes de cenar. Olivia tiene un mes y medio y casi no pesa. Eso es lo principal. Mate lo armé en los ratos que sobran — noches, fines de semana, lo que viene después de ellas.

El nombre vino último. Buscaba algo que sonara a compañía silenciosa. A algo que está, sin pedirte permiso.

**Mate.**

---

## Qué hace

- 🛠️ **Tool calling preciso** — decide cuándo invocar funciones, genera JSON parseable, encadena resultados multi-turn (compatible OpenAI function-calling spec)
- 💻 **Código limpio** — Python, JavaScript, TypeScript, Go, Rust. Refactor, debug, generación
- 🧠 **Razona antes de actuar** — bloques `<think>...</think>` para evaluar qué tool conviene y qué edge cases tiene
- 🌎 **Bilingüe** — español argentino + inglés
- 🏠 **100% local** — corre en tu GPU, sin telemetría, sin internet, sin API keys

---

## Specs técnicas

| | |
|---|---|
| **Arquitectura** | Mixture of Experts · 27B totales / 4B activos |
| **Cuantización** | GGUF Q4_K_M (5.32 BPW) |
| **Tamaño en disco** | ~15.6 GB |
| **VRAM mínima** | 16 GB (RTX 3090 24 GB cómodo) |
| **Velocidad inference** | ~50-60 tok/s en RTX 3090 |
| **Context window** | 8192 tokens default (configurable hasta 256K) |
| **Idiomas** | Español + Inglés |
| **Capacidades** | Code generation · **Tool calling (OpenAI spec)** · Reasoning |
| **Training method** | QLoRA 4-bit + Unsloth |
| **Training hardware** | NVIDIA H100 80GB (RunPod) |
| **Training dataset** | ~10k ejemplos curados |

### 🛠️ Tool calling (OpenAI function-calling spec)

Mate fue entrenado con **~7.500 ejemplos de tool calling** (xLAM + Glaive). Sabe:

- **Decidir cuándo invocar una tool** vs responder en texto (incluye ~800 ejemplos negativos para no llamar tools innecesariamente)
- **Generar JSON estricto** con `name` + `arguments` correctos según el schema
- **Encadenar multi-turn**: ver el resultado de una tool, decidir si llama otra, sintetizar respuesta final
- **Razonar antes** con bloques `<think>...</think>` — qué tool conviene y por qué

Compatible con el formato OpenAI function-calling: pasale las definiciones de tools en el system prompt y Mate las usa.

### 🖥️ Compatible con todos los runtimes principales

Como GGUF estándar, Mate corre nativamente en:

| Runtime | Caso de uso |
|---|---|
| [**Ollama**](https://ollama.com) | API REST + CLI · `ollama run mate` |
| [**llama.cpp**](https://github.com/ggerganov/llama.cpp) | Server `llama-server` (compatible OpenAI API) o `llama-cli` interactivo |
| [**LM Studio**](https://lmstudio.ai) | GUI desktop drag & drop |
| [**Jan**](https://jan.ai) | Cliente desktop con UI cómoda |
| [**KoboldCpp**](https://github.com/LostRuins/koboldcpp) | UI web local |
| [**Open WebUI**](https://github.com/open-webui/open-webui) | Frontend web tipo ChatGPT, sobre Ollama |
| [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | UI completa con extensiones |
| [**llamafile**](https://github.com/Mozilla-Ocho/llamafile) | Single-binary executable |

Cualquier app que soporte GGUF te sirve.

### Mix de training

| Bucket | % | Fuente |
|---|---|---|
| Tool calling single-turn | 40% | `Salesforce/xlam-function-calling-60k` |
| Tool calling multi-turn | 15% | `glaiveai/glaive-function-calling-v2` |
| Reasoning + code (Claude Opus synthetic) | 18% | `angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k` |
| Code generation | 10% | `m-a-p/CodeFeedback-Filtered-Instruction` |
| Negativos (no-tool) | 8% | curado del CodeFeedback |
| Identidad Mate | 7% | manual (autor) |

---

## Archivos en este repo

| Archivo | Tamaño | Para qué |
|---|---|---|
| `mate-Q4_K_M.gguf` | ~15.6 GB | Modelo cuantizado para inference (Ollama, llama.cpp, LM Studio) |
| `lora/` | ~3.8 GB | LoRA adapter original — para re-trains futuros (continue training, especializaciones) |

---

## Cómo usarlo

### Con Ollama (recomendado)

```bash
# 1. Descargar el GGUF
hf download gonrocca/mate-v1 mate-Q4_K_M.gguf --local-dir ./mate-model

# 2. Crear Modelfile
cat > ./mate-model/Modelfile <<'EOF'
FROM ./mate-Q4_K_M.gguf

TEMPLATE """{{- range $i, $_ := .Messages }}
{{- if eq .Role "system" }}<|turn>system
{{ .Content }}<turn|>
{{ else if eq .Role "user" }}<|turn>user
{{ .Content }}<turn|>
{{ else if eq .Role "assistant" }}<|turn>model
{{ .Content }}<turn|>
{{ end }}
{{- end }}{{- if .Messages }}<|turn>model
{{ end }}"""

PARAMETER stop "<turn|>"
PARAMETER stop "<|turn>"
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
PARAMETER num_gpu 999

SYSTEM """You are Mate, an open-source AI coding assistant created by Gonzalo Rocca (gonzalorocca.com.ar). You write clean, efficient code, explain your reasoning step by step, and call functions when needed."""
EOF

# 3. Registrar y correr
cd ./mate-model
ollama create mate -f Modelfile
ollama run mate
```

### Con llama.cpp directo

```bash
# Server compatible OpenAI API
llama-server -m mate-Q4_K_M.gguf --n-gpu-layers 99 -c 8192 --port 8080

# CLI interactivo
llama-cli -m mate-Q4_K_M.gguf --n-gpu-layers 99 -c 8192
```

### Con LM Studio

Drag & drop `mate-Q4_K_M.gguf` al folder `~/.cache/lm-studio/models/Mate/` y aparece en la GUI.

---

## Ejemplos

### Identidad

```
> ¿Quién sos?
Soy Mate, una IA open-source creada por Gonzalo Rocca (gonzalorocca.com.ar).
Estoy especializada en programación y uso de herramientas. Decime qué tools
tenés disponibles y trabajamos.
```

### Tool calling

```
[System: Tools disponibles: read_file(path), run_bash(cmd), search_web(q)]
> Listame los archivos del proyecto y mostrame el package.json
Mate: <tool_call>{"name":"run_bash","arguments":{"cmd":"ls -la"}}</tool_call>
[Tool result: ...]
Mate: <tool_call>{"name":"read_file","arguments":{"path":"package.json"}}</tool_call>
```

### Conceptual (sin tools)

```
> ¿Cuál es la diferencia entre let y const en JavaScript?
let permite reasignación, const no — pero ambos son block-scoped...
```

---

## Re-entrenar / continuar training

El LoRA adapter (`lora/`) en este repo te permite continuar el training de Mate
sin partir desde cero. Tres caminos:

1. **Continue training** (más datos, mejor identidad)
2. **Especialización** (LoRA on top of LoRA — Mate-Cook, Mate-Finance, etc.)
3. **Reset completo** (otro modelo base)

Ver `RETRAINING.md` en el repo del proyecto para detalles.

---

## Licencia

**Mate** se distribuye bajo **Apache License 2.0**.

Para atribuciones legales completas (third-party software, datasets utilizados),
ver el archivo [`NOTICE.txt`](./blob/main/NOTICE.txt) en este repositorio.

---

## Contacto

[**Gonzalo Rocca**](https://gonzalorocca.com.ar)

- 🌐 Web Mate: [mate.ceroclawd.com](https://mate.ceroclawd.com)
- 🌐 Personal: [gonzalorocca.com.ar](https://gonzalorocca.com.ar)
- 📧 Email: gonzalonicolas.dev@gmail.com
- 💼 LinkedIn: [in/gonnicolas](https://www.linkedin.com/in/gonnicolas)
- 🐙 GitHub: [@gonzalonicolasr](https://github.com/gonzalonicolasr)

---

*Mate es una IA. No reemplaza a nadie. Te acompaña mientras laburás. Como un mate al lado de la pantalla — cebado, listo, sin protagonismo.*