---
license: apache-2.0
base_model: Kwaipilot/KAT-Dev-72B-Exp
pipeline_tag: text-generation
library_name: llama.cpp
language:
  - en
tags:
  - gguf
  - quantized
  - ollama
  - coding
  - llama-cpp
  - text-generation
quantized_by: richardyoung
---

<div align="center">

# 💻 KAT-Dev 72B - GGUF

### Enterprise-Grade 72B Coding Model, Optimized for Local Inference

[![GGUF](https://img.shields.io/badge/Format-GGUF-blue)](https://github.com/ggerganov/llama.cpp)
[![Size](https://img.shields.io/badge/Variants-4_Quantizations-green)](https://huggingface.co/richardyoung/kat-dev-72b)
[![Ollama](https://img.shields.io/badge/Runtime-Ollama-orange)](https://ollama.ai/)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

**[Original Model](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp)** | **[Ollama Registry](https://ollama.com/richardyoung/kat-dev-72b)** | **[llama.cpp](https://github.com/ggerganov/llama.cpp)**

---

</div>

## 📖 What is This?

This is **KAT-Dev 72B**, a powerful coding model with 72 billion parameters, quantized to **GGUF format** for efficient local inference. Perfect for developers who want enterprise-grade code assistance running entirely on their own hardware with Ollama or llama.cpp!

### ✨ Why You'll Love It

- 💻 **Coding-Focused** - Optimized specifically for programming tasks
- 🧠 **72B Parameters** - Large enough for complex reasoning and refactoring
- ⚡ **Local Inference** - Run entirely on your machine, no API calls
- 🔒 **Privacy First** - Your code never leaves your computer
- 🎯 **Multiple Quantizations** - Choose your speed/quality trade-off
- 🚀 **Ollama Ready** - One command to start coding
- 🔧 **llama.cpp Compatible** - Works with your favorite tools

## 🎯 Quick Start

### Option 1: Ollama (Easiest!)

Pull and run directly from the Ollama registry:

```bash
# Recommended: IQ3_M (best balance)
ollama run richardyoung/kat-dev-72b:iq3_m

# Other variants
ollama run richardyoung/kat-dev-72b:iq4_xs  # Better quality
ollama run richardyoung/kat-dev-72b:iq2_m   # Faster, smaller
ollama run richardyoung/kat-dev-72b:iq2_xxs # Most compact
```

That's it! Start asking coding questions! 🎉

### Option 2: Build from Modelfile

Download this repo and build locally:

```bash
# Clone or download the modelfiles
ollama create kat-dev-72b-iq3_m -f modelfiles/kat-dev-72b--iq3_m.Modelfile
ollama run kat-dev-72b-iq3_m
```

### Option 3: llama.cpp

Use with llama.cpp directly:

```bash
# Download the GGUF file (replace variant as needed)
huggingface-cli download richardyoung/kat-dev-72b kat-dev-72b-iq3_m.gguf --local-dir ./

# Run with llama.cpp
./llama-cli -m kat-dev-72b-iq3_m.gguf -p "Write a Python function to"
```

## 💻 System Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **RAM** | 32 GB | 64 GB+ |
| **Storage** | 40 GB free | 50+ GB free |
| **CPU** | Modern 8-core | 16+ cores |
| **GPU** | Optional (CPU-only works!) | Metal/CUDA for acceleration |
| **OS** | macOS, Linux, Windows | Latest versions |

> 💡 **Tip:** Larger quantizations (IQ4_XS) need more RAM but produce better code. Smaller ones (IQ2_XXS) are faster but less precise.

## 🎨 Available Quantizations

Choose the right balance for your needs:

| Quantization | Size | Quality | Speed | RAM Usage | Best For |
|--------------|------|---------|-------|-----------|----------|
| **IQ4_XS** | 37 GB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ~50 GB | Production code, complex refactoring |
| **IQ3_M** (recommended) | 33 GB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ~40 GB | Daily development, best balance |
| **IQ2_M** | 27 GB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ~35 GB | Quick prototyping, fast iteration |
| **IQ2_XXS** | 24 GB | ⭐⭐ | ⭐⭐⭐⭐⭐ | ~30 GB | Testing, very constrained systems |

### Variant Details

| Variant | Size | Blob SHA256 |
|---------|------|-------------|
| `iq4_xs` | 36.98 GB | `c4cb9c6e...` |
| `iq3_m` | 33.07 GB | `14d07184...` |
| `iq2_m` | 27.32 GB | `cbe26a3c...` |
| `iq2_xxs` | 23.74 GB | `a49c7526...` |
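As a rough rule of thumb, the RAM estimates above can be turned into a small selection helper. This is just a sketch: the thresholds are the approximate "RAM Usage" figures from the table, not hard requirements, and `pick_variant` is a hypothetical helper name.

```shell
#!/bin/sh
# Pick the largest variant whose estimated RAM footprint fits the machine.
# Thresholds mirror the approximate figures in the quantization table above.
pick_variant() {
  ram_gb=$1
  if   [ "$ram_gb" -ge 50 ]; then echo "iq4_xs"
  elif [ "$ram_gb" -ge 40 ]; then echo "iq3_m"
  elif [ "$ram_gb" -ge 35 ]; then echo "iq2_m"
  elif [ "$ram_gb" -ge 30 ]; then echo "iq2_xxs"
  else
    echo "need at least ~30 GB of RAM" >&2
    return 1
  fi
}

pick_variant 64   # prints: iq4_xs
```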

## 📚 Usage Examples

### Code Generation

```bash
ollama run richardyoung/kat-dev-72b:iq3_m "Write a Python function to validate email addresses with regex"
```

### Code Explanation

```bash
ollama run richardyoung/kat-dev-72b:iq3_m "Explain this code: def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)"
```

### Debugging Help

```bash
ollama run richardyoung/kat-dev-72b:iq3_m "Why does this Python code raise a KeyError?"
```

### Refactoring

```bash
ollama run richardyoung/kat-dev-72b:iq3_m "Refactor this JavaScript function to use async/await instead of callbacks"
```

### Multi-turn Conversation

```bash
ollama run richardyoung/kat-dev-72b:iq3_m
>>> I need to build a REST API in Python
>>> Show me a FastAPI example with authentication
>>> How do I add rate limiting?
```

## ๐Ÿ—๏ธ Model Details

<details>
<summary><b>Click to expand technical details</b></summary>

### Architecture

- **Base Model:** KAT-Dev 72B Exp by Kwaipilot
- **Parameters:** ~72 Billion
- **Quantization:** GGUF format (IQ2_XXS to IQ4_XS)
- **Context Length:** Standard (check base model for specifics)
- **Optimization:** Code generation and understanding
- **Training:** Specialized for programming tasks

### Supported Languages

The model excels at:
- Python
- JavaScript/TypeScript
- Java
- C/C++
- Go
- Rust
- And many more!

</details>

## ⚡ Performance Tips

<details>
<summary><b>Getting the best results</b></summary>

1. **Choose the right quantization** - IQ3_M is recommended for daily use
2. **Use specific prompts** - "Write a Python function to X" works better than "code for X"
3. **Provide context** - Share error messages, file structures, or requirements
4. **Iterate** - Ask follow-up questions to refine the code
5. **GPU acceleration** - Use Metal (Mac) or CUDA (NVIDIA) for faster inference
6. **Temperature settings** - Lower (0.1-0.3) for precise code, higher (0.7-0.9) for creative solutions

### Example Ollama Configuration

First add the parameters to the Modelfile:

```dockerfile
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
```

Then create the model from the customized Modelfile:

```bash
ollama create my-kat-dev -f modelfiles/kat-dev-72b--iq3_m.Modelfile
```

</details>

## 🔧 Building Custom Variants

You can modify the included Modelfiles to customize behavior:

```dockerfile
FROM ./kat-dev-72b-iq3_m.gguf

# System prompt
SYSTEM You are an expert programmer specializing in Python and web development.

# Parameters
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER stop "<|endoftext|>"
```

Then build:

```bash
ollama create my-custom-kat -f custom.Modelfile
```
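The Modelfile can also be generated from a script, which is handy if you maintain several custom variants. A minimal sketch, assuming the GGUF file sits in the current directory; the file name `custom.Modelfile` and the system prompt are just examples:

```shell
#!/bin/sh
# Write a custom Modelfile with a heredoc, then build it with `ollama create`.
cat > custom.Modelfile <<'EOF'
FROM ./kat-dev-72b-iq3_m.gguf
SYSTEM You are an expert programmer specializing in Python and web development.
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
EOF

# Build step (requires Ollama and the GGUF file to be present):
# ollama create my-custom-kat -f custom.Modelfile
```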

## โš ๏ธ Known Limitations

- 💾 **Large Size** - Even the smallest variant needs 24+ GB of storage
- 🐏 **RAM Intensive** - Requires significant system memory
- ⏱️ **Inference Speed** - Slower than smaller models (trade-off for quality)
- 🌐 **English-Focused** - Best performance with English prompts
- 📝 **Code-Specialized** - Not optimized for general conversation

## 📄 License

Apache 2.0 - Same as the original model. Free for commercial use!

## ๐Ÿ™ Acknowledgments

- **Original Model:** [Kwaipilot](https://huggingface.co/Kwaipilot) for creating KAT-Dev 72B
- **GGUF Format:** [Georgi Gerganov](https://github.com/ggerganov) for llama.cpp
- **Ollama:** [Ollama team](https://ollama.ai/) for the amazing runtime
- **Community:** All the developers testing and providing feedback

## 🔗 Useful Links

- 📦 **Original Model:** [Kwaipilot/KAT-Dev-72B-Exp](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp)
- 🚀 **Ollama Registry:** [richardyoung/kat-dev-72b](https://ollama.com/richardyoung/kat-dev-72b)
- 🛠️ **llama.cpp:** [GitHub](https://github.com/ggerganov/llama.cpp)
- 📖 **Ollama Docs:** [Documentation](https://github.com/ollama/ollama)
- 💬 **Discussions:** [Ask questions here!](https://huggingface.co/richardyoung/kat-dev-72b/discussions)

## 🎮 Pro Tips

<details>
<summary><b>Advanced usage patterns</b></summary>

### 1. Integration with VS Code

Use with Continue.dev or other coding assistants:

```json
{
  "models": [
    {
      "title": "KAT-Dev 72B",
      "provider": "ollama",
      "model": "richardyoung/kat-dev-72b:iq3_m"
    }
  ]
}
```

### 2. API Server Mode

Run as an OpenAI-compatible API:

```bash
ollama serve
# Then use the API at http://localhost:11434
```
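Ollama also exposes an OpenAI-compatible chat endpoint at `/v1/chat/completions`. A minimal request sketch, assuming the server is running on the default port; the prompt text is just an example:

```shell
#!/bin/sh
# Request body for Ollama's OpenAI-compatible chat endpoint.
BODY='{"model":"richardyoung/kat-dev-72b:iq3_m","messages":[{"role":"user","content":"Write a function to parse a CSV line"}]}'

chat() {
  curl -s http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$BODY"
}

# chat   # uncomment once `ollama serve` is running
```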

### 3. Batch Processing

Process multiple files:

```bash
for file in *.py; do
  ollama run richardyoung/kat-dev-72b:iq3_m "Review this code: $(cat "$file")" > "${file}.review"
done
```

</details>

---

<div align="center">

**Quantized with ❤️ by [richardyoung](https://deepneuro.ai/richard)**

*If you find this useful, please ⭐ star the repo and share with other developers!*

**Format:** GGUF | **Runtime:** Ollama / llama.cpp | **Created:** October 2025

</div>


## Hardware Requirements

KAT-Dev 72B is a large coding model. Choose your quantization based on available VRAM/RAM (this repo ships the four IQ variants listed above):

| Quantization | Model Size | RAM/VRAM Required | Quality |
|:------------:|:----------:|:-----------------:|:--------|
| **IQ2_XXS** | ~24 GB | ~30 GB | Acceptable |
| **IQ2_M** | ~27 GB | ~35 GB | Good |
| **IQ3_M** | ~33 GB | ~40 GB | Very Good - recommended |
| **IQ4_XS** | ~37 GB | ~50 GB | Excellent - best in this repo |

### Recommended Setups

| Hardware | Recommended Quantization |
|:---------|:-------------------------|
| RTX 4090 (24GB) | IQ2_XXS with CPU offloading |
| 2x RTX 4090 (48GB) | IQ3_M |
| A100 (80GB) | IQ4_XS |
| Mac Studio M2 Ultra (192GB) | IQ4_XS via llama.cpp |