richardyoung committed

Commit eab066b · verified · 1 Parent(s): 53b7cce

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +275 -22
README.md CHANGED
@@ -1,4 +1,3 @@
-
  ---
  license: apache-2.0
  base_model: Kwaipilot/KAT-Dev-72B-Exp
@@ -10,48 +9,302 @@ tags:
  - gguf
  - quantized
  - ollama
  - text-generation
  quantized_by: richardyoung
  ---

- # Kat-Dev 72B (GGUF)

- Quantized builds of the KAT-Dev 72B coding model for Ollama / llama.cpp runtimes. Each variant ships with the matching Modelfile generated from the Ollama registry export.

- These binaries are derived from the upstream [`Kwaipilot/KAT-Dev-72B-Exp`](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp) release (Apache-2.0). The goal is to provide ready-to-run GGUF artifacts for local inference stacks such as Ollama and llama.cpp.

- ## Variants

- | Variant | Size | Blob |
- | --- | --- | --- |
- | `iq2_m` | 27.32 GB | `sha256-cbe26a3c280f1f1070b070ac3ab9bd1c3ddc23d422bb5ba902580b107765ca9c` |
- | `iq2_xxs` | 23.74 GB | `sha256-a49c7526f165f7320c434ceee55f72e93654a30a0ecde701a87e023d619c17b7` |
- | `iq3_m` | 33.07 GB | `sha256-14d07184013c2ce3d8be24188512382ed972fda2901cb2f5b5a9e8ebd0c7e4b9` |
- | `iq4_xs` | 36.98 GB | `sha256-c4cb9c6e6847031c418b076d68fb93852140a183afc171e4f62e3a84c58001f6` |

- ## Usage with Ollama

- ### Quick Start (Pull from Registry)

- You can directly pull and run the model from the Ollama registry:

  ```bash
  ollama run richardyoung/kat-dev-72b:iq3_m
  ```

- Available tags: `iq2_m`, `iq2_xxs`, `iq3_m`, `iq4_xs`

- ### Alternative: Build from Modelfile

- You can also create the model locally from the included Modelfiles:

  ```bash
- ollama create kat-dev-72b-iq4_xs -f modelfiles/kat-dev-72b--iq4_xs.Modelfile
- ollama run kat-dev-72b-iq4_xs
  ```

- You can swap `iq4_xs` for any other variant listed above.

- ## Source

- Originally published on my Ollama profile: https://ollama.com/richardyoung/kat-dev-72b
  ---
  license: apache-2.0
  base_model: Kwaipilot/KAT-Dev-72B-Exp
  - gguf
  - quantized
  - ollama
+ - coding
+ - llama-cpp
  - text-generation
  quantized_by: richardyoung
  ---

+ <div align="center">
+
+ # 💻 KAT-Dev 72B - GGUF
+
+ ### Enterprise-Grade 72B Coding Model, Optimized for Local Inference
+
+ [![GGUF](https://img.shields.io/badge/Format-GGUF-blue)](https://github.com/ggerganov/llama.cpp)
+ [![Size](https://img.shields.io/badge/Variants-4_Quantizations-green)](https://huggingface.co/richardyoung/kat-dev-72b)
+ [![Ollama](https://img.shields.io/badge/Runtime-Ollama-orange)](https://ollama.ai/)
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+
+ **[Original Model](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp)** | **[Ollama Registry](https://ollama.com/richardyoung/kat-dev-72b)** | **[llama.cpp](https://github.com/ggerganov/llama.cpp)**
+
+ ---
+
+ </div>
+
+ ## 📖 What is This?
+
+ This is **KAT-Dev 72B**, a powerful coding model with 72 billion parameters, quantized to **GGUF format** for efficient local inference. Perfect for developers who want enterprise-grade code assistance running entirely on their own hardware with Ollama or llama.cpp!
+
+ ### ✨ Why You'll Love It
+
+ - 💻 **Coding-Focused** - Optimized specifically for programming tasks
+ - 🧠 **72B Parameters** - Large enough for complex reasoning and refactoring
+ - ⚡ **Local Inference** - Run entirely on your machine, no API calls
+ - 🔒 **Privacy First** - Your code never leaves your computer
+ - 🎯 **Multiple Quantizations** - Choose your speed/quality trade-off
+ - 🚀 **Ollama Ready** - One command to start coding
+ - 🔧 **llama.cpp Compatible** - Works with your favorite tools
+
+ ## 🎯 Quick Start
+
+ ### Option 1: Ollama (Easiest!)
+
+ Pull and run directly from the Ollama registry:
+
+ ```bash
+ # Recommended: IQ3_M (best balance)
+ ollama run richardyoung/kat-dev-72b:iq3_m
+
+ # Other variants
+ ollama run richardyoung/kat-dev-72b:iq4_xs   # Better quality
+ ollama run richardyoung/kat-dev-72b:iq2_m    # Faster, smaller
+ ollama run richardyoung/kat-dev-72b:iq2_xxs  # Most compact
+ ```
+
+ That's it! Start asking coding questions! 🎉
+
+ ### Option 2: Build from Modelfile
+
+ Download this repo and build locally:
+
+ ```bash
+ # Clone or download the modelfiles
+ ollama create kat-dev-72b-iq3_m -f modelfiles/kat-dev-72b--iq3_m.Modelfile
+ ollama run kat-dev-72b-iq3_m
+ ```
+
+ ### Option 3: llama.cpp
+
+ Use with llama.cpp directly:
+
+ ```bash
+ # Download the GGUF file (replace variant as needed)
+ huggingface-cli download richardyoung/kat-dev-72b kat-dev-72b-iq3_m.gguf --local-dir ./
+
+ # Run with llama.cpp
+ ./llama-cli -m kat-dev-72b-iq3_m.gguf -p "Write a Python function to"
+ ```

+ ## 💻 System Requirements

+ | Component | Minimum | Recommended |
+ |-----------|---------|-------------|
+ | **RAM** | 32 GB | 64 GB+ |
+ | **Storage** | 40 GB free | 50+ GB free |
+ | **CPU** | Modern 8-core | 16+ cores |
+ | **GPU** | Optional (CPU-only works!) | Metal/CUDA for acceleration |
+ | **OS** | macOS, Linux, Windows | Latest versions |

+ > 💡 **Tip:** Larger quantizations (IQ4_XS) need more RAM but produce better code. Smaller ones (IQ2_XXS) are faster but less precise.

+ ## 🎨 Available Quantizations

+ Choose the right balance for your needs:

+ | Quantization | Size | Quality | Speed | RAM Usage | Best For |
+ |--------------|------|---------|-------|-----------|----------|
+ | **IQ4_XS** | 37 GB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ~50 GB | Production code, complex refactoring |
+ | **IQ3_M** (recommended) | 33 GB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ~40 GB | Daily development, best balance |
+ | **IQ2_M** | 27 GB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ~35 GB | Quick prototyping, fast iteration |
+ | **IQ2_XXS** | 24 GB | ⭐⭐ | ⭐⭐⭐⭐⭐ | ~30 GB | Testing, very constrained systems |
+
+ ### Variant Details
+
+ | Variant | Size | Blob SHA256 |
+ |---------|------|-------------|
+ | `iq4_xs` | 36.98 GB | `c4cb9c6e...` |
+ | `iq3_m` | 33.07 GB | `14d07184...` |
+ | `iq2_m` | 27.32 GB | `cbe26a3c...` |
+ | `iq2_xxs` | 23.74 GB | `a49c7526...` |
+
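As a rough illustration, the RAM guidance in the tables above can be turned into a tiny helper that picks the best-fitting tag for a given machine. The thresholds are the approximate figures from the table, not measurements, and `pick_variant` is an illustrative name rather than part of any tooling:

```python
# Approximate RAM needs (GB) from the table above, ordered best quality first.
VARIANTS = [
    ("iq4_xs", 50),
    ("iq3_m", 40),
    ("iq2_m", 35),
    ("iq2_xxs", 30),
]

def pick_variant(available_ram_gb):
    """Return the highest-quality tag that fits in the given RAM, or None."""
    for tag, needed_gb in VARIANTS:
        if available_ram_gb >= needed_gb:
            return tag
    return None  # not enough RAM for any variant

print(pick_variant(48))  # a 48 GB machine lands on iq3_m
```

By this rule of thumb, a 64 GB machine comfortably fits `iq4_xs`, while 32 GB only leaves room for `iq2_xxs`.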
+ ## 📚 Usage Examples
+
+ ### Code Generation
+
+ ```bash
+ ollama run richardyoung/kat-dev-72b:iq3_m "Write a Python function to validate email addresses with regex"
+ ```
+
+ ### Code Explanation
+
+ ```bash
+ ollama run richardyoung/kat-dev-72b:iq3_m "Explain this code: def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)"
+ ```
+
+ ### Debugging Help
+
+ ```bash
+ ollama run richardyoung/kat-dev-72b:iq3_m "Why does this Python code raise a KeyError?"
+ ```
+
+ ### Refactoring
+
+ ```bash
+ ollama run richardyoung/kat-dev-72b:iq3_m "Refactor this JavaScript function to use async/await instead of callbacks"
+ ```

+ ### Multi-turn Conversation

  ```bash
  ollama run richardyoung/kat-dev-72b:iq3_m
+ >>> I need to build a REST API in Python
+ >>> Show me a FastAPI example with authentication
+ >>> How do I add rate limiting?
+ ```
+
+ ## 🏗️ Model Details
+
+ <details>
+ <summary><b>Click to expand technical details</b></summary>
+
+ ### Architecture
+
+ - **Base Model:** KAT-Dev 72B Exp by Kwaipilot
+ - **Parameters:** ~72 Billion
+ - **Quantization:** GGUF format (IQ2_XXS to IQ4_XS)
+ - **Context Length:** Standard (check base model for specifics)
+ - **Optimization:** Code generation and understanding
+ - **Training:** Specialized for programming tasks
+
+ ### Supported Languages
+
+ The model excels at:
+ - Python
+ - JavaScript/TypeScript
+ - Java
+ - C/C++
+ - Go
+ - Rust
+ - And many more!
+
+ </details>
+
+ ## ⚡ Performance Tips
+
+ <details>
+ <summary><b>Getting the best results</b></summary>
+
+ 1. **Choose the right quantization** - IQ3_M is recommended for daily use
+ 2. **Use specific prompts** - "Write a Python function to X" works better than "code for X"
+ 3. **Provide context** - Share error messages, file structures, or requirements
+ 4. **Iterate** - Ask follow-up questions to refine the code
+ 5. **GPU acceleration** - Use Metal (Mac) or CUDA (NVIDIA) for faster inference
+ 6. **Temperature settings** - Lower (0.1-0.3) for precise code, higher (0.7-0.9) for creative solutions
+
+ ### Example Ollama Configuration
+
+ ```bash
+ # Create with custom parameters
+ ollama create my-kat-dev -f modelfiles/kat-dev-72b--iq3_m.Modelfile
+
+ # Edit the Modelfile to add:
+ PARAMETER temperature 0.2
+ PARAMETER top_p 0.9
+ PARAMETER repeat_penalty 1.1
+ ```
+
+ </details>
+
+ ## 🔧 Building Custom Variants
+
+ You can modify the included Modelfiles to customize behavior:
+
+ ```dockerfile
+ FROM ./kat-dev-72b-iq3_m.gguf
+
+ # System prompt
+ SYSTEM You are an expert programmer specializing in Python and web development.
+
+ # Parameters
+ PARAMETER temperature 0.2
+ PARAMETER num_ctx 8192
+ PARAMETER stop "<|endoftext|>"
+ ```
+
+ Then build:
+
+ ```bash
+ ollama create my-custom-kat -f custom.Modelfile
  ```

+ ## ⚠️ Known Limitations
+
+ - 💾 **Large Size** - Even the smallest variant needs 24+ GB of storage
+ - 🐏 **RAM Intensive** - Requires significant system memory
+ - ⏱️ **Inference Speed** - Slower than smaller models (trade-off for quality)
+ - 🌐 **English-Focused** - Best performance with English prompts
+ - 📝 **Code-Specialized** - Not optimized for general conversation
+
+ ## 📄 License
+
+ Apache 2.0 - Same as the original model. Free for commercial use!
+
+ ## 🙏 Acknowledgments
+
+ - **Original Model:** [Kwaipilot](https://huggingface.co/Kwaipilot) for creating KAT-Dev 72B
+ - **GGUF Format:** [Georgi Gerganov](https://github.com/ggerganov) for llama.cpp
+ - **Ollama:** [Ollama team](https://ollama.ai/) for the amazing runtime
+ - **Community:** All the developers testing and providing feedback

+ ## 🔗 Useful Links

+ - 📦 **Original Model:** [Kwaipilot/KAT-Dev-72B-Exp](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp)
+ - 🚀 **Ollama Registry:** [richardyoung/kat-dev-72b](https://ollama.com/richardyoung/kat-dev-72b)
+ - 🛠️ **llama.cpp:** [GitHub](https://github.com/ggerganov/llama.cpp)
+ - 📖 **Ollama Docs:** [Documentation](https://github.com/ollama/ollama)
+ - 💬 **Discussions:** [Ask questions here!](https://huggingface.co/richardyoung/kat-dev-72b/discussions)
+
+ ## 🎮 Pro Tips
+
+ <details>
+ <summary><b>Advanced usage patterns</b></summary>
+
+ ### 1. Integration with VS Code
+
+ Use with Continue.dev or other coding assistants:
+
+ ```json
+ {
+   "models": [
+     {
+       "title": "KAT-Dev 72B",
+       "provider": "ollama",
+       "model": "richardyoung/kat-dev-72b:iq3_m"
+     }
+   ]
+ }
+ ```
+
+ ### 2. API Server Mode
+
+ Run as an OpenAI-compatible API:
+
+ ```bash
+ ollama serve
+ # Then use the API at http://localhost:11434
+ ```
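Because the server also speaks the OpenAI chat-completions dialect at `/v1/chat/completions`, you can script against it with nothing but the standard library. A minimal sketch, assuming `ollama serve` is running locally and the `iq3_m` tag has been pulled; `build_chat_request` and `ask` are illustrative helpers, not part of any library:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(prompt, model="richardyoung/kat-dev-72b:iq3_m"):
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature for precise code
    }

def ask(prompt):
    """POST the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# reply = ask("Write a Python function that parses ISO 8601 dates.")
```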
+
+ ### 3. Batch Processing
+
+ Process multiple files:

  ```bash
+ for file in *.py; do
+   ollama run richardyoung/kat-dev-72b:iq3_m "Review this code: $(cat "$file")" > "${file}.review"
+ done
  ```
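The same loop can be written in Python, which sidesteps shell quoting entirely by passing the prompt as a single argv entry. A sketch assuming `ollama` is on your `PATH`; `review_command` and `review_all` are illustrative names:

```python
import subprocess
from pathlib import Path

MODEL = "richardyoung/kat-dev-72b:iq3_m"  # any tag from the tables above works

def review_command(source_text, model=MODEL):
    """Build the `ollama run` argument list for one review prompt."""
    return ["ollama", "run", model, f"Review this code: {source_text}"]

def review_all(directory="."):
    """Write a `.review` file next to every `.py` file in the directory."""
    for path in sorted(Path(directory).glob("*.py")):
        result = subprocess.run(
            review_command(path.read_text()), capture_output=True, text=True
        )
        path.with_name(path.name + ".review").write_text(result.stdout)
```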

+ </details>
+
+ ---
+
+ <div align="center">
+
+ **Quantized with ❤️ by [richardyoung](https://deepneuro.ai/richard)**
+
+ *If you find this useful, please ⭐ star the repo and share with other developers!*

+ **Format:** GGUF | **Runtime:** Ollama / llama.cpp | **Created:** October 2025

+ </div>