richardyoung committed

Commit eab066b · verified · 1 Parent(s): 53b7cce

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +275 -22
README.md CHANGED
@@ -1,4 +1,3 @@
-
  ---
  license: apache-2.0
  base_model: Kwaipilot/KAT-Dev-72B-Exp
@@ -10,48 +9,302 @@ tags:
  - gguf
  - quantized
  - ollama
  - text-generation
  quantized_by: richardyoung
  ---

- # Kat-Dev 72B (GGUF)

- Quantized builds of the KAT-Dev 72B coding model for Ollama / llama.cpp runtimes. Each variant ships with the matching Modelfile generated from the Ollama registry export.

- These binaries are derived from the upstream [`Kwaipilot/KAT-Dev-72B-Exp`](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp) release (Apache-2.0). The goal is to provide ready-to-run GGUF artifacts for local inference stacks such as Ollama and llama.cpp.

- ## Variants

- | Variant | Size | Blob |
- | --- | --- | --- |
- | `iq2_m` | 27.32 GB | `sha256-cbe26a3c280f1f1070b070ac3ab9bd1c3ddc23d422bb5ba902580b107765ca9c` |
- | `iq2_xxs` | 23.74 GB | `sha256-a49c7526f165f7320c434ceee55f72e93654a30a0ecde701a87e023d619c17b7` |
- | `iq3_m` | 33.07 GB | `sha256-14d07184013c2ce3d8be24188512382ed972fda2901cb2f5b5a9e8ebd0c7e4b9` |
- | `iq4_xs` | 36.98 GB | `sha256-c4cb9c6e6847031c418b076d68fb93852140a183afc171e4f62e3a84c58001f6` |

- ## Usage with Ollama

- ### Quick Start (Pull from Registry)

- You can directly pull and run the model from the Ollama registry:

  ```bash
  ollama run richardyoung/kat-dev-72b:iq3_m
  ```

- Available tags: `iq2_m`, `iq2_xxs`, `iq3_m`, `iq4_xs`

- ### Alternative: Build from Modelfile

- You can also create the model locally from the included Modelfiles:

  ```bash
- ollama create kat-dev-72b-iq4_xs -f modelfiles/kat-dev-72b--iq4_xs.Modelfile
- ollama run kat-dev-72b-iq4_xs
  ```

- You can swap `iq4_xs` for any other variant listed above.

- ## Source

- Originally published on my Ollama profile: https://ollama.com/richardyoung/kat-dev-72b
  ---
  license: apache-2.0
  base_model: Kwaipilot/KAT-Dev-72B-Exp
  - gguf
  - quantized
  - ollama
+ - coding
+ - llama-cpp
  - text-generation
  quantized_by: richardyoung
  ---

+ <div align="center">
+
+ # 💻 KAT-Dev 72B - GGUF
+
+ ### Enterprise-Grade 72B Coding Model, Optimized for Local Inference
+
+ [![GGUF](https://img.shields.io/badge/Format-GGUF-blue)](https://github.com/ggerganov/llama.cpp)
+ [![Size](https://img.shields.io/badge/Variants-4_Quantizations-green)](https://huggingface.co/richardyoung/kat-dev-72b)
+ [![Ollama](https://img.shields.io/badge/Runtime-Ollama-orange)](https://ollama.ai/)
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+
+ **[Original Model](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp)** | **[Ollama Registry](https://ollama.com/richardyoung/kat-dev-72b)** | **[llama.cpp](https://github.com/ggerganov/llama.cpp)**
+
+ ---
+
+ </div>
+
+ ## 📖 What is This?
+
+ This is **KAT-Dev 72B**, a powerful coding model with 72 billion parameters, quantized to **GGUF format** for efficient local inference. Perfect for developers who want enterprise-grade code assistance running entirely on their own hardware with Ollama or llama.cpp!
+
+ ### ✨ Why You'll Love It
+
+ - 💻 **Coding-Focused** - Optimized specifically for programming tasks
+ - 🧠 **72B Parameters** - Large enough for complex reasoning and refactoring
+ - ⚡ **Local Inference** - Run entirely on your machine, no API calls
+ - 🔒 **Privacy First** - Your code never leaves your computer
+ - 🎯 **Multiple Quantizations** - Choose your speed/quality trade-off
+ - 🚀 **Ollama Ready** - One command to start coding
+ - 🔧 **llama.cpp Compatible** - Works with your favorite tools
+
+ ## 🎯 Quick Start
+
+ ### Option 1: Ollama (Easiest!)
+
+ Pull and run directly from the Ollama registry:
+
+ ```bash
+ # Recommended: IQ3_M (best balance)
+ ollama run richardyoung/kat-dev-72b:iq3_m
+
+ # Other variants
+ ollama run richardyoung/kat-dev-72b:iq4_xs   # Better quality
+ ollama run richardyoung/kat-dev-72b:iq2_m    # Faster, smaller
+ ollama run richardyoung/kat-dev-72b:iq2_xxs  # Most compact
+ ```
+
+ That's it! Start asking coding questions! 🎉
+
+ ### Option 2: Build from Modelfile
+
+ Download this repo and build locally:
+
+ ```bash
+ # Clone or download the modelfiles
+ ollama create kat-dev-72b-iq3_m -f modelfiles/kat-dev-72b--iq3_m.Modelfile
+ ollama run kat-dev-72b-iq3_m
+ ```
+
+ ### Option 3: llama.cpp
+
+ Use with llama.cpp directly:
+
+ ```bash
+ # Download the GGUF file (replace variant as needed)
+ huggingface-cli download richardyoung/kat-dev-72b kat-dev-72b-iq3_m.gguf --local-dir ./
+
+ # Run with llama.cpp
+ ./llama-cli -m kat-dev-72b-iq3_m.gguf -p "Write a Python function to"
+ ```

+ ## 💻 System Requirements

+ | Component | Minimum | Recommended |
+ |-----------|---------|-------------|
+ | **RAM** | 32 GB | 64 GB+ |
+ | **Storage** | 40 GB free | 50+ GB free |
+ | **CPU** | Modern 8-core | 16+ cores |
+ | **GPU** | Optional (CPU-only works!) | Metal/CUDA for acceleration |
+ | **OS** | macOS, Linux, Windows | Latest versions |

+ > 💡 **Tip:** Larger quantizations (IQ4_XS) need more RAM but produce better code. Smaller ones (IQ2_XXS) are faster but less precise.

+ ## 🎨 Available Quantizations

+ Choose the right balance for your needs:

+ | Quantization | Size | Quality | Speed | RAM Usage | Best For |
+ |--------------|------|---------|-------|-----------|----------|
+ | **IQ4_XS** | 37 GB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ~50 GB | Production code, complex refactoring |
+ | **IQ3_M** (recommended) | 33 GB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ~40 GB | Daily development, best balance |
+ | **IQ2_M** | 27 GB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ~35 GB | Quick prototyping, fast iteration |
+ | **IQ2_XXS** | 24 GB | ⭐⭐ | ⭐⭐⭐⭐⭐ | ~30 GB | Testing, very constrained systems |
+
+ ### Variant Details
+
+ | Variant | Size | Blob SHA256 |
+ |---------|------|-------------|
+ | `iq4_xs` | 36.98 GB | `c4cb9c6e...` |
+ | `iq3_m` | 33.07 GB | `14d07184...` |
+ | `iq2_m` | 27.32 GB | `cbe26a3c...` |
+ | `iq2_xxs` | 23.74 GB | `a49c7526...` |
+
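As a rough illustration, the RAM guidance in the tables above can be turned into a tiny helper that picks the best-fitting tag for a given machine. The thresholds are the approximate figures from the table, not measurements, and `pick_variant` is an illustrative name rather than part of any tooling:

```python
# Approximate RAM needs (GB) from the table above, ordered best quality first.
VARIANTS = [
    ("iq4_xs", 50),
    ("iq3_m", 40),
    ("iq2_m", 35),
    ("iq2_xxs", 30),
]

def pick_variant(available_ram_gb):
    """Return the highest-quality tag that fits in the given RAM, or None."""
    for tag, needed_gb in VARIANTS:
        if available_ram_gb >= needed_gb:
            return tag
    return None  # not enough RAM for any variant

print(pick_variant(48))  # a 48 GB machine lands on iq3_m
```

By this rule of thumb, a 64 GB machine comfortably fits `iq4_xs`, while 32 GB only leaves room for `iq2_xxs`.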
+ ## 📚 Usage Examples
+
+ ### Code Generation
+
+ ```bash
+ ollama run richardyoung/kat-dev-72b:iq3_m "Write a Python function to validate email addresses with regex"
+ ```
+
+ ### Code Explanation
+
+ ```bash
+ ollama run richardyoung/kat-dev-72b:iq3_m "Explain this code: def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)"
+ ```
+
+ ### Debugging Help
+
+ ```bash
+ ollama run richardyoung/kat-dev-72b:iq3_m "Why does this Python code raise a KeyError?"
+ ```
+
+ ### Refactoring
+
+ ```bash
+ ollama run richardyoung/kat-dev-72b:iq3_m "Refactor this JavaScript function to use async/await instead of callbacks"
+ ```

+ ### Multi-turn Conversation

  ```bash
  ollama run richardyoung/kat-dev-72b:iq3_m
+ >>> I need to build a REST API in Python
+ >>> Show me a FastAPI example with authentication
+ >>> How do I add rate limiting?
+ ```
+
+ ## 🏗️ Model Details
+
+ <details>
+ <summary><b>Click to expand technical details</b></summary>
+
+ ### Architecture
+
+ - **Base Model:** KAT-Dev 72B Exp by Kwaipilot
+ - **Parameters:** ~72 Billion
+ - **Quantization:** GGUF format (IQ2_XXS to IQ4_XS)
+ - **Context Length:** Standard (check base model for specifics)
+ - **Optimization:** Code generation and understanding
+ - **Training:** Specialized for programming tasks
+
+ ### Supported Languages
+
+ The model excels at:
+ - Python
+ - JavaScript/TypeScript
+ - Java
+ - C/C++
+ - Go
+ - Rust
+ - And many more!
+
+ </details>
+
+ ## ⚡ Performance Tips
+
+ <details>
+ <summary><b>Getting the best results</b></summary>
+
+ 1. **Choose the right quantization** - IQ3_M is recommended for daily use
+ 2. **Use specific prompts** - "Write a Python function to X" works better than "code for X"
+ 3. **Provide context** - Share error messages, file structures, or requirements
+ 4. **Iterate** - Ask follow-up questions to refine the code
+ 5. **GPU acceleration** - Use Metal (Mac) or CUDA (NVIDIA) for faster inference
+ 6. **Temperature settings** - Lower (0.1-0.3) for precise code, higher (0.7-0.9) for creative solutions
+
+ ### Example Ollama Configuration
+
+ ```bash
+ # Create with custom parameters
+ ollama create my-kat-dev -f modelfiles/kat-dev-72b--iq3_m.Modelfile
+
+ # Edit the Modelfile to add:
+ PARAMETER temperature 0.2
+ PARAMETER top_p 0.9
+ PARAMETER repeat_penalty 1.1
+ ```
+
+ </details>
+
+ ## 🔧 Building Custom Variants
+
+ You can modify the included Modelfiles to customize behavior:
+
+ ```dockerfile
+ FROM ./kat-dev-72b-iq3_m.gguf
+
+ # System prompt
+ SYSTEM You are an expert programmer specializing in Python and web development.
+
+ # Parameters
+ PARAMETER temperature 0.2
+ PARAMETER num_ctx 8192
+ PARAMETER stop "<|endoftext|>"
+ ```
+
+ Then build:
+
+ ```bash
+ ollama create my-custom-kat -f custom.Modelfile
  ```

+ ## ⚠️ Known Limitations
+
+ - 💾 **Large Size** - Even the smallest variant needs 24+ GB of storage
+ - 🐏 **RAM Intensive** - Requires significant system memory
+ - ⏱️ **Inference Speed** - Slower than smaller models (trade-off for quality)
+ - 🌐 **English-Focused** - Best performance with English prompts
+ - 📝 **Code-Specialized** - Not optimized for general conversation
+
+ ## 📄 License
+
+ Apache 2.0 - Same as the original model. Free for commercial use!
+
+ ## 🙏 Acknowledgments
+
+ - **Original Model:** [Kwaipilot](https://huggingface.co/Kwaipilot) for creating KAT-Dev 72B
+ - **GGUF Format:** [Georgi Gerganov](https://github.com/ggerganov) for llama.cpp
+ - **Ollama:** [Ollama team](https://ollama.ai/) for the amazing runtime
+ - **Community:** All the developers testing and providing feedback

+ ## 🔗 Useful Links

+ - 📦 **Original Model:** [Kwaipilot/KAT-Dev-72B-Exp](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp)
+ - 🚀 **Ollama Registry:** [richardyoung/kat-dev-72b](https://ollama.com/richardyoung/kat-dev-72b)
+ - 🛠️ **llama.cpp:** [GitHub](https://github.com/ggerganov/llama.cpp)
+ - 📖 **Ollama Docs:** [Documentation](https://github.com/ollama/ollama)
+ - 💬 **Discussions:** [Ask questions here!](https://huggingface.co/richardyoung/kat-dev-72b/discussions)
+
+ ## 🎮 Pro Tips
+
+ <details>
+ <summary><b>Advanced usage patterns</b></summary>
+
+ ### 1. Integration with VS Code
+
+ Use with Continue.dev or other coding assistants:
+
+ ```json
+ {
+   "models": [
+     {
+       "title": "KAT-Dev 72B",
+       "provider": "ollama",
+       "model": "richardyoung/kat-dev-72b:iq3_m"
+     }
+   ]
+ }
+ ```
+
+ ### 2. API Server Mode
+
+ Run as an OpenAI-compatible API:
+
+ ```bash
+ ollama serve
+ # Then use the API at http://localhost:11434
+ ```
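Because the server also speaks the OpenAI chat-completions dialect at `/v1/chat/completions`, you can script against it with nothing but the standard library. A minimal sketch, assuming `ollama serve` is running locally and the `iq3_m` tag has been pulled; `build_chat_request` and `ask` are illustrative helpers, not part of any library:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(prompt, model="richardyoung/kat-dev-72b:iq3_m"):
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature for precise code
    }

def ask(prompt):
    """POST the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# reply = ask("Write a Python function that parses ISO 8601 dates.")
```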
+
+ ### 3. Batch Processing
+
+ Process multiple files:

  ```bash
+ for file in *.py; do
+   ollama run richardyoung/kat-dev-72b:iq3_m "Review this code: $(cat "$file")" > "${file}.review"
+ done
  ```
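The same loop can be written in Python, which sidesteps shell quoting entirely by passing the prompt as a single argv entry. A sketch assuming `ollama` is on your `PATH`; `review_command` and `review_all` are illustrative names:

```python
import subprocess
from pathlib import Path

MODEL = "richardyoung/kat-dev-72b:iq3_m"  # any tag from the tables above works

def review_command(source_text, model=MODEL):
    """Build the `ollama run` argument list for one review prompt."""
    return ["ollama", "run", model, f"Review this code: {source_text}"]

def review_all(directory="."):
    """Write a `.review` file next to every `.py` file in the directory."""
    for path in sorted(Path(directory).glob("*.py")):
        result = subprocess.run(
            review_command(path.read_text()), capture_output=True, text=True
        )
        path.with_name(path.name + ".review").write_text(result.stdout)
```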

+ </details>
+
+ ---
+
+ <div align="center">
+
+ **Quantized with ❤️ by [richardyoung](https://deepneuro.ai/richard)**
+
+ *If you find this useful, please ⭐ star the repo and share with other developers!*

+ **Format:** GGUF | **Runtime:** Ollama / llama.cpp | **Created:** October 2025

+ </div>