---
language:
- en
- de
license: apache-2.0
library_name: transformers
base_model:
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-32B
tags:
- code
- coding
- tool-calling
- code-generation
- eu-trained
- dpo
- sft
- qlora
pipeline_tag: text-generation
model-index:
- name: Kode
  results: []
---
# Kode: EU-Trained Coding Models
**Kode** is a family of instruction-tuned coding models built for real-world software engineering tasks. The models are fine-tuned from **Qwen2.5-Coder** on NVIDIA A100 GPUs, using supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO) on Claude-generated training samples.
Kode is the backbone of Kode CLI/Web UI, an open-source local alternative to Claude Code. GitHub repository coming soon.
| Model | Parameters | VRAM | Best For |
|-------|-----------|------|----------|
| **kode-14b** | 14B | ~15 GB (Q8) / ~9 GB (Q4) | Consumer GPUs, fast iteration |
| **kode-32b** | 32B | ~19 GB (Q4) | Maximum quality, production use |
## Key Features
- 🇪🇺 **Trained in the EU**: DSGVO/GDPR-compliant; no data leaves Europe
- 🔧 **Tool-calling native**: trained specifically for file operations, shell commands, and code search
- 🎯 **Production code focus**: training data drawn from real codebases, not synthetic benchmarks
- **7 languages**: Rust, Go, TypeScript, Python, C#, SQL, CSS/Tailwind
- **Runs locally**: the 14B model fits on a single consumer GPU (RTX 3080 or better) at Q4
## Supported Languages & Tasks
### Languages
Rust • Go • TypeScript • Python • C# • PostgreSQL • CSS/Tailwind
### Tasks
- **Code generation**: complete functions, modules, and files from natural language
- **Code refactoring**: improve existing code structure and performance
- **Code review**: identify bugs, security issues, and improvements
- **Tool calling**: file I/O, shell commands, grep/search (Kode CLI integration)
- **Code completion**: context-aware completions
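
Kode CLI's tool schema is not published in this card, so the following is only a minimal sketch of how a host program might route a model-emitted tool call to local handlers. It assumes a hypothetical JSON format with `name` and `arguments` fields and covers three of the tool names mentioned under Training Pipeline (`read_file`, `write_file`, `bash_execute`). The handler implementations are illustrative, not Kode CLI's.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, Dict


# Hypothetical handlers named after tools mentioned in this card.
# The real Kode CLI schema and behavior may differ.
def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def bash_execute(command: str) -> str:
    # Runs through the shell with a timeout; output streams are concatenated.
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr


TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_file": write_file,
    "bash_execute": bash_execute,
}


def dispatch(raw_tool_call: str) -> str:
    """Parse an assumed {"name": ..., "arguments": {...}} payload and run the tool."""
    call = json.loads(raw_tool_call)
    name = call["name"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**call.get("arguments", {}))
```

Running model-proposed shell commands without user confirmation or sandboxing is unsafe; a real host would add both before calling `bash_execute`.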
## Training Details
### Base Model
[Qwen2.5-Coder-14B](https://huggingface.co/Qwen/Qwen2.5-Coder-14B) and [Qwen2.5-Coder-32B](https://huggingface.co/Qwen/Qwen2.5-Coder-32B)
### Training Pipeline
1. **SFT (Supervised Fine-Tuning)**: Claude-generated training samples across 7 languages (~841 curated queries covering data structures, async, error handling, APIs, testing, and more)
2. **DPO (Direct Preference Optimization)**: preference pairs from Claude evaluations of model outputs
3. **Tool-call SFT**: specialized training for tool-calling patterns (read_file, write_file, bash_execute, grep, etc.)
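
The DPO stage optimizes a per-pair objective: the policy is rewarded for raising the likelihood of the chosen answer, relative to a frozen reference model, more than that of the rejected answer. The sketch below computes the standard DPO loss for a single pair. `beta = 0.1` is an assumed default, since the card does not state the value used, and treating the SFT checkpoint as the reference is the usual setup rather than something confirmed here.

```python
import math


def _neg_log_sigmoid(x: float) -> float:
    # -log(sigmoid(x)) == softplus(-x), computed without overflow for large |x|
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; not stated in the card
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return _neg_log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy matches the reference exactly, the loss is `log 2`; it falls toward zero as the policy prefers the chosen answer more strongly than the reference does.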
### Infrastructure
- **GPU:** NVIDIA A100 80GB (2× for 32B full fine-tune, 1× for QLoRA)
- **Framework:** Transformers + PEFT + TRL + Unsloth
- **LoRA config (32B):** r=64, alpha=128, dropout=0.05, targeting all attention + MLP projections
- **Precision:** bfloat16
- **Sequence length:** 4096 tokens
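
To give a sense of scale for the 32B LoRA settings, the sketch below computes the LoRA scaling factor (`alpha / r`) and the number of adapter parameters when every attention and MLP projection is adapted. The layer dimensions (hidden size 5120, MLP size 27648, 40 query heads, 8 key/value heads, 64 layers) are assumptions taken from the published Qwen2.5-32B architecture; check them against the checkpoint's `config.json`. The module names follow the Qwen2 implementation in Transformers, and the dataclass is a hypothetical stand-in, not PEFT itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LoraSettings:
    # Values from this card's 32B LoRA config. Field names mirror the
    # peft.LoraConfig keywords r, lora_alpha, lora_dropout, target_modules.
    r: int = 64
    lora_alpha: int = 128
    lora_dropout: float = 0.05
    target_modules: List[str] = field(default_factory=lambda: [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ])

    @property
    def scaling(self) -> float:
        # LoRA scales the low-rank update B @ A by alpha / r
        return self.lora_alpha / self.r


def projection_shapes(hidden: int, intermediate: int, n_heads: int,
                      n_kv_heads: int) -> Dict[str, Tuple[int, int]]:
    """(in_features, out_features) of each projection in one Qwen2-style layer."""
    head_dim = hidden // n_heads
    kv_dim = n_kv_heads * head_dim  # grouped-query attention shrinks k/v
    return {
        "q_proj": (hidden, hidden),
        "k_proj": (hidden, kv_dim),
        "v_proj": (hidden, kv_dim),
        "o_proj": (hidden, hidden),
        "gate_proj": (hidden, intermediate),
        "up_proj": (hidden, intermediate),
        "down_proj": (intermediate, hidden),
    }


def lora_trainable_params(cfg: LoraSettings,
                          shapes: Dict[str, Tuple[int, int]], n_layers: int) -> int:
    # Each adapted Linear(d_in, d_out) gains A (r x d_in) and B (d_out x r).
    per_layer = sum(cfg.r * (d_in + d_out)
                    for name, (d_in, d_out) in shapes.items()
                    if name in cfg.target_modules)
    return per_layer * n_layers


cfg = LoraSettings()
shapes = projection_shapes(hidden=5120, intermediate=27648, n_heads=40, n_kv_heads=8)
print(cfg.scaling, lora_trainable_params(cfg, shapes, n_layers=64))  # 2.0 536870912
```

Under these assumptions the adapters add about 537M trainable parameters, under 2% of the 32B base model.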
### Training Data
- ~841 curated training queries across 7 programming languages
- Claude-generated reference solutions (chosen) vs. local model outputs (rejected) for DPO
- Bilingual prompts (English + German)
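
A DPO dataset built this way is a list of preference records. The sketch below assumes the common `prompt` / `chosen` / `rejected` layout that TRL's `DPOTrainer` accepts, stored as JSON Lines; the card does not say which layout Kode actually used, and both helper names are hypothetical.

```python
import json
from typing import Dict, Iterable


def make_preference_record(prompt: str, claude_answer: str, local_answer: str) -> Dict[str, str]:
    # Per the card: Claude reference solution = chosen, local model output = rejected
    return {"prompt": prompt, "chosen": claude_answer, "rejected": local_answer}


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    # ensure_ascii=False keeps German umlauts readable in the output file
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


example = make_preference_record(
    "Schreibe eine Go-Funktion, die eine Datei zeilenweise liest.",
    "<Claude reference solution>",
    "<local model output>",
)
print(to_jsonl([example]), end="")
```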
## Usage
### Ollama (Recommended)
```bash
# Pull and run the 14B model (requires Ollama to be installed)
ollama pull simplellm/kode-14b
ollama run simplellm/kode-14b
# Or the larger model
ollama pull simplellm/kode-32b
ollama run simplellm/kode-32b
```
### Ollama API
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "simplellm/kode-14b",
  "messages": [
    {"role": "user", "content": "Write a Rust function to find prime numbers using the Sieve of Eratosthenes"}
  ],
  "stream": false
}'
```
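
The same request can be sent from Python using only the standard library. This is a minimal sketch: it sets `"stream": false` so Ollama returns a single JSON object rather than a stream of chunks, and reads the reply from `message.content`. The helper names are hypothetical, and an Ollama server must be running locally for `chat()` to succeed.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "simplellm/kode-14b") -> urllib.request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of newline-delimited chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, model: str = "simplellm/kode-14b") -> str:
    # Generous timeout: long code answers on local hardware can take minutes
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=600) as resp:
        body = json.load(resp)
    # Non-streaming /api/chat responses carry the reply under message.content
    return body["message"]["content"]
```

For example, `print(chat("Write a Go HTTP server with middleware"))` prints the model's full answer once generation finishes.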
### 🤗 Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "simplellm/kode-14b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # place layers on the available GPU(s), spilling to CPU if needed
    trust_remote_code=True,
)
messages = [
    {"role": "system", "content": "You are a coding assistant. Respond with clean, production-ready code."},
    {"role": "user", "content": "Write a thread-safe LRU cache in Rust using Arc and Mutex"},
]
# Render the chat with the model's template and open an assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature and top_p only take effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
### llama.cpp
```bash
# Download GGUF
wget https://huggingface.co/simplellm/kode-14b-GGUF/resolve/main/kode-14b-Q8_0.gguf
# Run
./llama-cli -m kode-14b-Q8_0.gguf -p "Write a Go HTTP server with middleware" -n 1024
```
### Hosted Inference
Try Kode without downloading anything at **[SimpleLLM.eu](https://simplellm.eu)**, an EU-hosted, GDPR-compliant inference API.
## Quantized Versions
| Variant | Size | Quality | Speed |
|---------|------|---------|-------|
| kode-14b (FP16) | ~28 GB | Baseline | Baseline |
| kode-14b (Q8) | ~15 GB | Near-lossless | ~1.2× faster |
| kode-14b (Q4) | ~9 GB | Good | ~1.5× faster |
| kode-32b (FP16) | ~64 GB | Best | Slowest |
| kode-32b (Q4) | ~19 GB | Very good | Fast |
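
The sizes above follow roughly from parameter count × bits per weight, and the sketch below reproduces that arithmetic. The parameter counts (14.7B and 32.5B, from the Qwen2.5-Coder base models) and bits-per-weight figures are assumptions: Q8_0 stores 32 weights in 34 bytes (8.5 bits each), while 4.85 bits is only an approximate average for Q4_K_M, and the card does not say which 4-bit scheme its Q4 files use. Results are in decimal gigabytes, so they land close to, but not exactly on, the table's rounded values.

```python
# Assumed parameter counts (check each model's config) and approximate
# bits per weight for common llama.cpp quantization types.
PARAMS = {"kode-14b": 14.7e9, "kode-32b": 32.5e9}
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}


def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring file metadata."""
    return n_params * bits_per_weight / 8 / 1e9


for model, n in PARAMS.items():
    for quant, bpw in BITS_PER_WEIGHT.items():
        print(f"{model} {quant}: ~{estimate_size_gb(n, bpw):.1f} GB")
```

Actual VRAM use is somewhat higher than the file size, since the KV cache and runtime buffers need room too.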
## Benchmarks
> 🚧 **Coming soon:** We are running HumanEval, MBPP, MultiPL-E, and tool-calling benchmarks. Results will be published here.
| Benchmark | kode-14b | kode-32b | Qwen2.5-Coder-14B (base) |
|-----------|----------|----------|--------------------------|
| HumanEval | TBD | TBD | TBD |
| MBPP | TBD | TBD | TBD |
| MultiPL-E (Rust) | TBD | TBD | TBD |
| Tool-call accuracy | TBD | TBD | N/A |
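
HumanEval and MBPP results are conventionally reported as pass@k. A minimal sketch of the unbiased estimator from the HumanEval paper (Chen et al., 2021) follows. This card does not state how many samples per problem Kode's evaluation will draw, so `n` and `k` are left to the evaluator.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: samples generated, c: samples that pass the unit tests, k: budget scored.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample
        return 1.0
    # 1 - P(all k drawn samples fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    # results holds one (n, c) pair per benchmark problem
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For `k = 1` the estimator reduces to the plain pass rate `c / n`.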
## Limitations
- Optimized for the 7 supported languages; may underperform on others
- 4096-token context window (the sequence length used during fine-tuning)
- Tool-calling format is specific to Kode CLI's tool schema
- Training prompts are bilingual (English/German); prompts in other natural languages may see reduced quality
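
Because prompt and generated tokens share the 4096-token window, long conversations must be trimmed before generation. The sketch below assumes the simplest policy, dropping the oldest tokens so that `max_new_tokens` still fits. The helper is hypothetical, and Kode CLI may manage context differently.

```python
from typing import List

CONTEXT_WINDOW = 4096  # per the Limitations section of this card


def fit_prompt(token_ids: List[int], max_new_tokens: int,
               window: int = CONTEXT_WINDOW) -> List[int]:
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if not 0 < max_new_tokens < window:
        raise ValueError("max_new_tokens must be between 1 and window - 1")
    budget = window - max_new_tokens
    # Drops the oldest tokens first. This can cut the system prompt, so a real
    # assistant would rather pin it or summarize older turns.
    return token_ids[-budget:] if len(token_ids) > budget else token_ids
```

With the Transformers example, the token IDs would come from `tokenizer(text).input_ids`; with `max_new_tokens=2048`, at most 2048 prompt tokens remain.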
## License
Apache 2.0 (inherited from [Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B))
## Citation
```bibtex
@misc{kode2025,
  title  = {Kode: EU-Trained Coding Models for Real-World Software Engineering},
  author = {Kevin and {SimpleLLM Team}},
  year   = {2025},
  url    = {https://huggingface.co/simplellm/kode-14b}
}
```
## Links
- [SimpleLLM.eu](https://simplellm.eu): Hosted inference
- 💻 [Kode CLI](https://github.com/kevco/kode): Local coding assistant
- 🤗 [All models](https://huggingface.co/simplellm): Hugging Face collection