---
language:
  - en
  - de
license: apache-2.0
library_name: transformers
base_model:
  - Qwen/Qwen2.5-Coder-14B
  - Qwen/Qwen2.5-Coder-32B
tags:
  - code
  - coding
  - tool-calling
  - code-generation
  - eu-trained
  - dpo
  - sft
  - qlora
pipeline_tag: text-generation
model-index:
  - name: Kode
    results: []
---

# Kode β€” EU-Trained Coding Models

**Kode** is a family of instruction-tuned coding models built for real-world software engineering tasks. It is fine-tuned from **Qwen2.5-Coder** with SFT followed by DPO, using Claude-generated training samples, on A100 GPUs.

Kode is the backbone of the Kode CLI/Web UI, an open-source local alternative to Claude Code. GitHub repository coming soon.

| Model | Parameters | VRAM | Best For |
|-------|-----------|------|----------|
| **kode-14b** | 14B | ~15 GB (Q8) / ~9 GB (Q4) | Consumer GPUs, fast iteration |
| **kode-32b** | 32B | ~19 GB (Q4) | Maximum quality, production use |

## Key Features

- πŸ‡ͺπŸ‡Ί **Trained in the EU** β€” DSGVO/GDPR compliant, no data leaves Europe
- πŸ”§ **Tool-calling native** β€” Trained specifically for file operations, shell commands, code search
- 🎯 **Production code focus** β€” Training data from real codebases, not synthetic benchmarks
- πŸ“ **7 languages** β€” Rust, Go, TypeScript, Python, C#, SQL (PostgreSQL), CSS/Tailwind
- 🏠 **Runs locally** β€” 14B fits on a single consumer GPU (RTX 3080+)

## Supported Languages & Tasks

### Languages
Rust β€’ Go β€’ TypeScript β€’ Python β€’ C# β€’ PostgreSQL β€’ CSS/Tailwind

### Tasks
- **Code generation** β€” Complete functions, modules, and files from natural language
- **Code refactoring** β€” Improve existing code structure and performance
- **Code review** β€” Identify bugs, security issues, and improvements
- **Tool calling** β€” File I/O, shell commands, grep/search (Kode CLI integration)
- **Code completion** β€” Context-aware completions
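The exact tool-call schema is defined by Kode CLI and not reproduced in this card; the sketch below only illustrates the general shape of such an exchange. The JSON envelope and the `path` argument are assumptions β€” only the tool names (`read_file`, `bash_execute`, etc.) come from the list above.

```python
import json

# Hypothetical tool-call envelope; the real schema is Kode CLI's and may differ.
tool_call = {
    "tool": "read_file",
    "arguments": {"path": "src/main.rs"},
}

# A client would serialize the call, execute the named tool locally,
# and feed the result back into the conversation as a new message.
serialized = json.dumps(tool_call)
parsed = json.loads(serialized)
tool_result = {
    "tool": parsed["tool"],
    "output": 'fn main() { println!("hello"); }',
}
```

The round-trip through JSON mirrors what a client would do when passing model output to a local tool runner.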

## Training Details

### Base Model
[Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B) (14B and 32B variants)

### Training Pipeline
1. **SFT (Supervised Fine-Tuning)** β€” Claude-generated training samples across 7 languages (~841 curated queries covering data structures, async, error handling, APIs, testing, and more)
2. **DPO (Direct Preference Optimization)** β€” Preference pairs from Claude evaluations of model outputs
3. **Tool-call SFT** β€” Specialized training for tool-calling patterns (read_file, write_file, bash_execute, grep, etc.)
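A single DPO preference record from step 2 could be assembled roughly as follows, assuming the TRL convention of `prompt`/`chosen`/`rejected` columns. The sample content is invented for illustration; only the pairing scheme (Claude reference as chosen, local model output as rejected) comes from the pipeline description above.

```python
def make_dpo_pair(prompt: str, claude_answer: str, local_answer: str) -> dict:
    """Pack one preference record: the Claude-generated reference solution
    is 'chosen', the local model's output is 'rejected'."""
    return {
        "prompt": prompt,
        "chosen": claude_answer,
        "rejected": local_answer,
    }

# Illustrative pair (contents invented):
pair = make_dpo_pair(
    "Write a Rust function that reverses a string.",
    "fn reverse(s: &str) -> String { s.chars().rev().collect() }",
    "fn reverse(s: String) -> String { s }",
)
```

A dataset of such records can be passed directly to TRL's `DPOTrainer`, which expects exactly these three columns.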

### Infrastructure
- **GPU:** NVIDIA A100 80GB (2Γ— for 32B full fine-tune, 1Γ— for QLoRA)
- **Framework:** Transformers + PEFT + TRL + Unsloth
- **LoRA config (32B):** r=64, alpha=128, dropout=0.05, targeting all attention + MLP projections
- **Precision:** bfloat16
- **Sequence length:** 4096 tokens
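The stated 32B LoRA hyperparameters translate into a PEFT config roughly like the one below. The `target_modules` list is our reading of "all attention + MLP projections" using the Qwen2 layer names; it is an assumption, not a published config.

```python
from peft import LoraConfig

# Sketch of the 32B QLoRA setup from the numbers above (r=64, alpha=128,
# dropout=0.05). Module names assume Qwen2's attention/MLP projection naming.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```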

### Training Data
- ~841 curated training queries across 7 programming languages
- Claude-generated reference solutions (chosen) vs. local model outputs (rejected) for DPO
- Bilingual prompts (English + German)

## Usage

### Ollama (Recommended)

```bash
# Install and run
ollama pull simplellm/kode-14b
ollama run simplellm/kode-14b

# Or the larger model
ollama pull simplellm/kode-32b
ollama run simplellm/kode-32b
```

### Ollama API

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "simplellm/kode-14b",
  "messages": [
    {"role": "user", "content": "Write a Rust function to find prime numbers using the Sieve of Eratosthenes"}
  ]
}'
```

### πŸ€— Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplellm/kode-14b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a coding assistant. Respond with clean, production-ready code."},
    {"role": "user", "content": "Write a thread-safe LRU cache in Rust using Arc and Mutex"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

### llama.cpp

```bash
# Download GGUF
wget https://huggingface.co/simplellm/kode-14b-GGUF/resolve/main/kode-14b-Q8_0.gguf

# Run
./llama-cli -m kode-14b-Q8_0.gguf -p "Write a Go HTTP server with middleware" -n 1024
```

### Hosted Inference

Try Kode without downloading at **[SimpleLLM.eu](https://simplellm.eu)** β€” EU-hosted, GDPR-compliant inference API.

## Quantized Versions

| Variant | Size | Quality | Speed |
|---------|------|---------|-------|
| kode-14b (FP16) | ~28 GB | Baseline | Baseline |
| kode-14b (Q8) | ~15 GB | Near-lossless | ~1.2Γ— faster |
| kode-14b (Q4) | ~9 GB | Good | ~1.5Γ— faster |
| kode-32b (FP16) | ~64 GB | Best | Slowest |
| kode-32b (Q4) | ~19 GB | Very good | Fast |

## Benchmarks

> 🚧 **Coming soon** β€” We are running HumanEval, MBPP, MultiPL-E, and tool-calling benchmarks. Results will be published here.

| Benchmark | kode-14b | kode-32b | Qwen2.5-Coder-14B (base) |
|-----------|----------|----------|--------------------------|
| HumanEval | TBD | TBD | TBD |
| MBPP | TBD | TBD | TBD |
| MultiPL-E (Rust) | TBD | TBD | TBD |
| Tool-call accuracy | TBD | TBD | N/A |

## Limitations

- Optimized for the 7 supported languages; may underperform on others
- 4096 token context window (inherited from training config)
- Tool-calling format is specific to Kode CLI's tool schema
- Training data is bilingual (EN/DE) β€” other languages may have reduced quality

## License

Apache 2.0 (inherited from [Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B))

## Citation

```bibtex
@misc{kode2025,
  title={Kode: EU-Trained Coding Models for Real-World Software Engineering},
  author={Kevin and SimpleLLM Team},
  year={2025},
  url={https://huggingface.co/simplellm/kode-14b}
}
```

## Links

- 🌐 [SimpleLLM.eu](https://simplellm.eu) β€” Hosted inference
- πŸ’» [Kode CLI](https://github.com/kevco/kode) β€” Local coding assistant
- πŸ€— [All models](https://huggingface.co/simplellm) β€” HuggingFace collection