---
language:
- en
- de
license: apache-2.0
library_name: transformers
base_model:
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-32B
tags:
- code
- coding
- tool-calling
- code-generation
- eu-trained
- dpo
- sft
- qlora
pipeline_tag: text-generation
model-index:
- name: Kode
  results: []
---

# Kode: EU-Trained Coding Models

**Kode** is a family of instruction-tuned coding models built for real-world software engineering tasks. The models are fine-tuned from **Qwen2.5-Coder** with SFT and DPO on Claude-generated training samples, using A100 GPUs.

Kode is the backbone of Kode CLI/Web UI, an open-source local alternative to Claude Code. GitHub repository coming soon.

| Model | Parameters | VRAM | Best For |
|-------|-----------|------|----------|
| **kode-14b** | 14B | ~15 GB (Q8) / ~9 GB (Q4) | Consumer GPUs, fast iteration |
| **kode-32b** | 32B | ~19 GB (Q4) | Maximum quality, production use |

## Key Features

- **Trained in the EU:** DSGVO/GDPR compliant, no data leaves Europe
- **Tool-calling native:** Trained specifically for file operations, shell commands, and code search
- **Production code focus:** Training data from real codebases, not synthetic benchmarks
- **7 languages:** Rust, Go, TypeScript, Python, C#, SQL, CSS/Tailwind
- **Runs locally:** The 14B model fits on a single consumer GPU (RTX 3080+) when quantized to Q4

## Supported Languages & Tasks

### Languages
Rust • Go • TypeScript • Python • C# • SQL (PostgreSQL) • CSS/Tailwind

### Tasks
- **Code generation:** Complete functions, modules, and files from natural language
- **Code refactoring:** Improve existing code structure and performance
- **Code review:** Identify bugs, security issues, and improvements
- **Tool calling:** File I/O, shell commands, grep/search (Kode CLI integration)
- **Code completion:** Context-aware completions

## Training Details

### Base Model
[Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B) (14B and 32B variants)

### Training Pipeline
1. **SFT (Supervised Fine-Tuning):** Claude-generated training samples across 7 languages (~841 curated queries covering data structures, async, error handling, APIs, testing, and more)
2. **DPO (Direct Preference Optimization):** Preference pairs from Claude evaluations of model outputs
3. **Tool-call SFT:** Specialized training for tool-calling patterns (`read_file`, `write_file`, `bash_execute`, `grep`, etc.)

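To make the DPO step concrete, here is a minimal sketch of the per-pair DPO loss in plain Python. The `beta` value and the log-probabilities are illustrative assumptions; this card does not state the hyperparameters that were used, and a training library such as TRL computes the same quantity over batches of summed token log-probabilities.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    beta=0.1 is an assumed value for illustration, not Kode's actual setting.
    """
    # How much more the policy favours each answer than the frozen reference.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy still matches the reference model, the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the chosen answer and away from the rejected one.
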
### Infrastructure
- **GPU:** NVIDIA A100 80GB (2× for 32B full fine-tune, 1× for QLoRA)
- **Framework:** Transformers + PEFT + TRL + Unsloth
- **LoRA config (32B):** r=64, alpha=128, dropout=0.05, targeting all attention + MLP projections
- **Precision:** bfloat16
- **Sequence length:** 4096 tokens

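As a rough sanity check on the LoRA configuration above, the sketch below counts the trainable adapter parameters for r=64 on all attention and MLP projections. The layer dimensions are assumptions based on the published Qwen2.5-32B architecture (hidden size 5120, MLP size 27648, 64 layers, 8 KV heads of dimension 128); verify them against the model's `config.json` before relying on the numbers.

```python
# Assumed Qwen2.5-32B dimensions (not stated in this model card).
HIDDEN = 5120
INTERMEDIATE = 27648
KV_DIM = 8 * 128  # num_key_value_heads * head_dim
NUM_LAYERS = 64

# (in_features, out_features) for each projection targeted by LoRA.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """Trainable parameters: each adapter adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
    return per_layer * NUM_LAYERS


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scales the adapter update by alpha / rank."""
    return alpha / rank


print(f"trainable LoRA params: {lora_param_count(64):,}")
print(f"scaling factor: {lora_scaling(64, 128)}")
```

Under these assumptions the adapters hold 536,870,912 trainable parameters, roughly 1.7% of a 32B base, and the adapter update is scaled by alpha / r = 2.
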
### Training Data
- ~841 curated training queries across 7 programming languages
- DPO pairs: Claude-generated reference solutions (chosen) vs. local model outputs (rejected)
- Bilingual prompts (English + German)

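The chosen/rejected pairing described above maps naturally onto the `prompt` / `chosen` / `rejected` record layout that TRL's `DPOTrainer` accepts. The following is a minimal sketch of that pairing; the helper names, the sample texts, and the JSONL output are illustrative assumptions, not the project's actual data pipeline.

```python
import json


def build_dpo_records(queries, reference_solutions, local_outputs):
    """Pair each query with a reference (chosen) and a local model output (rejected).

    Hypothetical helper: skips pairs where both answers are identical,
    since they carry no preference signal.
    """
    records = []
    for prompt, chosen, rejected in zip(queries, reference_solutions, local_outputs):
        if chosen.strip() == rejected.strip():
            continue
        records.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return records


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, one preference pair per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Made-up sample data, including one German prompt to mirror the bilingual set.
queries = [
    "Write a Go function that reverses a string.",
    "Schreibe eine Python-Funktion, die eine Liste dedupliziert.",
]
references = [
    "func Reverse(s string) string { /* rune-aware */ }",
    "def dedupe(xs): return list(dict.fromkeys(xs))",
]
local = [
    "func Reverse(s string) string { /* byte-wise */ }",
    "def dedupe(xs): return list(dict.fromkeys(xs))",
]

records = build_dpo_records(queries, references, local)
print(to_jsonl(records))
```

In this sample the second pair is dropped because the local output already matches the reference, so only one record is written.
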
## Usage

### Ollama (Recommended)

```bash
# Pull and run the 14B model
ollama pull simplellm/kode-14b
ollama run simplellm/kode-14b

# Or the larger model
ollama pull simplellm/kode-32b
ollama run simplellm/kode-32b
```

### Ollama API

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "simplellm/kode-14b",
  "messages": [
    {"role": "user", "content": "Write a Rust function to find prime numbers using the Sieve of Eratosthenes"}
  ]
}'
```

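The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes an Ollama server is listening on the default local port; the function names are hypothetical, and `"stream": false` is set so that the server returns a single JSON object rather than a stream of chunks.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama address


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `chat("simplellm/kode-14b", "Write a Rust function that checks whether a number is prime")` returns the reply as a string.
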
### 🤗 Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplellm/kode-14b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a coding assistant. Respond with clean, production-ready code."},
    {"role": "user", "content": "Write a thread-safe LRU cache in Rust using Arc and Mutex"},
]

# Render the conversation with the model's chat template, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# temperature and top_p only apply when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

### llama.cpp

```bash
# Download the GGUF file
wget https://huggingface.co/simplellm/kode-14b-GGUF/resolve/main/kode-14b-Q8_0.gguf

# Run
./llama-cli -m kode-14b-Q8_0.gguf -p "Write a Go HTTP server with middleware" -n 1024
```

### Hosted Inference

Try Kode without downloading at **[SimpleLLM.eu](https://simplellm.eu)**, an EU-hosted, GDPR-compliant inference API.

## Quantized Versions

| Variant | Size | Quality | Speed |
|---------|------|---------|-------|
| kode-14b (FP16) | ~28 GB | Baseline | Baseline |
| kode-14b (Q8) | ~15 GB | Near-lossless | ~1.2× faster |
| kode-14b (Q4) | ~9 GB | Good | ~1.5× faster |
| kode-32b (native/FP16) | ~64 GB | Best | Slowest |
| kode-32b (Q4) | ~19 GB | Very good | Fast |

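The sizes in the table follow from simple arithmetic: parameter count times bits per weight. The sketch below reproduces them approximately. The bits-per-weight values are assumptions: 8.5 corresponds to llama.cpp's Q8_0 layout, while 4.85 is a typical figure for a Q4_K_M-style format and may not match the exact Q4 variant shipped here. Nominal parameter counts (14B, 32B) are used, and the KV cache and runtime overhead are ignored.

```python
# Approximate bits per weight. Q8_0 stores 32 weights in 34 bytes (8.5 bits);
# the Q4 figure assumes a Q4_K_M-style format and is only an estimate.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q4": 4.85}


def weight_size_gb(num_params: float, fmt: str) -> float:
    """Estimate weight storage in GB (10^9 bytes), ignoring KV cache and overhead."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for name, params in [("kode-14b", 14e9), ("kode-32b", 32e9)]:
    for fmt in BITS_PER_WEIGHT:
        print(f"{name} {fmt}: ~{weight_size_gb(params, fmt):.1f} GB")
```

For the 14B model this gives about 28, 15, and 8.5 GB, in line with the table. Actual VRAM use is higher once the context and activations are loaded.
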
## Benchmarks

> **Coming soon:** We are running HumanEval, MBPP, MultiPL-E, and tool-calling benchmarks. Results will be published here.

| Benchmark | kode-14b | kode-32b | Qwen2.5-Coder-14B (base) |
|-----------|----------|----------|--------------------------|
| HumanEval | TBD | TBD | TBD |
| MBPP | TBD | TBD | TBD |
| MultiPL-E (Rust) | TBD | TBD | TBD |
| Tool-call accuracy | TBD | TBD | N/A |

## Limitations

- Optimized for the 7 supported languages; may underperform on others
- 4096-token context window (inherited from training config)
- Tool-calling format is specific to Kode CLI's tool schema
- Training prompts are bilingual (EN/DE); prompts in other natural languages may yield lower quality

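Since the Kode CLI tool schema is not documented in this card, the sketch below only illustrates the general shape of a tool-call dispatcher for some of the tool names mentioned under Training Pipeline. The JSON call format (`{"name": ..., "arguments": {...}}`) and every argument name are assumptions made for illustration; they are not the Kode CLI format.

```python
import json
import subprocess
from pathlib import Path


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def bash_execute(command: str) -> str:
    # Runs arbitrary shell commands: only use inside a sandbox.
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr


# Tool names come from the model card; the argument names are assumptions.
TOOLS = {"read_file": read_file, "write_file": write_file, "bash_execute": bash_execute}


def dispatch(tool_call_json: str) -> str:
    """Parse a hypothetical {"name": ..., "arguments": {...}} call and run it."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**call.get("arguments", {}))
    except Exception as exc:  # report failures back to the model instead of crashing
        return f"error: {exc}"
```

Returning errors as strings lets the calling loop feed them back to the model as a tool result, so a bad path or unknown tool does not abort the session.
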
## License

Apache 2.0 (inherited from [Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B))

## Citation

```bibtex
@misc{kode2025,
  title={Kode: EU-Trained Coding Models for Real-World Software Engineering},
  author={Kevin and SimpleLLM Team},
  year={2025},
  url={https://huggingface.co/simplellm/kode-14b}
}
```

## Links

- [SimpleLLM.eu](https://simplellm.eu): Hosted inference
- [Kode CLI](https://github.com/kevco/kode): Local coding assistant
- [All models](https://huggingface.co/simplellm): Hugging Face collection