---
language:
- en
- de
license: apache-2.0
library_name: transformers
base_model:
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-32B
tags:
- code
- coding
- tool-calling
- code-generation
- eu-trained
- dpo
- sft
- qlora
pipeline_tag: text-generation
model-index:
- name: Kode
  results: []
---

# Kode: EU-Trained Coding Models

**Kode** is a family of instruction-tuned coding models built for real-world software engineering tasks. The models are fine-tuned from **Qwen2.5-Coder** with SFT and DPO on Claude-generated training samples, using A100 GPUs.

Kode is the backbone of Kode CLI/Web UI, an open-source local alternative to Claude Code. GitHub repository coming soon.

| Model | Parameters | VRAM | Best For |
|-------|-----------|------|----------|
| **kode-14b** | 14B | ~15 GB (Q8) / ~9 GB (Q4) | Consumer GPUs, fast iteration |
| **kode-32b** | 32B | ~19 GB (Q4) | Maximum quality, production use |

## Key Features

- 🇪🇺 **Trained in the EU**: GDPR (DSGVO) compliant, no data leaves Europe
- 🔧 **Tool-calling native**: Trained specifically for file operations, shell commands, and code search
- 🎯 **Production code focus**: Training data from real codebases, not synthetic benchmarks
- 📐 **7 languages**: Rust, Go, TypeScript, Python, C#, SQL (PostgreSQL), CSS/Tailwind
- 🏠 **Runs locally**: The 14B model at Q4 fits on a single consumer GPU (RTX 3080 or better)

## Supported Languages & Tasks

### Languages

Rust • Go • TypeScript • Python • C# • PostgreSQL • CSS/Tailwind

### Tasks

- **Code generation**: Complete functions, modules, and files from natural language
- **Code refactoring**: Improve existing code structure and performance
- **Code review**: Identify bugs, security issues, and improvements
- **Tool calling**: File I/O, shell commands, grep/search (Kode CLI integration)
- **Code completion**: Context-aware completions

## Training Details

### Base Model

[Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B) (14B and 32B variants)

### Training Pipeline

1. **SFT (Supervised Fine-Tuning)**: Claude-generated training samples across 7 languages (~841 curated queries covering data structures, async, error handling, APIs, testing, and more)
2. **DPO (Direct Preference Optimization)**: Preference pairs from Claude evaluations of model outputs
3. **Tool-call SFT**: Specialized training for tool-calling patterns (read_file, write_file, bash_execute, grep, etc.)

### Infrastructure

- **GPU:** NVIDIA A100 80GB (2× for 32B full fine-tune, 1× for QLoRA)
- **Framework:** Transformers + PEFT + TRL + Unsloth
- **LoRA config (32B):** r=64, alpha=128, dropout=0.05, targeting all attention + MLP projections
- **Precision:** bfloat16
- **Sequence length:** 4096 tokens

### Training Data

- ~841 curated training queries across 7 programming languages
- Claude-generated reference solutions (chosen) vs. local model outputs (rejected) for DPO
- Bilingual prompts (English + German)

## Usage

### Ollama (Recommended)

```bash
# Install and run
ollama pull simplellm/kode-14b
ollama run simplellm/kode-14b

# Or the larger model
ollama pull simplellm/kode-32b
ollama run simplellm/kode-32b
```

### Ollama API

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "simplellm/kode-14b",
  "messages": [
    {"role": "user", "content": "Write a Rust function to find prime numbers using the Sieve of Eratosthenes"}
  ]
}'
```

### 🤗 Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplellm/kode-14b"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place layers on the available GPU(s)
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a coding assistant. Respond with clean, production-ready code."},
    {"role": "user", "content": "Write a thread-safe LRU cache in Rust using Arc and Mutex"},
]

# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True makes temperature and top_p take effect explicitly.
outputs = model.generate(
    **inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.9
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

### llama.cpp

```bash
# Download GGUF
wget https://huggingface.co/simplellm/kode-14b-GGUF/resolve/main/kode-14b-Q8_0.gguf

# Run
./llama-cli -m kode-14b-Q8_0.gguf -p "Write a Go HTTP server with middleware" -n 1024
```

### Hosted Inference

Try Kode without downloading at **[SimpleLLM.eu](https://simplellm.eu)**, an EU-hosted, GDPR-compliant inference API.

## Quantized Versions

| Variant | Size | Quality | Speed |
|---------|------|---------|-------|
| kode-14b (FP16) | ~28 GB | Baseline | Baseline |
| kode-14b-Q8 | ~15 GB | Near-lossless | ~1.2× faster |
| kode-14b (Q4) | ~9 GB | Good | ~1.5× faster |
| kode-32b (native/FP16) | ~64 GB | Best | Slowest |
| kode-32b-Q4 | ~19 GB | Very good | Fast |

## Benchmarks

> 🚧 **Coming soon:** We are running HumanEval, MBPP, MultiPL-E, and tool-calling benchmarks. Results will be published here.

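
HumanEval and MBPP scores are conventionally reported as pass@k. This card does not state which metric or which k the pending results will use, so the following is only a minimal sketch under that assumption. It implements the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n sampled completions per problem of which c pass the unit tests. The function names and the sample counts are illustrative and are not Kode results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: completions sampled, c: completions that passed, k: evaluation budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset contains no passing completion.
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for three problems (NOT measured Kode results).
example = [(10, 3), (10, 0), (10, 10)]
print(f"pass@1 = {mean_pass_at_k(example, k=1):.3f}")
```

Averaging the per-problem estimates, rather than pooling all completions, keeps each problem equally weighted regardless of how many samples were drawn for it.
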

| Benchmark | kode-14b | kode-32b | Qwen2.5-Coder-14B (base) |
|-----------|----------|----------|--------------------------|
| HumanEval | TBD | TBD | TBD |
| MBPP | TBD | TBD | TBD |
| MultiPL-E (Rust) | TBD | TBD | TBD |
| Tool-call accuracy | TBD | TBD | N/A |

## Limitations

- Optimized for the 7 supported languages; may underperform on others
- 4096-token context window (inherited from training config)
- Tool-calling format is specific to Kode CLI's tool schema
- Training data is bilingual (EN/DE); other languages may have reduced quality

## License

Apache 2.0 (inherited from [Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B))

## Citation

```bibtex
@misc{kode2025,
  title={Kode: EU-Trained Coding Models for Real-World Software Engineering},
  author={Kevin and SimpleLLM Team},
  year={2025},
  url={https://huggingface.co/simplellm/kode-14b}
}
```

## Links

- 🌐 [SimpleLLM.eu](https://simplellm.eu): Hosted inference
- 💻 [Kode CLI](https://github.com/kevco/kode): Local coding assistant
- 🤗 [All models](https://huggingface.co/simplellm): Hugging Face collection