---
language:
- en
license: mit
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
- code-generation
- coding-assistant
- gguf
- llama.cpp
- qwen2.5
- python
- javascript
- fine-tuned
- lora
- peft
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
- Qwen/Qwen2.5-0.5B-Instruct
---

# BlitzKode

**BlitzKode** is a local AI coding assistant fine-tuned from the Qwen2.5 family. It ships as a **GGUF model** (1.5B, F16, ~3 GB) for fast offline inference with llama.cpp, and as a **LoRA adapter** (0.5B, ~100 MB) for PEFT-based research and further fine-tuning.

> **Creator:** [Sajad (neuralbroker)](https://github.com/neuralbroker)
> **GitHub:**
> **GGUF model:** [`neuralbroker/blitzkode`](https://huggingface.co/neuralbroker/blitzkode)
> **LoRA adapter:** [`neuralbroker/blitzkode-lora-0.5b`](https://huggingface.co/neuralbroker/blitzkode-lora-0.5b)

---

## Model Variants

| Variant | Version | Base Model | Format | Size | Runtime |
|---|---|---|---|---|---|
| **GGUF** (production) | 2.0 | `Qwen/Qwen2.5-1.5B-Instruct` | GGUF F16 | ~3 GB | llama.cpp / llama-cpp-python |
| **LoRA adapter** (research) | 2.1 | `Qwen/Qwen2.5-0.5B-Instruct` | PEFT safetensors | ~100 MB | PEFT + Transformers |

---

## Architecture

| Property | GGUF (1.5B) | LoRA Adapter (0.5B) |
|---|---|---|
| **Model type** | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
| **Parameters** | 1.5 B | 0.5 B + adapter weights |
| **Weight format / precision** | GGUF F16 (unquantized) | bfloat16 / float16 |
| **LoRA rank (r)** | n/a | 16 |
| **LoRA alpha** | n/a | 32 |
| **LoRA target modules** | n/a | q, k, v, o, gate, up, down projections |
| **Context window** | 2 048 tokens | 2 048 tokens |
| **Vocabulary** | 151 936 | 151 936 |

---

## Training Pipeline

BlitzKode was produced by **four fine-tuning stages**, followed by a merge-and-export step (Stage 5):

### Stage 1: SFT (Supervised Fine-Tuning)

LoRA fine-tuning (`r=32`, base: Qwen2.5-1.5B-Instruct) on 71 curated algorithmic coding problems covering arrays, strings, trees, dynamic programming, graphs, sorting, hash tables, binary
search, and more.

- **Adapter checkpoint:** `checkpoints/sft-1.5b-v1/`
- **Library:** PEFT + HuggingFace Transformers

### Stage 2: Reward-SFT

Continued SFT with heuristic reward functions to reinforce code correctness, formatting quality, and concise explanation style. This is a standard SFT training loop using scalar reward signals, **not** full GRPO.

- **Adapter checkpoint:** `checkpoints/grpo-v1/` *(label is historical)*
- **Library:** TRL / Transformers

### Stage 3: DPO (Direct Preference Optimization)

Preference optimization on handcrafted chosen/rejected pairs to improve answer clarity, reduce verbosity, and penalize hallucinated APIs or filenames.

- **Adapter checkpoint:** `checkpoints/dpo-v1/`
- **Library:** TRL

### Stage 4: Continued LoRA SFT (Published Adapter)

Final LoRA fine-tuning (`r=16`, base: **Qwen2.5-0.5B-Instruct**) on 99 samples drawn from the 199-sample full dataset. Training ran for 50 steps; final loss reached **~0.48**.

- **Adapter checkpoint:** `checkpoints/available-lora-0.5b-full/final` ✅ *(publicly available)*
- **Library:** PEFT + Transformers

### Stage 5: Merge & Export (GGUF)

LoRA adapters from Stages 1–3 were merged into the 1.5B base model using `merge_and_unload()`, then converted to GGUF F16 format with llama.cpp.
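
Conceptually, merging a LoRA adapter folds each low-rank update into the corresponding base weight, `W' = W + (alpha / r) * B @ A`, so no adapter is needed at inference time. The sketch below illustrates that arithmetic in plain Python on a toy 2x2 layer. It is not PEFT's implementation or the project's export script; the helper names `matmul` and `merge_lora` and all matrix values are hypothetical, and the scale of 2 simply mirrors the `alpha / r = 32 / 16` ratio listed for the published adapter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without mutating the inputs."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy layer with a rank-1 adapter; every value here is made up for illustration.
base_weight = [[1.0, 0.0], [0.0, 1.0]]  # shape (out=2, in=2)
lora_a = [[1.0, 2.0]]                   # shape (r=1, in=2)
lora_b = [[0.5], [0.25]]                # shape (out=2, r=1)

merged = merge_lora(base_weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After merging, the layer is an ordinary dense weight matrix, which is why the result can be converted to a single GGUF file.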

- **Script:** `scripts/export_gguf.py`
- **Artifact:** `blitzkode.gguf` (~3 GB, git-ignored)

---

## Training Data

**Total: 199 samples across 3 subsets**

| Subset | Count | Source | License | Purpose |
|---|---|---|---|---|
| Curated algorithmic problems | 71 | Custom (local) | MIT | Core coding skills: arrays, strings, trees, DP, graphs, sorting, searching |
| MetaMathQA samples | 100 | [`meta-math/MetaMathQA`](https://huggingface.co/datasets/meta-math/MetaMathQA) | CC BY 4.0 | Math reasoning transfer to improve step-by-step problem solving |
| Python/JavaScript patterns | 28 | Custom (local) | MIT | Practical patterns: decorators, context managers, data classes, async, CLI tools |
| **Total** | **199** | | | |

See [`datasets/MANIFEST.md`](datasets/MANIFEST.md) for full dataset provenance, preprocessing notes, and per-sample license details.

---

## Features

- **Multi-language code generation:** Python, JavaScript, Java, C++, TypeScript, SQL
- **Code explanation:** clear inline comments and documentation
- **Bug fixing:** debug and fix common code issues
- **Algorithm assistance:** data structures and algorithms (LeetCode-style)
- **Offline operation:** fully local, no internet required at inference time
- **Fast CPU inference:** GGUF F16 runs on commodity CPUs
- **Modern web UI:** React/Vite chat interface with SSE streaming
- **REST API:** FastAPI backend with streaming and optional web-search augmentation

---

## Usage

### Production: GGUF with llama.cpp

```bash
# Clone and install
git clone https://github.com/neuralbroker/blitzkode
cd blitzkode
pip install -r requirements.txt

# Build the frontend
cd frontend && npm install && npm run build && cd ..

# Start the server (place blitzkode.gguf in repo root first)
python server.py

# Open http://localhost:7860
```

### Research: LoRA Adapter with PEFT

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_repo = "neuralbroker/blitzkode-lora-0.5b"

# Load the tokenizer and the 0.5B base model the adapter was trained on
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Attach the published LoRA adapter and switch to inference mode
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```

### Prompt Format (ChatML)

All variants use the Qwen ChatML template:

```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert in Python, JavaScript, Java, C++, and other languages. Write clean, efficient, and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```

---

## Intended Use

### Best For

- Local offline coding assistance
- Algorithm and data structure problem solving
- Code generation and explanation
- Educational programming support
- Code review, refactoring, and debugging

### Out of Scope

- Production code without thorough expert review
- Security-critical or cryptographic applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis (> 2 048 tokens)

---

## Limitations

- **Text-only input:** no image or file-upload support
- **2 048-token context:** CPU-friendly but limits long conversation history
- **Verify all outputs:** always review and test generated code
- **Small model:** 0.5B–1.5B scale; may produce incorrect code on complex tasks
- **No real-time data:** knowledge cutoff follows the Qwen2.5 base model
- **Math reasoning:** MetaMathQA transfer helps basic reasoning; not a math specialist

---

## Environment Variables (Inference Server)

| Variable | Default | Description |
|---|---|---|
| `BLITZKODE_GPU_LAYERS` | `0` | Number of layers to offload to GPU |
| `BLITZKODE_THREADS` | system | CPU inference thread count |
| `BLITZKODE_N_CTX` | `2048` | Context window size |
| `BLITZKODE_BATCH` | `512` | llama.cpp batch size |
| `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup vs first request |

---

## Project Structure

```text
BlitzKode/
  server.py                     # FastAPI backend (inference + search)
  blitzkode.gguf                # GGUF model artifact (~3 GB, git-ignored)
  frontend/                     # React/Vite web UI
  scripts/
    train_sft.py                # Stage 1: SFT training
    train_reward_sft.py         # Stage 2: Reward-SFT
    train_dpo.py                # Stage 3: DPO
    train_available.py          # Stage 4: LoRA fine-tune (0.5B)
    export_gguf.py              # Merge & convert to GGUF
    push_to_hub.py              # Push adapter to HuggingFace Hub
    build_full_dataset.py       # Dataset builder (algorithmic + HF datasets)
  datasets/
    MANIFEST.md                 # Dataset provenance and license info
  checkpoints/
    available-lora-0.5b-full/   # Published LoRA adapter (0.5B)
  tests/
    test_server.py              # HTTP integration tests
  docs/
    PROJECT_OVERVIEW.md         # Architecture and design notes
  README.md                     # Full project documentation
  MODEL_CARD.md                 # This file
```

---

## License

**MIT.** See [LICENSE](https://github.com/neuralbroker/blitzkode/blob/main/LICENSE).

You must also comply with the upstream Qwen2.5 license when redistributing any fine-tuned weights derived from it.

- [Qwen2.5-0.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
- [Qwen2.5-1.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)

Training data subsets carry their own licenses:

- MetaMathQA: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- Custom/local samples: MIT

---

## Contact

- **GitHub Issues:**
- **Portfolio:**

Contributions and feedback are welcome!

---

## Citation

```bibtex
@software{blitzkode2025,
  author = {Sajad},
  title = {BlitzKode: A Local AI Coding Assistant},
  year = {2025},
  url = {https://github.com/neuralbroker/blitzkode}
}
```
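
---

## Example: Assembling a ChatML Prompt

For callers that send raw text to the GGUF model, the ChatML layout described under Prompt Format can be assembled by hand. The following is a minimal sketch under that assumption; `build_chatml_prompt` and `SYSTEM_PROMPT` are hypothetical names used for illustration and are not taken from the project's `server.py`.

```python
# System prompt text copied from the Prompt Format section of this card.
SYSTEM_PROMPT = (
    "You are BlitzKode, an AI coding assistant created by Sajad. "
    "You are an expert in Python, JavaScript, Java, C++, and other languages. "
    "Write clean, efficient, and well-documented code. "
    "Keep responses concise and practical."
)


def build_chatml_prompt(user_prompt, system_prompt=SYSTEM_PROMPT):
    """Wrap a system and a user message in ChatML markers.

    The returned string ends with an open assistant turn so the model
    continues from there.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("Write a Python function that reverses a string.")
print(prompt)
```

When working with the LoRA adapter through Transformers, `tokenizer.apply_chat_template` applied to a list of role/content messages is usually the safer route, since it uses the template shipped with the base model's tokenizer instead of a hand-built string.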