---
language:
- en
license: mit
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
- code-generation
- coding-assistant
- gguf
- llama.cpp
- qwen2.5
- python
- javascript
- fine-tuned
- lora
- peft
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
- Qwen/Qwen2.5-0.5B-Instruct
---

# BlitzKode

**BlitzKode** is a local AI coding assistant fine-tuned from the Qwen2.5 family. It ships as a **GGUF model** (1.5B, Q8_0, ~1.53 GB) for fast offline inference with llama.cpp, and as **LoRA adapters** for PEFT-based research and further fine-tuning.

> **Creator:** [Sajad (neuralbroker)](https://github.com/neuralbroker)
> **GitHub:**
> **GGUF model:** [`neuralbroker/blitzkode`](https://huggingface.co/neuralbroker/blitzkode)
> **LoRA adapter:** [`neuralbroker/blitzkode-lora-0.5b`](https://huggingface.co/neuralbroker/blitzkode-lora-0.5b)

---

## Model Variants

| Variant | Version | Base Model | Format | Size | Runtime |
|---|---|---|---|---|---|
| **GGUF** (production) | 2.0 | `Qwen/Qwen2.5-1.5B-Instruct` | GGUF Q8_0 | ~1.53 GB | llama.cpp / llama-cpp-python |
| **LoRA adapter** (research) | 2.1 | `Qwen/Qwen2.5-0.5B-Instruct` | PEFT safetensors | ~100 MB | PEFT + Transformers |

---

## Architecture

| Property | GGUF (1.5B) | LoRA Adapter (0.5B) |
|---|---|---|
| **Model type** | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
| **Parameters** | 1.5 B | 0.5 B + adapter weights |
| **Quantization** | GGUF Q8_0 | bfloat16 / float16 |
| **LoRA rank (r)** | n/a | 16 |
| **LoRA alpha** | n/a | 32 |
| **LoRA target modules** | n/a | q, k, v, o, gate, up, down projections |
| **Context window** | 2 048 tokens | 2 048 tokens |
| **Vocabulary** | 151 936 | 151 936 |

---

## Training Pipeline

BlitzKode was produced by a **4-stage fine-tuning pipeline**, followed by a merge-and-export step:

### Stage 1: SFT (Supervised Fine-Tuning)

LoRA fine-tuning (`r=32`, base: Qwen2.5-1.5B-Instruct) on 71 curated algorithmic coding problems covering arrays, strings, trees, dynamic programming, graphs, sorting, hash tables, binary search, and more.

- **Adapter checkpoint:** `checkpoints/sft-1.5b-v1/`
- **Library:** PEFT + HuggingFace Transformers

### Stage 2: Reward-SFT

Continued SFT with heuristic reward functions to reinforce code correctness, formatting quality, and concise explanation style. This is a standard SFT training loop using scalar reward signals, **not** full GRPO.

- **Adapter checkpoint:** `checkpoints/grpo-v1/` *(label is historical)*
- **Library:** TRL / Transformers

### Stage 3: DPO (Direct Preference Optimization)

Preference optimization on handcrafted chosen/rejected pairs to improve answer clarity, reduce verbosity, and penalize hallucinated APIs or filenames.

- **Adapter checkpoint:** `checkpoints/dpo-v1/`
- **Library:** TRL

### Stage 4: Continued LoRA SFT (Published Adapter)

Final LoRA fine-tuning (`r=16`, base: **Qwen2.5-0.5B-Instruct**) on 99 samples drawn from the 199-sample full dataset. Training ran for 50 steps; final loss reached **~0.48**.

- **Adapter checkpoint:** `checkpoints/available-lora-0.5b-full/final` ✅ *(publicly available)*
- **Library:** PEFT + Transformers

### Stage 5: Merge & Export (GGUF)

LoRA adapters from Stages 1–3 were merged into the 1.5B base model using `merge_and_unload()`, then converted to GGUF Q8_0 format with llama.cpp.
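Conceptually, merging a LoRA adapter folds the low-rank update into each targeted dense weight, `W' = W + (alpha / r) * B @ A`, so the adapter is no longer needed at inference time. The sketch below shows that arithmetic on toy matrices in plain Python. It is an illustration only, not the contents of `scripts/export_gguf.py`: `matmul`, `merge_lora`, and the toy values are hypothetical, and it assumes the standard LoRA scaling of `alpha / r`.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x k) matrix by a (k x n) matrix."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * (B @ A), the merged dense weight."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 adapter (values are made up).
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (r=1, in=2)
lora_b = [[0.5], [1.0]]  # shape (out=2, r=1)
merged = merge_lora(base, lora_a, lora_b, alpha=2.0, r=1)
print(merged)
```

The real export operates on full model tensors through PEFT; the toy matrices here only illustrate why the merged weight has the same shape as the base weight and can then be quantized like any dense model.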
- **Script:** `scripts/export_gguf.py`
- **Artifact:** `blitzkode.gguf` (~1.53 GB, git-ignored)

---

## Training Data

**Total: 199 samples across 3 subsets**

| Subset | Count | Source | License | Purpose |
|---|---|---|---|---|
| Curated algorithmic problems | 71 | Custom (local) | MIT | Core coding skills: arrays, strings, trees, DP, graphs, sorting, searching |
| MetaMathQA samples | 100 | [`meta-math/MetaMathQA`](https://huggingface.co/datasets/meta-math/MetaMathQA) | CC BY 4.0 | Math reasoning transfer to improve step-by-step problem solving |
| Python/JavaScript patterns | 28 | Custom (local) | MIT | Practical patterns: decorators, context managers, data classes, async, CLI tools |
| **Total** | **199** | | | |

See [`datasets/MANIFEST.md`](datasets/MANIFEST.md) for full dataset provenance, preprocessing notes, and per-sample license details.

---

## Features

- **Multi-language code generation**: Python, JavaScript, Java, C++, TypeScript, SQL
- **Code explanation**: clear inline comments and documentation
- **Bug fixing**: debug and fix common code issues
- **Algorithm assistance**: data structures and algorithms (LeetCode-style)
- **Offline operation**: fully local, no internet required at inference time
- **Fast CPU inference**: GGUF Q8_0 runs on commodity CPUs
- **API-first serving**: FastAPI backend with REST and SSE streaming endpoints
- **Optimized local inference**: configurable llama.cpp GPU offload, mmap loading, batching, and prompt cache

---

## Usage

### Production: GGUF with llama.cpp

```bash
# Clone and install
git clone https://github.com/neuralbroker/blitzkode
cd blitzkode
pip install -r requirements.txt

# Start the API server (place blitzkode.gguf in repo root first)
python server.py

# In a second terminal, check that the server is up
curl http://localhost:7860/health
```

### Research: LoRA Adapter with PEFT

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_repo = "neuralbroker/blitzkode-lora-0.5b"

# Load the tokenizer and the 0.5B base model the adapter was trained on
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter weights and switch to inference mode
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```

### Prompt Format (ChatML)

All variants use the Qwen ChatML template:

```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert in Python, JavaScript, Java, C++, and other languages. Write clean, efficient, and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```

---

## Intended Use

### Best For

- Local offline coding assistance
- Algorithm and data structure problem solving
- Code generation and explanation
- Educational programming support
- Code review, refactoring, and debugging

### Out of Scope

- Production code without thorough expert review
- Security-critical or cryptographic applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis (> 2 048 tokens)

---

## Evaluation

The latest local GGUF smoke evaluation was run with `python scripts/evaluate_model.py` on CPU (`n_ctx=2048`, `threads=8`, `batch=256`, `gpu_layers=0`). Full machine-readable results are available in [`docs/evaluation_results.json`](docs/evaluation_results.json).

| Eval case | Result | Notes |
|---|---:|---|
| Python factorial with negative-input handling | ✅ Pass | Correct iterative implementation with negative-input validation. |
| Iterative binary search | ✅ Pass | Valid loop-based implementation returning index or `-1`. |
| SQL top users by order count | ✅ Pass | Correct `JOIN`, `GROUP BY`, `ORDER BY`, and `LIMIT 5` structure. |
| Unknown fictional API uncertainty | ❌ Fail | Raw model hallucinated a plausible signature; the FastAPI backend adds a guard for direct unknown-signature prompts. |

Summary: **3 / 4 passed (75%)**.
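A heuristic check of the kind such a smoke test might apply to the binary-search case can be sketched as follows. This is an assumption for illustration, not the logic in `scripts/evaluate_model.py`; the function name, the regular expressions, and the sample strings are all hypothetical.

```python
import re


def looks_like_iterative_binary_search(code: str) -> bool:
    """Heuristic text check: a loop, a midpoint, a -1 fallback, no recursion."""
    match = re.search(r"def\s+(\w+)\s*\(", code)
    if match is None:
        return False
    name = match.group(1)
    body = code[match.end():]
    has_loop = re.search(r"\bwhile\b", body) is not None
    has_midpoint = "// 2" in body
    returns_minus_one = re.search(r"return\s+-1\b", body) is not None
    # A call to the function's own name inside its body suggests recursion.
    is_recursive = re.search(rf"\b{re.escape(name)}\s*\(", body) is not None
    return has_loop and has_midpoint and returns_minus_one and not is_recursive


# Hypothetical model outputs used only to exercise the check.
iterative_sample = '''
def binary_search(items, target):
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
'''

recursive_sample = '''
def search(items, target, lo, hi):
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return search(items, target, mid + 1, hi)
    return search(items, target, lo, mid - 1)
'''

print(looks_like_iterative_binary_search(iterative_sample))
print(looks_like_iterative_binary_search(recursive_sample))
```

Text-pattern checks like this are cheap to run, but they can be satisfied by code that matches the patterns without being correct, so they cannot confirm behavior the way executing the code would.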
This evaluation is a lightweight heuristic regression smoke test, not a benchmark suite. Stronger future evaluation should include executable unit tests and larger coding benchmarks such as HumanEval/MBPP-style tasks.

---

## Limitations

- **Text-only input**: no image or file-upload support
- **2 048-token default context**: CPU-friendly but limits long conversation history
- **Verify all outputs**: always review and test generated code
- **Small model**: 0.5B–1.5B scale; may produce incorrect code on complex tasks
- **Raw model hallucination risk**: the API server includes guardrails, but direct GGUF prompting can still invent unsupported API details
- **No real-time data**: knowledge cutoff follows the Qwen2.5 base model unless the optional research endpoint is used
- **Math reasoning**: MetaMathQA transfer helps basic reasoning; not a math specialist

---

## Environment Variables (Inference Server)

| Variable | Default | Description |
|---|---|---|
| `BLITZKODE_GPU_LAYERS` | `0` | Number of layers to offload to GPU |
| `BLITZKODE_THREADS` | system | CPU decode thread count |
| `BLITZKODE_THREADS_BATCH` | system | CPU prompt-processing thread count |
| `BLITZKODE_N_CTX` | `2048` | Context window size |
| `BLITZKODE_BATCH` | `256` | llama.cpp prompt-processing batch size |
| `BLITZKODE_UBATCH` | `128` | llama.cpp micro-batch size |
| `BLITZKODE_PROMPT_CACHE` | `true` | Enable in-memory prompt cache when supported |
| `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup vs first request |

---

## Project Structure

```text
BlitzKode/
  server.py                    # FastAPI backend (inference + search)
  blitzkode.gguf               # GGUF model artifact (~1.53 GB, git-ignored)
  scripts/
    evaluate_model.py          # Lightweight GGUF evaluation harness
    train_sft.py               # Stage 1: SFT training
    train_reward_sft.py        # Stage 2: Reward-SFT
    train_dpo.py               # Stage 3: DPO
    train_available.py         # Stage 4: LoRA fine-tune (0.5B)
    export_gguf.py             # Merge & convert to GGUF
    push_to_hub.py             # Push adapter to HuggingFace Hub
    build_full_dataset.py      # Dataset builder (algorithmic + HF datasets)
  docs/
    evaluation_results.json    # Latest smoke-eval output
    PROJECT_OVERVIEW.md        # Architecture and design notes
  datasets/
    MANIFEST.md                # Dataset provenance and license info
  checkpoints/
    available-lora-0.5b-full/  # Published LoRA adapter (0.5B)
  tests/
    test_server.py             # HTTP integration tests
  README.md                    # Full project documentation
  MODEL_CARD.md                # This file
```

---

## License

**MIT**: see [LICENSE](https://github.com/neuralbroker/blitzkode/blob/main/LICENSE).

You must also comply with the upstream Qwen2.5 license when redistributing any fine-tuned weights derived from it.

- [Qwen2.5-0.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
- [Qwen2.5-1.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)

Training data subsets carry their own licenses:

- MetaMathQA: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- Custom/local samples: MIT

---

## Contact

- **GitHub Issues:**
- **Portfolio:**

Contributions and feedback are welcome!

---

## Citation

```bibtex
@software{blitzkode2025,
  author = {Sajad},
  title  = {BlitzKode: A Local AI Coding Assistant},
  year   = {2025},
  url    = {https://github.com/neuralbroker/blitzkode}
}
```