---
language:
  - en
license: mit
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
  - code-generation
  - coding-assistant
  - gguf
  - llama.cpp
  - qwen2.5
  - python
  - javascript
  - fine-tuned
  - lora
  - peft
base_model:
  - Qwen/Qwen2.5-1.5B-Instruct
  - Qwen/Qwen2.5-0.5B-Instruct
---

# BlitzKode

**BlitzKode** is a local AI coding assistant fine-tuned from the Qwen2.5 family. It
ships as a **GGUF model** (1.5B, Q8_0, ~1.53 GB) for fast offline inference with
llama.cpp, and as **LoRA adapters** for PEFT-based research and further
fine-tuning.

> **Creator:** [Sajad (neuralbroker)](https://github.com/neuralbroker)
> **GitHub:** <https://github.com/neuralbroker/blitzkode>
> **GGUF model:** [`neuralbroker/blitzkode`](https://huggingface.co/neuralbroker/blitzkode)
> **LoRA adapter:** [`neuralbroker/blitzkode-lora-0.5b`](https://huggingface.co/neuralbroker/blitzkode-lora-0.5b)

---

## Model Variants

| Variant | Version | Base Model | Format | Size | Runtime |
|---|---|---|---|---|---|
| **GGUF** (production) | 2.0 | `Qwen/Qwen2.5-1.5B-Instruct` | GGUF Q8_0 | ~1.53 GB | llama.cpp / llama-cpp-python |
| **LoRA adapter** (research) | 2.1 | `Qwen/Qwen2.5-0.5B-Instruct` | PEFT safetensors | ~100 MB | PEFT + Transformers |

---

## Architecture

| Property | GGUF (1.5B) | LoRA Adapter (0.5B) |
|---|---|---|
| **Model type** | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
| **Parameters** | 1.5 B | 0.5 B + adapter weights |
| **Quantization / precision** | GGUF Q8_0 | bfloat16 / float16 (unquantized) |
| **LoRA rank (r)** | — | 16 |
| **LoRA alpha** | — | 32 |
| **LoRA target modules** | — | q, k, v, o, gate, up, down projections |
| **Context window** | 2 048 tokens | 2 048 tokens |
| **Vocabulary** | 151 936 | 151 936 |
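
As a rough illustration of what "adapter weights" means for the 0.5B variant, the sketch below counts LoRA parameters from the rank and target modules listed in the table. The layer dimensions (hidden size 896, intermediate size 4864, key/value width 128, 24 layers) are assumptions taken from the public Qwen2.5-0.5B-Instruct configuration, not from this repository, and `lora_param_count` is a hypothetical helper.

```python
# Assumed Qwen2.5-0.5B dimensions (public base-model config, not this repo).
HIDDEN = 896
INTERMEDIATE = 4864
KV_DIM = 128       # 2 key/value heads x 64-dim heads (grouped-query attention)
NUM_LAYERS = 24

# (in_features, out_features) of each LoRA target projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """Each adapted linear layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


RANK, ALPHA = 16, 32
print(f"LoRA parameters: {lora_param_count(RANK):,}")  # 8,798,208 under these assumptions
print(f"Scaling factor alpha / r: {ALPHA / RANK}")      # 2.0
```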

---

## Training Pipeline

BlitzKode was produced by a pipeline of **four fine-tuning stages** plus a final **merge-and-export stage**:

### Stage 1 — SFT (Supervised Fine-Tuning)
LoRA fine-tuning (`r=32`, base: Qwen2.5-1.5B-Instruct) on 71 curated algorithmic
coding problems covering arrays, strings, trees, dynamic programming, graphs,
sorting, hash tables, binary search, and more.

- **Adapter checkpoint:** `checkpoints/sft-1.5b-v1/`
- **Library:** PEFT + HuggingFace Transformers

### Stage 2 — Reward-SFT
Continued SFT with heuristic reward functions to reinforce code correctness,
formatting quality, and concise explanation style. This is a standard SFT
training loop using scalar reward signals, **not** full GRPO.

- **Adapter checkpoint:** `checkpoints/grpo-v1/` *(label is historical)*
- **Library:** TRL / Transformers
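
The actual reward functions are not reproduced in this card. As a purely hypothetical illustration of a heuristic scalar reward, the sketch below scores a response on three signals that mirror the goals above: the code is fenced, it parses as Python, and the surrounding explanation is short. The function name, weights, and word limit are all assumptions.

```python
import ast
import re

FENCE = "`" * 3  # built indirectly so this example nests cleanly inside Markdown
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def heuristic_reward(response: str, max_prose_words: int = 80) -> float:
    """Hypothetical scalar reward in [0, 1]; the weights are illustrative only."""
    match = CODE_BLOCK.search(response)
    if match is None:
        return 0.0  # no fenced code at all: formatting failure
    score = 0.4  # formatting: the answer contains a fenced code block
    try:
        ast.parse(match.group(1))
        score += 0.4  # correctness proxy: the code is at least valid Python
    except SyntaxError:
        pass
    prose = CODE_BLOCK.sub("", response)
    if len(prose.split()) <= max_prose_words:
        score += 0.2  # concision: short explanation around the code
    return round(score, 2)
```

A reward of this kind only inspects surface properties. It cannot tell whether the code solves the task, which is one reason the Evaluation section recommends executable tests.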

### Stage 3 — DPO (Direct Preference Optimization)
Preference optimization on handcrafted chosen/rejected pairs to improve answer
clarity, reduce verbosity, and penalize hallucinated APIs or filenames.

- **Adapter checkpoint:** `checkpoints/dpo-v1/`
- **Library:** TRL
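
For reference, the DPO objective for a single chosen/rejected pair fits in a few lines. The sketch below uses plain Python and made-up sequence log-probabilities; `beta=0.1` is an assumed value rather than this project's setting, and in practice TRL computes the loss over batches of token log-probabilities.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))) for one pair."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # softplus(-x) equals -log(sigmoid(x)); adequate for typical margin magnitudes.
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy prefers the chosen answer more than the reference does: the loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```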

### Stage 4 — Continued LoRA SFT (Published Adapter)
Final LoRA fine-tuning (`r=16`, base: **Qwen2.5-0.5B-Instruct**) on 99 samples
drawn from the 199-sample full dataset. Training ran for 50 steps; final loss
reached **~0.48**.

- **Adapter checkpoint:** `checkpoints/available-lora-0.5b-full/final` ✅ *(publicly available)*
- **Library:** PEFT + Transformers

### Stage 5 — Merge & Export (GGUF)
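LoRA adapters from Stages 1–3 were merged into the 1.5B base model using
`merge_and_unload()`, then converted to GGUF Q8_0 format with llama.cpp. The
Stage 4 adapter targets the 0.5B base and is published separately, so it is not
part of the GGUF file.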

- **Script:** `scripts/export_gguf.py`
- **Artifact:** `blitzkode.gguf` (~1.53 GB, git-ignored)
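
`scripts/export_gguf.py` is not reproduced here. A minimal sketch of the same two steps, assuming a single PEFT adapter directory and a local llama.cpp checkout, might look like the following. All paths and both function names are hypothetical. `merge_and_unload()` and `save_pretrained()` are standard PEFT/Transformers calls, and `convert_hf_to_gguf.py` with `--outtype q8_0` is the converter script shipped with recent llama.cpp versions.

```python
import sys
from pathlib import Path


def merge_adapter(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Fold LoRA weights into the base model and save a plain HF checkpoint."""
    # Imported lazily so the command builder below works without ML dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)


def gguf_convert_command(llama_cpp_dir: str, hf_dir: str, outfile: str) -> list[str]:
    """Build the llama.cpp conversion command for an 8-bit (Q8_0) GGUF file."""
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    return [sys.executable, str(script), hf_dir, "--outfile", outfile, "--outtype", "q8_0"]


# Hypothetical paths. Once merge_adapter(...) has written build/merged,
# the command can be executed with subprocess.run(cmd, check=True).
cmd = gguf_convert_command("llama.cpp", "build/merged", "blitzkode.gguf")
print(" ".join(cmd))
```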

---

## Training Data

**Total: 199 samples across 3 subsets**

| Subset | Count | Source | License | Purpose |
|---|---|---|---|---|
| Curated algorithmic problems | 71 | Custom (local) | MIT | Core coding skills: arrays, strings, trees, DP, graphs, sorting, searching |
| MetaMathQA samples | 100 | [`meta-math/MetaMathQA`](https://huggingface.co/datasets/meta-math/MetaMathQA) | CC BY 4.0 | Math reasoning transfer to improve step-by-step problem solving |
| Python/JavaScript patterns | 28 | Custom (local) | MIT | Practical patterns: decorators, context managers, data classes, async, CLI tools |
| **Total** | **199** | | | |

See [`datasets/MANIFEST.md`](datasets/MANIFEST.md) for full dataset provenance,
preprocessing notes, and per-sample license details.

---

## Features

- **Multi-language code generation** — Python, JavaScript, Java, C++, TypeScript, SQL
- **Code explanation** — clear inline comments and documentation
- **Bug fixing** — debug and fix common code issues
- **Algorithm assistance** — data structures and algorithms (LeetCode-style)
- **Offline operation** — fully local, no internet required at inference time
- **Fast CPU inference** — GGUF Q8_0 runs on commodity CPUs
- **API-first serving** — FastAPI backend with REST and SSE streaming endpoints
- **Optimized local inference** — configurable llama.cpp GPU offload, mmap loading, batching, and prompt cache

---

## Usage

### Production: GGUF with llama.cpp

```bash
# Clone and install
git clone https://github.com/neuralbroker/blitzkode
cd blitzkode
pip install -r requirements.txt

# Start the API server (place blitzkode.gguf in repo root first)
python server.py

# In a second terminal, check that the server is up
curl http://localhost:7860/health
```
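
The GGUF file can also be loaded directly, without the API server. The sketch below assumes `llama-cpp-python` is installed and that `blitzkode.gguf` sits in the current directory. The helper names are hypothetical, and the default values mirror the CPU settings listed in the Evaluation section.

```python
def llama_kwargs(model_path="blitzkode.gguf", n_ctx=2048, n_gpu_layers=0, n_threads=None):
    """Collect constructor arguments for llama_cpp.Llama (CPU-only by default)."""
    kwargs = {
        "model_path": model_path,
        "n_ctx": n_ctx,
        "n_gpu_layers": n_gpu_layers,
        "verbose": False,
    }
    if n_threads is not None:
        kwargs["n_threads"] = n_threads  # otherwise llama.cpp picks a default
    return kwargs


def ask(prompt: str, **overrides) -> str:
    """Load the model and return one chat completion as text."""
    from llama_cpp import Llama  # imported lazily; requires llama-cpp-python

    llm = Llama(**llama_kwargs(**overrides))
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return result["choices"][0]["message"]["content"]


# Example (needs the model file on disk):
# print(ask("Write an iterative binary search in Python.", n_threads=8))
```

Loading the model on every call is slow, so a long-lived `Llama` instance is preferable outside of quick experiments.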

### Research: LoRA Adapter with PEFT

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_repo  = "neuralbroker/blitzkode-lora-0.5b"

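# Load the base model first, then attach the BlitzKode LoRA weights on top of it.
# Qwen2.5 is supported natively by Transformers, so trust_remote_code is not needed.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()

# Generate a reply using the tokenizer's built-in ChatML template.
messages = [{"role": "user", "content": "Write an iterative binary search in Python."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))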
```

### Prompt Format (ChatML)

All variants use the Qwen ChatML template:

```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert
in Python, JavaScript, Java, C++, and other languages. Write clean, efficient,
and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```
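
If the template has to be assembled by hand, for example when sending raw prompts to llama.cpp, a small helper is enough. This is a sketch: `build_chatml_prompt` is a hypothetical name, and the system prompt is shortened for the example. With Transformers, `tokenizer.apply_chat_template` handles this formatting instead.

```python
# Shortened system prompt, for illustration only.
SYSTEM_PROMPT = "You are BlitzKode, an AI coding assistant created by Sajad."


def build_chatml_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return a ChatML prompt that ends with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(build_chatml_prompt("Reverse a linked list in Python."))
```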

---

## Intended Use

### Best For
- Local offline coding assistance
- Algorithm and data structure problem solving
- Code generation and explanation
- Educational programming support
- Code review, refactoring, and debugging

### Out of Scope
- Production code without thorough expert review
- Security-critical or cryptographic applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis (> 2 048 tokens)

---

## Evaluation

The latest local GGUF smoke evaluation was run with `python scripts/evaluate_model.py` on CPU (`n_ctx=2048`, `threads=8`, `batch=256`, `gpu_layers=0`). Full machine-readable results are in [`docs/evaluation_results.json`](docs/evaluation_results.json).

| Eval case | Result | Notes |
|---|---:|---|
| Python factorial with negative-input handling | ✅ Pass | Correct iterative implementation with negative-input validation. |
| Iterative binary search | ✅ Pass | Valid loop-based implementation returning index or `-1`. |
| SQL top users by order count | ✅ Pass | Correct `JOIN`, `GROUP BY`, `ORDER BY`, and `LIMIT 5` structure. |
| Unknown fictional API uncertainty | ❌ Fail | Raw model hallucinated a plausible signature; the FastAPI backend adds a guard for direct unknown-signature prompts. |

Summary: **3 / 4 passed (75%)**. This is a lightweight heuristic regression smoke test, not a benchmark suite. Stronger future evaluation should include executable unit tests and larger coding benchmarks such as HumanEval/MBPP-style tasks.
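
As a sketch of what an executable check could look like (this is not the logic of `scripts/evaluate_model.py`), the snippet below runs a candidate solution in a scratch namespace and compares it against a few test cases. `passes_tests` and the sample candidate are hypothetical, and calling `exec` on untrusted model output should only be done inside a sandbox.

```python
def passes_tests(source: str, func_name: str, cases: list) -> bool:
    """Execute candidate code and check func_name against (args, expected) pairs."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # unsafe outside a sandbox
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, a missing function, and runtime failures all count as a fail.
        return False


# Stand-in for model output; a real harness would take this from the model.
candidate = '''
def binary_search(items, target):
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
'''

cases = [(([1, 3, 5, 7], 5), 2), (([1, 3, 5, 7], 4), -1), (([], 1), -1)]
print(passes_tests(candidate, "binary_search", cases))  # True
```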

---

## Limitations

- **Text-only input** — no image or file-upload support
- **2 048-token default context** — CPU-friendly but limits long conversation history
- **Verify all outputs** — always review and test generated code
- **Small model** — 0.5B–1.5B scale; may produce incorrect code on complex tasks
- **Raw model hallucination risk** — the API server includes guardrails, but direct GGUF prompting can still invent unsupported API details
- **No real-time data** — knowledge cutoff follows the Qwen2.5 base model unless the optional research endpoint is used
- **Math reasoning** — MetaMathQA transfer helps basic reasoning; not a math specialist

---

## Environment Variables (Inference Server)

| Variable | Default | Description |
|---|---|---|
| `BLITZKODE_GPU_LAYERS` | `0` | Number of layers to offload to GPU |
| `BLITZKODE_THREADS` | system | CPU decode thread count |
| `BLITZKODE_THREADS_BATCH` | system | CPU prompt-processing thread count |
| `BLITZKODE_N_CTX` | `2048` | Context window size |
| `BLITZKODE_BATCH` | `256` | llama.cpp prompt-processing batch size |
| `BLITZKODE_UBATCH` | `128` | llama.cpp micro-batch size |
| `BLITZKODE_PROMPT_CACHE` | `true` | Enable in-memory prompt cache when supported |
| `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup vs first request |

---

## Project Structure

```text
BlitzKode/
  server.py                   # FastAPI backend (inference + search)
  blitzkode.gguf              # GGUF model artifact (~1.53 GB, git-ignored)
  scripts/
    evaluate_model.py         # Lightweight GGUF evaluation harness
    train_sft.py              # Stage 1: SFT training
    train_reward_sft.py       # Stage 2: Reward-SFT
    train_dpo.py              # Stage 3: DPO
    train_available.py        # Stage 4: LoRA fine-tune (0.5B)
    export_gguf.py            # Merge & convert to GGUF
    push_to_hub.py            # Push adapter to HuggingFace Hub
    build_full_dataset.py     # Dataset builder (algorithmic + HF datasets)
  docs/
    evaluation_results.json   # Latest smoke-eval output
    PROJECT_OVERVIEW.md       # Architecture and design notes
  datasets/
    MANIFEST.md               # Dataset provenance and license info
  checkpoints/
    available-lora-0.5b-full/ # Published LoRA adapter (0.5B)
  tests/
    test_server.py            # HTTP integration tests
  README.md                   # Full project documentation
  MODEL_CARD.md               # This file
```

---

## License

**MIT** — see [LICENSE](https://github.com/neuralbroker/blitzkode/blob/main/LICENSE).

You must also comply with the upstream Qwen2.5 license (Apache 2.0 for both the
0.5B and 1.5B Instruct models) when redistributing any fine-tuned weights
derived from it.

- [Qwen2.5-0.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
- [Qwen2.5-1.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)

Training data subsets carry their own licenses:
- MetaMathQA: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- Custom/local samples: MIT

---

## Contact

- **GitHub Issues:** <https://github.com/neuralbroker/blitzkode/issues>
- **Portfolio:** <https://neuralbroker.vercel.app>

Contributions and feedback are welcome!

---

## Citation

```bibtex
@software{blitzkode2025,
  author  = {Sajad},
  title   = {BlitzKode: A Local AI Coding Assistant},
  year    = {2025},
  url     = {https://github.com/neuralbroker/blitzkode}
}
```