---
language:
- en
license: mit
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
- code-generation
- coding-assistant
- gguf
- llama.cpp
- qwen2.5
- python
- javascript
- fine-tuned
- lora
- peft
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
- Qwen/Qwen2.5-0.5B-Instruct
---
# BlitzKode
**BlitzKode** is a local AI coding assistant fine-tuned from the Qwen2.5 family. It
ships as a **GGUF model** (1.5B, Q8_0, ~1.53 GB) for fast offline inference with
llama.cpp, and as **LoRA adapters** for PEFT-based research and further
fine-tuning.
> - **Creator:** [Sajad (neuralbroker)](https://github.com/neuralbroker)
> - **GitHub:** <https://github.com/neuralbroker/blitzkode>
> - **GGUF model:** [`neuralbroker/blitzkode`](https://huggingface.co/neuralbroker/blitzkode)
> - **LoRA adapter:** [`neuralbroker/blitzkode-lora-0.5b`](https://huggingface.co/neuralbroker/blitzkode-lora-0.5b)
---
## Model Variants
| Variant | Version | Base Model | Format | Size | Runtime |
|---|---|---|---|---|---|
| **GGUF** (production) | 2.0 | `Qwen/Qwen2.5-1.5B-Instruct` | GGUF Q8_0 | ~1.53 GB | llama.cpp / llama-cpp-python |
| **LoRA adapter** (research) | 2.1 | `Qwen/Qwen2.5-0.5B-Instruct` | PEFT safetensors | ~100 MB | PEFT + Transformers |
---
## Architecture
| Property | GGUF (1.5B) | LoRA Adapter (0.5B) |
|---|---|---|
| **Model type** | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
| **Parameters** | 1.5 B | 0.5 B + adapter weights |
| **Quantization** | GGUF Q8_0 | bfloat16 / float16 |
| **LoRA rank (r)** | n/a | 16 |
| **LoRA alpha** | n/a | 32 |
| **LoRA target modules** | β€” | q, k, v, o, gate, up, down projections |
| **Context window** | 2 048 tokens | 2 048 tokens |
| **Vocabulary** | 151 936 | 151 936 |
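
To make the adapter column concrete, the sketch below counts the trainable LoRA parameters for the 0.5B adapter. Each adapted linear layer of shape `in -> out` gains two low-rank matrices, A (`r x in`) and B (`out x r`), so it adds `r * (in + out)` parameters, and the update is scaled by `alpha / r`. The layer dimensions (hidden size 896, MLP size 4864, 24 layers, 2 key/value heads of size 64) are assumed from the public Qwen2.5-0.5B-Instruct configuration, not read from the BlitzKode checkpoint; the count covers only the LoRA A/B matrices.

```python
"""Count LoRA adapter parameters for a Qwen2-style decoder (sketch)."""


def lora_params(in_features: int, out_features: int, r: int) -> int:
    # LoRA adds A (r x in) and B (out x r) beside a frozen weight (out x in).
    return r * (in_features + out_features)


def adapter_param_count(
    hidden: int = 896,         # assumed Qwen2.5-0.5B hidden size
    intermediate: int = 4864,  # assumed MLP intermediate size
    layers: int = 24,          # assumed number of decoder layers
    kv_dim: int = 128,         # assumed 2 KV heads x head_dim 64
    r: int = 16,
) -> int:
    per_layer = (
        lora_params(hidden, hidden, r)          # q_proj
        + lora_params(hidden, kv_dim, r)        # k_proj
        + lora_params(hidden, kv_dim, r)        # v_proj
        + lora_params(hidden, hidden, r)        # o_proj
        + lora_params(hidden, intermediate, r)  # gate_proj
        + lora_params(hidden, intermediate, r)  # up_proj
        + lora_params(intermediate, hidden, r)  # down_proj
    )
    return per_layer * layers


if __name__ == "__main__":
    r, alpha = 16, 32
    print(f"LoRA parameters: {adapter_param_count(r=r):,}")  # 8,798,208 under these assumptions
    print(f"LoRA scaling alpha/r: {alpha / r}")               # 2.0
```
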
---
## Training Pipeline
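BlitzKode was produced by a **4-stage fine-tuning pipeline** followed by a merge-and-export step. Stages 1–3 and the Stage 5 export produce the 1.5B GGUF model; Stage 4 produces the published 0.5B LoRA adapter.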
### Stage 1: SFT (Supervised Fine-Tuning)
LoRA fine-tuning (`r=32`, base: Qwen2.5-1.5B-Instruct) on 71 curated algorithmic
coding problems covering arrays, strings, trees, dynamic programming, graphs,
sorting, hash tables, binary search, and more.
- **Adapter checkpoint:** `checkpoints/sft-1.5b-v1/`
- **Library:** PEFT + HuggingFace Transformers
### Stage 2: Reward-SFT
Continued SFT with heuristic reward functions to reinforce code correctness,
formatting quality, and concise explanation style. This is a standard SFT
training loop using scalar reward signals, **not** full GRPO.
- **Adapter checkpoint:** `checkpoints/grpo-v1/` *(label is historical)*
- **Library:** TRL / Transformers
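
The card does not list the reward functions themselves. As an illustration only, the sketch below shows the kind of heuristic scalar reward described above: it rewards a fenced code block, rewards Python code that parses, and penalizes answers over a word budget. The function name, weights, and 200-word budget are hypothetical and are not taken from `scripts/train_reward_sft.py`.

```python
import ast
import re

# Hypothetical heuristic reward; weights and thresholds are illustrative only.
FENCE = "`" * 3
CODE_BLOCK_RE = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def heuristic_reward(response: str, max_words: int = 200) -> float:
    reward = 0.0
    match = CODE_BLOCK_RE.search(response)
    if match:
        reward += 1.0  # formatting: the answer contains a fenced code block
        try:
            ast.parse(match.group(1))
            reward += 1.0  # correctness proxy: the code is valid Python syntax
        except SyntaxError:
            reward -= 1.0
    words = len(response.split())
    if words > max_words:
        # conciseness: linear penalty proportional to the overshoot
        reward -= (words - max_words) / max_words
    return reward
```

Scores like these could be used to filter or weight SFT examples; the exact way the project applies them is not described in this card.
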
### Stage 3: DPO (Direct Preference Optimization)
Preference optimization on handcrafted chosen/rejected pairs to improve answer
clarity, reduce verbosity, and penalize hallucinated APIs or filenames.
- **Adapter checkpoint:** `checkpoints/dpo-v1/`
- **Library:** TRL
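
For reference, DPO minimizes `-log sigmoid(beta * (chosen log-ratio - rejected log-ratio))`, where each log-ratio compares the policy's sequence log-probability with that of a frozen reference model. The sketch computes this per-pair loss from summed log-probabilities. The record follows TRL's `prompt`/`chosen`/`rejected` convention; the example pair and `beta=0.1` are illustrative assumptions, not BlitzKode's data or settings.

```python
import math

# Illustrative preference pair (hypothetical content); "rejected" invents an API.
example_pair = {
    "prompt": "How do I read a JSON file in Python?",
    "chosen": "Open the file and call json.load(f) on the file object.",
    "rejected": "Call json.read_file(path), which returns a dict.",
}


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)) == softplus(-margin)
    return softplus(-margin)
```

When the policy matches the reference the loss is `log 2`; it falls as the policy favors the chosen answer more than the reference does.
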
### Stage 4: Continued LoRA SFT (Published Adapter)
Final LoRA fine-tuning (`r=16`, base: **Qwen2.5-0.5B-Instruct**) on 99 samples
drawn from the 199-sample full dataset. Training ran for 50 steps; final loss
reached **~0.48**.
- **Adapter checkpoint:** `checkpoints/available-lora-0.5b-full/final` ✅ *(publicly available)*
- **Library:** PEFT + Transformers
### Stage 5: Merge & Export (GGUF)
LoRA adapters from Stages 1–3 were merged into the 1.5B base model using
`merge_and_unload()`, then converted to GGUF Q8_0 format with llama.cpp.
- **Script:** `scripts/export_gguf.py`
- **Artifact:** `blitzkode.gguf` (~1.53 GB, git-ignored)
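
Conceptually, merging folds each adapter's low-rank update into the frozen weight, `W' = W + (alpha / r) * B @ A`, so the exported model needs no PEFT code at inference time. The pure-Python sketch below checks that identity on a toy 2x2 layer with rank 1; the matrices and `alpha` are arbitrary illustrations, and real merges operate on framework tensors rather than lists.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    # W' = W + (alpha / r) * B @ A   (W: out x in, B: out x r, A: r x in)
    return add(w, matmul(b, a), scale=alpha / r)


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -1.0]]   # r x in, with r = 1
B = [[2.0], [1.0]]  # out x r
alpha, r = 2.0, 1
x = [[3.0], [1.0]]  # input column vector

# Unmerged forward pass: base output plus the scaled LoRA branch.
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)), scale=alpha / r)
# Merged forward pass: a single matmul with the folded weight.
merged = matmul(merge_lora(W, A, B, alpha, r), x)
print(unmerged, merged)  # [[5.0], [2.0]] [[5.0], [2.0]]
```
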
---
## Training Data
**Total: 199 samples across 3 subsets**
| Subset | Count | Source | License | Purpose |
|---|---|---|---|---|
| Curated algorithmic problems | 71 | Custom (local) | MIT | Core coding skills: arrays, strings, trees, DP, graphs, sorting, searching |
| MetaMathQA samples | 100 | [`meta-math/MetaMathQA`](https://huggingface.co/datasets/meta-math/MetaMathQA) | CC BY 4.0 | Math reasoning transfer to improve step-by-step problem solving |
| Python/JavaScript patterns | 28 | Custom (local) | MIT | Practical patterns: decorators, context managers, data classes, async, CLI tools |
| **Total** | **199** | | | |

See [`datasets/MANIFEST.md`](datasets/MANIFEST.md) for full dataset provenance,
preprocessing notes, and per-sample license details.

---
## Features
- **Multi-language code generation:** Python, JavaScript, Java, C++, TypeScript, SQL
- **Code explanation:** clear inline comments and documentation
- **Bug fixing:** debug and fix common code issues
- **Algorithm assistance:** data structures and algorithms (LeetCode-style)
- **Offline operation:** fully local, no internet required at inference time
- **Fast CPU inference:** the GGUF Q8_0 model runs on commodity CPUs
- **API-first serving:** FastAPI backend with REST and SSE streaming endpoints
- **Optimized local inference:** configurable llama.cpp GPU offload, mmap loading, batching, and prompt cache
---
## Usage
### Production: GGUF with llama.cpp
```bash
# Clone and install
git clone https://github.com/neuralbroker/blitzkode
cd blitzkode
pip install -r requirements.txt
# Start the API server (place blitzkode.gguf in the repo root first)
python server.py
# In a second terminal, check that the server is up (it listens on port 7860)
curl http://localhost:7860/health
```
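
The GGUF file can also be loaded without the API server. A minimal sketch with llama-cpp-python follows, assuming `blitzkode.gguf` sits in the working directory. The loader mirrors the CPU settings from the evaluation section (`n_ctx=2048`, 8 threads, batch 256, no GPU layers); `chat_format="chatml"`, `temperature=0.2`, `max_tokens=256`, and the helper names are assumptions, not the server's code. `llama_cpp` is imported lazily so `ask()` can be exercised without the package.

```python
from typing import Any

# System prompt from the ChatML template in this card (line breaks joined with spaces).
SYSTEM_PROMPT = (
    "You are BlitzKode, an AI coding assistant created by Sajad. You are an expert "
    "in Python, JavaScript, Java, C++, and other languages. Write clean, efficient, "
    "and well-documented code. Keep responses concise and practical."
)


def load_blitzkode(model_path: str = "blitzkode.gguf", gpu_layers: int = 0) -> Any:
    """Load the GGUF model (requires `pip install llama-cpp-python`)."""
    from llama_cpp import Llama  # lazy import: only needed when loading the model

    return Llama(
        model_path=model_path,
        n_ctx=2048,
        n_threads=8,
        n_batch=256,
        n_gpu_layers=gpu_layers,
        chat_format="chatml",  # Qwen2.5 prompt format
        verbose=False,
    )


def ask(llm: Any, prompt: str, max_tokens: int = 256, temperature: float = 0.2) -> str:
    """Send one user turn with the BlitzKode system prompt and return the reply text."""
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return response["choices"][0]["message"]["content"]
```

For example, `ask(load_blitzkode(), "Write an iterative binary search in Python.")` returns the answer as a string. Raising `gpu_layers` above zero only helps if llama-cpp-python was built with GPU support.
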
### Research: LoRA Adapter with PEFT
```python
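from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model_id = "Qwen/Qwen2.5-0.5B-Instruct"      # base model the adapter was trained on
adapter_repo = "neuralbroker/blitzkode-lora-0.5b"  # published LoRA adapter
# Load the base model (device_map="auto" needs accelerate), then attach the adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
messages = [
    {"role": "system", "content": "You are BlitzKode, an AI coding assistant created by Sajad."},
    {"role": "user", "content": "Write an iterative binary search in Python."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))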
```
### Prompt Format (ChatML)
All variants use the Qwen ChatML template:
```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert
in Python, JavaScript, Java, C++, and other languages. Write clean, efficient,
and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```
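
When driving the model with raw text-completion calls instead of a chat API, the template above has to be assembled by hand. The sketch below builds that string; treating the wrapped system prompt as a single line and ending with an open assistant turn (so the model writes the reply) are assumptions about how the template is meant to be filled, and the function name is hypothetical.

```python
# System prompt from the template above (line breaks joined with spaces).
SYSTEM_PROMPT = (
    "You are BlitzKode, an AI coding assistant created by Sajad. You are an expert "
    "in Python, JavaScript, Java, C++, and other languages. Write clean, efficient, "
    "and well-documented code. Keep responses concise and practical."
)


def build_chatml_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Render one system + user exchange in ChatML, leaving the assistant turn open."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_chatml_prompt("Reverse a string in Python."))
```

When generating from this string, pass `<|im_end|>` as a stop sequence (for example `stop=["<|im_end|>"]` in llama-cpp-python) so the model ends after its own turn.
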
---
## Intended Use
### Best For
- Local offline coding assistance
- Algorithm and data structure problem solving
- Code generation and explanation
- Educational programming support
- Code review, refactoring, and debugging
### Out of Scope
- Production code without thorough expert review
- Security-critical or cryptographic applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis (> 2 048 tokens)
---
## Evaluation
Latest local GGUF smoke evaluation was run with `python scripts/evaluate_model.py` on CPU (`n_ctx=2048`, `threads=8`, `batch=256`, `gpu_layers=0`). Full machine-readable results are available in [`docs/evaluation_results.json`](docs/evaluation_results.json).
| Eval case | Result | Notes |
|---|---:|---|
| Python factorial with negative-input handling | ✅ Pass | Correct iterative implementation with negative-input validation. |
| Iterative binary search | ✅ Pass | Valid loop-based implementation returning index or `-1`. |
| SQL top users by order count | ✅ Pass | Correct `JOIN`, `GROUP BY`, `ORDER BY`, and `LIMIT 5` structure. |
| Unknown fictional API uncertainty | ❌ Fail | Raw model hallucinated a plausible signature; the FastAPI backend adds a guard for direct unknown-signature prompts. |

Summary: **3 / 4 passed (75%)**. This is a lightweight heuristic regression smoke test, not a benchmark suite. Stronger future evaluation should include executable unit tests and larger coding benchmarks such as HumanEval/MBPP-style tasks.

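
As a sketch of the executable checks suggested above (not the current `scripts/evaluate_model.py` logic), the harness below extracts the first Python code block from a model answer, runs it, and calls the generated function on known cases. The function name `binary_search`, its test cases, and the helper names are hypothetical. `exec` runs untrusted model output, so a real harness should isolate it in a subprocess or sandbox with a timeout.

```python
import re
from typing import Any, Callable

FENCE = "`" * 3
CODE_BLOCK_RE = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def extract_code(answer: str) -> str:
    """Return the first fenced code block, or the whole answer if there is none."""
    match = CODE_BLOCK_RE.search(answer)
    return match.group(1) if match else answer


def run_case(answer: str, func_name: str, cases: list[tuple[tuple, Any]]) -> bool:
    """Execute generated code and check func_name(*args) == expected for every case."""
    namespace: dict[str, Any] = {}
    try:
        exec(extract_code(answer), namespace)  # WARNING: untrusted code; sandbox in practice
        func: Callable[..., Any] = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:  # syntax errors, missing function, runtime errors all count as a fail
        return False


# Hypothetical cases for the "Iterative binary search" eval: (args, expected index).
BINARY_SEARCH_CASES = [
    (([1, 3, 5, 7, 9], 7), 3),
    (([1, 3, 5, 7, 9], 4), -1),
    (([], 1), -1),
]
```
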
---
## Limitations
- **Text-only input:** no image or file-upload support
- **2 048-token default context:** CPU-friendly but limits long conversation history
- **Verify all outputs:** always review and test generated code
- **Small model:** 0.5B–1.5B scale; may produce incorrect code on complex tasks
- **Raw model hallucination risk:** the API server includes guardrails, but direct GGUF prompting can still invent unsupported API details
- **No real-time data:** knowledge cutoff follows the Qwen2.5 base model unless the optional research endpoint is used
- **Math reasoning:** MetaMathQA transfer helps basic reasoning; not a math specialist
---
## Environment Variables (Inference Server)
| Variable | Default | Description |
|---|---|---|
| `BLITZKODE_GPU_LAYERS` | `0` | Number of layers to offload to GPU |
| `BLITZKODE_THREADS` | system | CPU decode thread count |
| `BLITZKODE_THREADS_BATCH` | system | CPU prompt-processing thread count |
| `BLITZKODE_N_CTX` | `2048` | Context window size |
| `BLITZKODE_BATCH` | `256` | llama.cpp prompt-processing batch size |
| `BLITZKODE_UBATCH` | `128` | llama.cpp micro-batch size |
| `BLITZKODE_PROMPT_CACHE` | `true` | Enable in-memory prompt cache when supported |
| `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup vs first request |
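
A sketch of how these variables might be read and mapped onto `llama_cpp.Llama` constructor arguments follows. Variable names and defaults come from the table above; treating an unset thread count as `None` (letting llama-cpp-python choose), the accepted boolean spellings, and the class and method names are assumptions, not the actual `server.py` code. Prompt caching is not a constructor argument in llama-cpp-python (it is attached separately, e.g. with `llm.set_cache(LlamaRAMCache())`), so it stays a separate flag here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed boolean spellings


def _int_or_none(value: Optional[str]) -> Optional[int]:
    # Unset or empty means "let llama.cpp decide" (the "system" default).
    return int(value) if value else None


@dataclass
class ServerConfig:
    gpu_layers: int = 0
    threads: Optional[int] = None
    threads_batch: Optional[int] = None
    n_ctx: int = 2048
    batch: int = 256
    ubatch: int = 128
    prompt_cache: bool = True
    preload_model: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ServerConfig":
        get = env.get
        return cls(
            gpu_layers=int(get("BLITZKODE_GPU_LAYERS", "0")),
            threads=_int_or_none(get("BLITZKODE_THREADS")),
            threads_batch=_int_or_none(get("BLITZKODE_THREADS_BATCH")),
            n_ctx=int(get("BLITZKODE_N_CTX", "2048")),
            batch=int(get("BLITZKODE_BATCH", "256")),
            ubatch=int(get("BLITZKODE_UBATCH", "128")),
            prompt_cache=get("BLITZKODE_PROMPT_CACHE", "true").lower() in TRUE_VALUES,
            preload_model=get("BLITZKODE_PRELOAD_MODEL", "false").lower() in TRUE_VALUES,
        )

    def llama_kwargs(self) -> dict:
        """Keyword arguments for llama_cpp.Llama(model_path=..., **kwargs)."""
        return {
            "n_gpu_layers": self.gpu_layers,
            "n_threads": self.threads,
            "n_threads_batch": self.threads_batch,
            "n_ctx": self.n_ctx,
            "n_batch": self.batch,
            "n_ubatch": self.ubatch,
        }
```
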
---
## Project Structure
```text
BlitzKode/
  server.py                     # FastAPI backend (inference + search)
  blitzkode.gguf                # GGUF model artifact (~1.53 GB, git-ignored)
  scripts/
    evaluate_model.py           # Lightweight GGUF evaluation harness
    train_sft.py                # Stage 1: SFT training
    train_reward_sft.py         # Stage 2: Reward-SFT
    train_dpo.py                # Stage 3: DPO
    train_available.py          # Stage 4: LoRA fine-tune (0.5B)
    export_gguf.py              # Stage 5: merge & convert to GGUF
    push_to_hub.py              # Push adapter to Hugging Face Hub
    build_full_dataset.py       # Dataset builder (algorithmic + HF datasets)
  datasets/
    MANIFEST.md                 # Dataset provenance and license info
  checkpoints/
    available-lora-0.5b-full/   # Published LoRA adapter (0.5B)
  tests/
    test_server.py              # HTTP integration tests
  docs/
    evaluation_results.json     # Latest smoke-eval output
    PROJECT_OVERVIEW.md         # Architecture and design notes
  README.md                     # Full project documentation
  MODEL_CARD.md                 # This file
```
---
## License
**MIT** β€” see [LICENSE](https://github.com/neuralbroker/blitzkode/blob/main/LICENSE).
You must also comply with the upstream Qwen2.5 license when redistributing any
fine-tuned weights derived from it.
- [Qwen2.5-0.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
- [Qwen2.5-1.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)

Training data subsets carry their own licenses:
- MetaMathQA: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- Custom/local samples: MIT
---
## Contact
- **GitHub Issues:** <https://github.com/neuralbroker/blitzkode/issues>
- **Portfolio:** <https://neuralbroker.vercel.app>
Contributions and feedback are welcome!

---
## Citation
```bibtex
@software{blitzkode2025,
  author = {Sajad},
  title  = {BlitzKode: A Local AI Coding Assistant},
  year   = {2025},
  url    = {https://github.com/neuralbroker/blitzkode}
}
```