---
language:
- en
license: mit
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
- code-generation
- coding-assistant
- gguf
- llama.cpp
- qwen2.5
- python
- javascript
- fine-tuned
- lora
- peft
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
- Qwen/Qwen2.5-0.5B-Instruct
---
# BlitzKode
**BlitzKode** is a local AI coding assistant fine-tuned from the Qwen2.5 family. It
ships as a **GGUF model** (1.5B, F16, ~3 GB) for fast offline inference with
llama.cpp, and as a **LoRA adapter** (0.5B, ~100 MB) for PEFT-based research and
further fine-tuning.
> **Creator:** [Sajad (neuralbroker)](https://github.com/neuralbroker)
>
> **GitHub:** <https://github.com/neuralbroker/blitzkode>
>
> **GGUF model:** [`neuralbroker/blitzkode`](https://huggingface.co/neuralbroker/blitzkode)
>
> **LoRA adapter:** [`neuralbroker/blitzkode-lora-0.5b`](https://huggingface.co/neuralbroker/blitzkode-lora-0.5b)
---
## Model Variants
| Variant | Version | Base Model | Format | Size | Runtime |
|---|---|---|---|---|---|
| **GGUF** (production) | 2.0 | `Qwen/Qwen2.5-1.5B-Instruct` | GGUF F16 | ~3 GB | llama.cpp / llama-cpp-python |
| **LoRA adapter** (research) | 2.1 | `Qwen/Qwen2.5-0.5B-Instruct` | PEFT safetensors | ~100 MB | PEFT + Transformers |
---
## Architecture
| Property | GGUF (1.5B) | LoRA Adapter (0.5B) |
|---|---|---|
| **Model type** | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
| **Parameters** | 1.5 B | 0.5 B + adapter weights |
| **Precision** | F16 (GGUF) | bfloat16 / float16 |
| **LoRA rank (r)** | — | 16 |
| **LoRA alpha** | — | 32 |
| **LoRA target modules** | — | q, k, v, o, gate, up, down projections |
| **Context window** | 2 048 tokens | 2 048 tokens |
| **Vocabulary** | 151 936 | 151 936 |
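
As a rough check on the adapter column above: LoRA adds two low-rank matrices per targeted linear layer, A (r × in) and B (out × r), so each layer contributes r·(in + out) trainable parameters, and the update is scaled by alpha / r. The sketch below applies that to the published adapter's `r=16`, alpha 32, and seven target modules. The layer shapes are assumptions taken from the public Qwen2.5-0.5B config (hidden size 896, MLP size 4 864, 24 layers, 2 key/value heads of dimension 64), not from this card, and the on-disk size additionally depends on the storage dtype and on any extra modules saved with the adapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    r: int = 16      # LoRA rank from the Architecture table
    alpha: int = 32  # LoRA alpha from the Architecture table

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.alpha / self.r


# Assumed Qwen2.5-0.5B shapes (from the public model config, not this card).
HIDDEN = 896
INTERMEDIATE = 4864
NUM_LAYERS = 24
KV_DIM = 2 * 64  # 2 key/value heads x head dimension 64 (grouped-query attention)

# (in_features, out_features) for each targeted projection, using Qwen2 module names.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(cfg: LoraSettings) -> int:
    # A is (r x in) and B is (out x r), so each module adds r * (in + out) parameters.
    per_layer = sum(cfg.r * (fan_in + fan_out) for fan_in, fan_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


cfg = LoraSettings()
params = lora_param_count(cfg)
print(f"scaling = {cfg.scaling}")
print(f"trainable LoRA parameters ~ {params / 1e6:.1f}M")
print(f"fp32 ~ {params * 4 / 1e6:.0f} MB, bf16 ~ {params * 2 / 1e6:.0f} MB")
```

Because the count is linear in r, doubling the rank (as in the `r=32` Stage 1 run on the larger 1.5B model) at least doubles the adapter size.
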
---
## Training Pipeline
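BlitzKode was produced by a **4-stage fine-tuning pipeline** plus a final merge-and-export step (Stage 5). Stages 1–3 train the 1.5B model that Stage 5 merges and exports to GGUF; Stage 4 trains the separately published 0.5B LoRA adapter.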
### Stage 1 — SFT (Supervised Fine-Tuning)
LoRA fine-tuning (`r=32`, base: Qwen2.5-1.5B-Instruct) on 71 curated algorithmic
coding problems covering arrays, strings, trees, dynamic programming, graphs,
sorting, hash tables, binary search, and more.
- **Adapter checkpoint:** `checkpoints/sft-1.5b-v1/`
- **Library:** PEFT + HuggingFace Transformers
### Stage 2 — Reward-SFT
Continued SFT with heuristic reward functions to reinforce code correctness,
formatting quality, and concise explanation style. This is a standard SFT
training loop using scalar reward signals, **not** full GRPO.
- **Adapter checkpoint:** `checkpoints/grpo-v1/` *(label is historical)*
- **Library:** TRL / Transformers
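
The card does not publish the Stage 2 reward functions. As a purely hypothetical illustration of the kind of heuristic scalar reward described (syntax validity as a cheap correctness proxy, fenced-code formatting, and a length penalty for concision), a scorer might look like the sketch below. The weights, the 300-word budget, and the `heuristic_reward` / `extract_python_blocks` names are all assumptions.

```python
import ast
from typing import List

FENCE = "`" * 3  # triple backtick, built programmatically to keep this example readable


def extract_python_blocks(text: str) -> List[str]:
    """Return bodies of fenced blocks tagged python/py or left untagged."""
    blocks, current, lang = [], None, ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if current is None:  # opening fence: remember its language tag
                lang = stripped[3:].strip().lower()
                current = []
            else:  # closing fence: keep the block only if it is Python
                if lang in ("", "python", "py"):
                    blocks.append("\n".join(current))
                current = None
        elif current is not None:
            current.append(line)
    return blocks


def heuristic_reward(response: str, max_words: int = 300) -> float:
    """Hypothetical scalar reward for one model response (weights are illustrative)."""
    reward = 0.0
    blocks = extract_python_blocks(response)
    if blocks:
        reward += 0.5  # formatting: the answer puts code in a fenced block
        try:
            for block in blocks:
                ast.parse(block)  # syntax check only; the code is never executed
            reward += 1.0  # cheap correctness proxy: every block is valid Python
        except SyntaxError:
            reward -= 1.0
    if len(response.split()) > max_words:
        reward -= 0.5  # concision: penalize overly long answers
    return reward
```

In a reward-guided SFT loop, such scores could, for example, be used to filter or weight training samples; which approach the project uses is not stated in this card.
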
### Stage 3 — DPO (Direct Preference Optimization)
Preference optimization on handcrafted chosen/rejected pairs to improve answer
clarity, reduce verbosity, and penalize hallucinated APIs or filenames.
- **Adapter checkpoint:** `checkpoints/dpo-v1/`
- **Library:** TRL
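
For reference, the standard DPO objective asks the policy to prefer the chosen answer over the rejected one by more than a frozen reference model does. The per-pair loss below is a plain-Python sketch of that formula, -log σ(β·[(log π(y_c) - log π_ref(y_c)) - (log π(y_r) - log π_ref(y_r))]), taking summed sequence log-probabilities as inputs. β = 0.1 is an assumed value (a common default), since the card does not state the one used.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed; not stated in the card
) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


# The policy has shifted probability toward the chosen answer relative to the
# reference, so the loss falls below log(2), its value at zero margin.
print(dpo_loss(-12.0, -20.0, -14.0, -18.0))
```

Because only log-ratios against the reference enter the loss, the policy is rewarded for *changing* its preferences in the right direction, not for raw likelihood.
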
### Stage 4 — Continued LoRA SFT (Published Adapter)
Final LoRA fine-tuning (`r=16`, base: **Qwen2.5-0.5B-Instruct**) on 99 samples
drawn from the 199-sample full dataset. Training ran for 50 steps; final loss
reached **~0.48**.
- **Adapter checkpoint:** `checkpoints/available-lora-0.5b-full/final` ✅ *(publicly available)*
- **Library:** PEFT + Transformers
### Stage 5 — Merge & Export (GGUF)
LoRA adapters from Stages 1–3 were merged into the 1.5B base model with PEFT's
`merge_and_unload()`, and the merged model was then converted to GGUF F16 format with llama.cpp.
- **Script:** `scripts/export_gguf.py`
- **Artifact:** `blitzkode.gguf` (~3 GB, git-ignored)
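
Conceptually, merging folds each adapter's low-rank update into the frozen weight it wraps, W' = W + (alpha / r)·B·A, after which the adapter can be dropped and the model exported like an ordinary checkpoint; this is what PEFT's `merge_and_unload()` does for standard (non-rsLoRA) LoRA layers. The toy sketch below shows the arithmetic on plain Python lists; the matrices and values are made up for illustration and are not BlitzKode weights.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive matrix product: (m x k) @ (k x n) -> (m x n).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * (B @ A) for one linear layer."""
    scaling = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in), same shape as W
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x in  = 1 x 2
B = [[1.0], [0.0]]  # out x r = 2 x 1
print(merge_lora(W, A, B, alpha=2.0, r=1))  # [[3.0, 4.0], [0.0, 1.0]]
```

If the Stage 1–3 adapters were merged one after another, each merge applies this update on top of the previous result; the merged weights keep the base shapes, which is why the GGUF is the same size as the base model.
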
---
## Training Data
**Total: 199 samples across 3 subsets**
| Subset | Count | Source | License | Purpose |
|---|---|---|---|---|
| Curated algorithmic problems | 71 | Custom (local) | MIT | Core coding skills: arrays, strings, trees, DP, graphs, sorting, searching |
| MetaMathQA samples | 100 | [`meta-math/MetaMathQA`](https://huggingface.co/datasets/meta-math/MetaMathQA) | CC BY 4.0 | Math reasoning transfer to improve step-by-step problem solving |
| Python/JavaScript patterns | 28 | Custom (local) | MIT | Practical patterns: decorators, context managers, data classes, async, CLI tools |
| **Total** | **199** | | | |

See [`datasets/MANIFEST.md`](datasets/MANIFEST.md) for full dataset provenance,
preprocessing notes, and per-sample license details.

---
## Features
- **Multi-language code generation** — Python, JavaScript, Java, C++, TypeScript, SQL
- **Code explanation** — clear inline comments and documentation
- **Bug fixing** — debug and fix common code issues
- **Algorithm assistance** — data structures and algorithms (LeetCode-style)
- **Offline operation** — fully local, no internet required at inference time
- **Fast CPU inference** — GGUF F16 runs on commodity CPUs
- **Modern web UI** — React/Vite chat interface with SSE streaming
- **REST API** — FastAPI backend with streaming and optional web-search augmentation
---
## Usage
### Production: GGUF with llama.cpp
```bash
# Clone and install
git clone https://github.com/neuralbroker/blitzkode
cd blitzkode
pip install -r requirements.txt
# Build the frontend
cd frontend && npm install && npm run build && cd ..
# Start the server (place blitzkode.gguf in repo root first)
python server.py
# Open http://localhost:7860
```
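
If you only need the model and not the web UI, the GGUF file can also be loaded directly with llama-cpp-python (the library named in this card's metadata). The sketch below assumes `blitzkode.gguf` sits in the working directory and uses an abbreviated form of the system prompt from the Prompt Format section; `build_messages`, `generate`, and the sampling settings are illustrative choices, not part of the project's code.

```python
from typing import Dict, List

# Abbreviated version of the BlitzKode system prompt (see the Prompt Format section).
SYSTEM_PROMPT = (
    "You are BlitzKode, an AI coding assistant created by Sajad. "
    "Write clean, efficient, and well-documented code. Keep responses concise and practical."
)


def build_messages(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> List[Dict[str, str]]:
    # OpenAI-style message list accepted by Llama.create_chat_completion.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, model_path: str = "blitzkode.gguf", max_tokens: int = 512) -> str:
    # Imported lazily so build_messages() works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(
        model_path=model_path,
        n_ctx=2048,            # the 2 048-token context window from the Architecture table
        n_gpu_layers=0,        # CPU-only; raise this to offload layers to a GPU
        chat_format="chatml",  # ChatML, matching the Prompt Format section
        verbose=False,
    )
    result = llm.create_chat_completion(
        messages=build_messages(user_prompt),
        max_tokens=max_tokens,
        temperature=0.2,
    )
    return result["choices"][0]["message"]["content"]
```

For example, `print(generate("Write a Python function that checks whether a string is a palindrome."))`. For repeated calls, create the `Llama` instance once and reuse it, since loading the ~3 GB file dominates start-up time.
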
### Research: LoRA Adapter with PEFT
```python
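from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen2.5-0.5B-Instruct"       # base model the adapter was trained on
adapter_repo = "neuralbroker/blitzkode-lora-0.5b"  # published LoRA adapter

# Load the tokenizer and base model, then attach the LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()

# Build a ChatML prompt with the tokenizer's chat template and generate a reply.
messages = [
    {"role": "system", "content": "You are BlitzKode, an AI coding assistant created by Sajad."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))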
```
### Prompt Format (ChatML)
All variants use the Qwen ChatML template:
```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert
in Python, JavaScript, Java, C++, and other languages. Write clean, efficient,
and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```
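
For raw completion APIs that do not apply a chat template for you, the prompt can be assembled by hand. The sketch below reproduces the layout shown above: each turn is wrapped in `<|im_start|>role` and `<|im_end|>`, and the prompt ends with an open assistant turn. The `to_chatml` helper is illustrative, not part of the project.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in the Qwen ChatML layout."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are BlitzKode, an AI coding assistant created by Sajad."},
    {"role": "user", "content": "Reverse a linked list in Python."},
])
print(prompt)
```

When generating from such a raw prompt, stop on `<|im_end|>` so the model does not continue into a new turn.
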
---
## Intended Use
### Best For
- Local offline coding assistance
- Algorithm and data structure problem solving
- Code generation and explanation
- Educational programming support
- Code review, refactoring, and debugging
### Out of Scope
- Production code without thorough expert review
- Security-critical or cryptographic applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis (> 2 048 tokens)
---
## Limitations
- **Text-only input** — no image or file-upload support
- **2 048-token context** — CPU-friendly but limits long conversation history
- **Verify all outputs** — always review and test generated code
- **Small model** — 0.5B–1.5B scale; may produce incorrect code on complex tasks
- **No real-time data** — knowledge cutoff follows the Qwen2.5 base model
- **Math reasoning** — MetaMathQA transfer helps basic reasoning; not a math specialist
---
## Environment Variables (Inference Server)
| Variable | Default | Description |
|---|---|---|
| `BLITZKODE_GPU_LAYERS` | `0` | Number of layers to offload to GPU |
| `BLITZKODE_THREADS` | system | CPU inference thread count |
| `BLITZKODE_N_CTX` | `2048` | Context window size |
| `BLITZKODE_BATCH` | `512` | llama.cpp batch size |
| `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup vs first request |
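
A minimal sketch of how these variables could map onto llama-cpp-python's `Llama` constructor arguments (`n_gpu_layers`, `n_threads`, `n_ctx`, `n_batch`). This is an assumption about the wiring, not necessarily how `server.py` reads them; the helper names and the accepted truthy strings for `BLITZKODE_PRELOAD_MODEL` are hypothetical.

```python
import os
from typing import Any, Dict


def _env_int(name: str, default: int) -> int:
    value = os.environ.get(name, "").strip()
    return int(value) if value else default


def llama_kwargs_from_env(model_path: str = "blitzkode.gguf") -> Dict[str, Any]:
    """Build keyword arguments for llama_cpp.Llama from BLITZKODE_* variables."""
    threads = os.environ.get("BLITZKODE_THREADS", "").strip()
    return {
        "model_path": model_path,
        "n_gpu_layers": _env_int("BLITZKODE_GPU_LAYERS", 0),
        # None lets llama-cpp-python choose its own thread count (the "system" default).
        "n_threads": int(threads) if threads else None,
        "n_ctx": _env_int("BLITZKODE_N_CTX", 2048),
        "n_batch": _env_int("BLITZKODE_BATCH", 512),
    }


def preload_enabled() -> bool:
    # Hypothetical parsing: treat 1/true/yes/on (any case) as enabled.
    value = os.environ.get("BLITZKODE_PRELOAD_MODEL", "false")
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

The resulting dictionary can be passed straight through as `Llama(**llama_kwargs_from_env())`.
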
---
## Project Structure
```text
BlitzKode/
  server.py                   # FastAPI backend (inference + search)
  blitzkode.gguf              # GGUF model artifact (~3 GB, git-ignored)
  frontend/                   # React/Vite web UI
  scripts/
    train_sft.py              # Stage 1: SFT training
    train_reward_sft.py       # Stage 2: Reward-SFT
    train_dpo.py              # Stage 3: DPO
    train_available.py        # Stage 4: LoRA fine-tune (0.5B)
    export_gguf.py            # Merge & convert to GGUF
    push_to_hub.py            # Push adapter to HuggingFace Hub
    build_full_dataset.py     # Dataset builder (algorithmic + HF datasets)
  datasets/
    MANIFEST.md               # Dataset provenance and license info
  checkpoints/
    available-lora-0.5b-full/ # Published LoRA adapter (0.5B)
  tests/
    test_server.py            # HTTP integration tests
  docs/
    PROJECT_OVERVIEW.md       # Architecture and design notes
  README.md                   # Full project documentation
  MODEL_CARD.md               # This file
```
---
## License
**MIT** — see [LICENSE](https://github.com/neuralbroker/blitzkode/blob/main/LICENSE).
You must also comply with the upstream Qwen2.5 license when redistributing any
fine-tuned weights derived from it.
- [Qwen2.5-0.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
- [Qwen2.5-1.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)

Training data subsets carry their own licenses:
- MetaMathQA: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- Custom/local samples: MIT
---
## Contact
- **GitHub Issues:** <https://github.com/neuralbroker/blitzkode/issues>
- **Portfolio:** <https://neuralbroker.vercel.app>

Contributions and feedback are welcome!
---
## Citation
```bibtex
@software{blitzkode2025,
author = {Sajad},
title = {BlitzKode: A Local AI Coding Assistant},
year = {2025},
url = {https://github.com/neuralbroker/blitzkode}
}
```