---
language:
  - en
license: mit
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
  - code-generation
  - coding-assistant
  - gguf
  - llama.cpp
  - qwen2.5
  - python
  - javascript
  - fine-tuned
  - lora
  - peft
base_model:
  - Qwen/Qwen2.5-1.5B-Instruct
  - Qwen/Qwen2.5-0.5B-Instruct
---

BlitzKode

BlitzKode is a local AI coding assistant fine-tuned from the Qwen2.5 family. It ships as a GGUF model (1.5B, F16, ~3 GB) for fast offline inference with llama.cpp, and as a LoRA adapter (0.5B, ~100 MB) for PEFT-based research and further fine-tuning.

  • Creator: Sajad (neuralbroker)
  • GitHub: https://github.com/neuralbroker/blitzkode
  • GGUF model: neuralbroker/blitzkode
  • LoRA adapter: neuralbroker/blitzkode-lora-0.5b


Model Variants

| Variant | Version | Base model | Format | Size | Runtime |
|---|---|---|---|---|---|
| GGUF (production) | 2.0 | Qwen/Qwen2.5-1.5B-Instruct | GGUF F16 | ~3 GB | llama.cpp / llama-cpp-python |
| LoRA adapter (research) | 2.1 | Qwen/Qwen2.5-0.5B-Instruct | PEFT safetensors | ~100 MB | PEFT + Transformers |

Architecture

| Property | GGUF (1.5B) | LoRA adapter (0.5B) |
|---|---|---|
| Model type | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
| Parameters | 1.5 B | 0.5 B + adapter weights |
| Weight precision | F16 (GGUF) | bfloat16 / float16 |
| LoRA rank (r) | n/a | 16 |
| LoRA alpha | n/a | 32 |
| LoRA target modules | n/a | q, k, v, o, gate, up, down projections |
| Context window | 2 048 tokens | 2 048 tokens |
| Vocabulary | 151 936 | 151 936 |
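
As a rough check on adapter size, you can derive the number of trainable LoRA parameters from the rank and the shapes of the targeted projections. Each adapted linear layer of shape d_in × d_out adds r·(d_in + d_out) weights: matrix A is r × d_in and matrix B is d_out × r. The sketch below assumes the published Qwen2.5-0.5B-Instruct dimensions: hidden size 896, MLP size 4 864, 24 layers, and 2 key/value heads of dimension 64. These numbers come from the base model's config, not from this card. The sketch counts only the A/B matrices, so the saved file size also depends on the storage dtype and on any extra modules saved alongside the adapter.

```python
# Count LoRA A/B parameters for the seven Qwen2 projections targeted by the adapter.
# Dimensions are assumed from the Qwen2.5-0.5B-Instruct config, not from this card.
HIDDEN = 896
INTERMEDIATE = 4864
LAYERS = 24
KV_DIM = 2 * 64  # 2 key/value heads x head dimension 64

# (in_features, out_features) of each targeted linear layer in one decoder block
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """A is rank x d_in and B is d_out x rank, so each layer adds rank * (d_in + d_out)."""
    per_block = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
    return per_block * LAYERS


if __name__ == "__main__":
    print(f"{lora_param_count(rank=16):,} LoRA parameters")  # 8,798,208 LoRA parameters
    print(f"scaling alpha / r = {32 / 16}")                  # scaling alpha / r = 2.0
```

The count grows linearly with r. The alpha / r factor only scales the adapter's output and adds no parameters.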

Training Pipeline

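BlitzKode was produced by a five-stage pipeline. Stages 1–3 train LoRA adapters on the 1.5B base, and Stage 5 merges them and exports the GGUF model. Stage 4 is a separate LoRA run on the 0.5B base that produced the published adapter.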

Stage 1: SFT (Supervised Fine-Tuning)

LoRA fine-tuning (r=32, base: Qwen2.5-1.5B-Instruct) on 71 curated algorithmic coding problems covering arrays, strings, trees, dynamic programming, graphs, sorting, hash tables, binary search, and more.

  • Adapter checkpoint: checkpoints/sft-1.5b-v1/
  • Library: PEFT + HuggingFace Transformers

Stage 2: Reward-SFT

Continued SFT guided by heuristic reward functions that score code correctness, formatting quality, and concise explanation style. The training loop is standard SFT informed by these scalar reward signals; it is not full GRPO.

  • Adapter checkpoint: checkpoints/grpo-v1/ (the directory name is historical)
  • Library: TRL / Transformers
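
The card does not specify the reward heuristics used in scripts/train_reward_sft.py. The sketch below is a purely hypothetical illustration of the kind of scalar signal described: correctness, formatting, and concision. It scores a response on three checks: whether it contains a fenced Python block, whether that block parses, and how long the surrounding explanation is. The function name, the point weights, and the 120-word limit are invented for this example.

```python
import ast
import re

# Hypothetical heuristic reward for illustration only; the weights and the
# word limit are invented and are not taken from train_reward_sft.py.
FENCE = "`" * 3  # a markdown code fence, built this way so the snippet nests cleanly
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def heuristic_reward(response: str, max_words: int = 120) -> float:
    """Score a response in [0, 1] with three simple checks (5 points total)."""
    match = CODE_BLOCK.search(response)
    if match is None:
        return 0.0  # formatting: no fenced Python block at all
    points = 2  # formatting: a fenced Python block is present
    try:
        ast.parse(match.group(1))
        points += 2  # correctness proxy: the code is syntactically valid Python
    except SyntaxError:
        pass
    explanation = CODE_BLOCK.sub("", response)
    if len(explanation.split()) <= max_words:
        points += 1  # concision: the prose outside the code block is short
    return points / 5
```

A signal like this could be used to rank, filter, or weight training samples. This card does not state which of these the Reward-SFT stage actually does.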

Stage 3: DPO (Direct Preference Optimization)

Preference optimization on handcrafted chosen/rejected pairs to improve answer clarity, reduce verbosity, and penalize hallucinated APIs or filenames.

  • Adapter checkpoint: checkpoints/dpo-v1/
  • Library: TRL
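
For reference, the standard DPO objective compares two preferences: how much the policy prefers the chosen answer over the rejected one, and how much a frozen reference model does. The default sigmoid loss in TRL's DPOTrainer follows this form. The sketch below computes the per-pair loss from summed sequence log-probabilities. Setting beta = 0.1 matches TRL's default and is an assumption here, since the card does not report the value used.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed log-probabilities log p(y | x).

    loss = -log(sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(m)) == softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Policy behaves exactly like the reference: loss is log 2, about 0.693.
    print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 3))  # 0.693
    # Policy shifted probability toward the chosen answer: the loss drops below log 2.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0) < math.log(2))  # True
```

The loss only falls when the policy's preference for the chosen answer over the rejected one exceeds the reference model's preference. This is why handcrafted chosen/rejected pairs can target specific behaviors such as verbosity or hallucinated APIs. TRL's DPOTrainer consumes such pairs as records with prompt, chosen, and rejected fields.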

Stage 4: Continued LoRA SFT (Published Adapter)

Final LoRA fine-tuning (r=16, base: Qwen2.5-0.5B-Instruct) on 99 samples drawn from the 199-sample full dataset. Training ran for 50 steps and reached a final training loss of ~0.48.

  • Adapter checkpoint: checkpoints/available-lora-0.5b-full/final ✅ (publicly available)
  • Library: PEFT + Transformers

Stage 5: Merge & Export (GGUF)

The LoRA adapters from Stages 1–3 were merged into the 1.5B base model with PEFT's merge_and_unload(), and the merged model was then converted to GGUF F16 with llama.cpp.

  • Script: scripts/export_gguf.py
  • Artifact: blitzkode.gguf (~3 GB, git-ignored)
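
Merging works because a LoRA layer is linear in its input. Computing W x + (alpha/r)·B(A x) gives the same result as (W + (alpha/r)·B A) x, so the adapter can be folded into the base weights once and then dropped. The pure-Python sketch below checks this identity on tiny made-up matrices. It illustrates the arithmetic behind merging, not PEFT's actual merge_and_unload() implementation. The scale value is arbitrary.

```python
# Tiny matrices as lists of rows; all values are arbitrary illustration data.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b, scale=1.0):
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_forward(W, A, B, x, scale):
    """Unmerged path: base output plus the scaled low-rank update B(Ax)."""
    return add(matmul(W, x), matmul(B, matmul(A, x)), scale)


def merge(W, A, B, scale):
    """Merged weight W' = W + scale * (B @ A)."""
    return add(W, matmul(B, A), scale)


W = [[1.0, 2.0], [3.0, 4.0]]  # base weight, d_out x d_in = 2 x 2
A = [[0.5, -1.0]]             # rank-1 down-projection, r x d_in
B = [[2.0], [1.0]]            # rank-1 up-projection, d_out x r
x = [[1.0], [2.0]]            # input column vector
scale = 2.0                   # alpha / r; illustrative value

print(lora_forward(W, A, B, x, scale))   # [[-1.0], [8.0]]
print(matmul(merge(W, A, B, scale), x))  # [[-1.0], [8.0]]
```

Because the merged model is an ordinary dense checkpoint, it can be converted to GGUF without any adapter support in llama.cpp. In 16-bit storage, the merged weights may differ from the unmerged path by small rounding errors.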

Training Data

Total: 199 samples across 3 subsets

| Subset | Count | Source | License | Purpose |
|---|---|---|---|---|
| Curated algorithmic problems | 71 | Custom (local) | MIT | Core coding skills: arrays, strings, trees, DP, graphs, sorting, searching |
| MetaMathQA samples | 100 | meta-math/MetaMathQA | CC BY 4.0 | Math reasoning transfer to improve step-by-step problem solving |
| Python/JavaScript patterns | 28 | Custom (local) | MIT | Practical patterns: decorators, context managers, data classes, async, CLI tools |
| Total | 199 | | | |

See datasets/MANIFEST.md for full dataset provenance, preprocessing notes, and per-sample license details.


Features

  • Multi-language code generation: Python, JavaScript, Java, C++, TypeScript, SQL
  • Code explanation: clear inline comments and documentation
  • Bug fixing: debug and fix common code issues
  • Algorithm assistance: data structures and algorithms (LeetCode-style)
  • Offline operation: fully local, no internet required at inference time (unless web-search augmentation is enabled)
  • CPU inference: the GGUF F16 model runs on commodity CPUs, with optional GPU layer offload
  • Modern web UI: React/Vite chat interface with SSE streaming
  • REST API: FastAPI backend with streaming and optional web-search augmentation

Usage

Production: GGUF with llama.cpp

```bash
# Clone and install
git clone https://github.com/neuralbroker/blitzkode
cd blitzkode
pip install -r requirements.txt

# Build the frontend
cd frontend && npm install && npm run build && cd ..

# Start the server (place blitzkode.gguf in repo root first)
python server.py
# Open http://localhost:7860
```

Research: LoRA Adapter with PEFT

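```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_repo = "neuralbroker/blitzkode-lora-0.5b"

# Load the base model and tokenizer, then attach the LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()

# Format a request with the tokenizer's ChatML template and generate a reply.
# The system message is the first sentence of the BlitzKode system prompt.
messages = [
    {"role": "system", "content": "You are BlitzKode, an AI coding assistant created by Sajad."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```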

Prompt Format (ChatML)

All variants use the Qwen ChatML template:

```text
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert
in Python, JavaScript, Java, C++, and other languages. Write clean, efficient,
and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```
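
If you call the GGUF model through a raw text-completion interface rather than a chat API, you have to assemble the template above by hand. The following is a minimal sketch. It assumes messages are plain dicts with role and content keys, a convention borrowed from common chat APIs rather than defined by this card. It also assumes the line breaks in the system prompt above are only display wrapping. The function name build_chatml_prompt is hypothetical.

```python
# The card's system prompt, joined onto one line (assumption: the line breaks
# shown in the template are only wrapping).
SYSTEM_PROMPT = (
    "You are BlitzKode, an AI coding assistant created by Sajad. You are an expert "
    "in Python, JavaScript, Java, C++, and other languages. Write clean, efficient, "
    "and well-documented code. Keep responses concise and practical."
)


def build_chatml_prompt(messages: list[dict[str, str]], system: str = SYSTEM_PROMPT) -> str:
    """Render chat messages as Qwen ChatML and open an assistant turn to complete."""
    turns = [{"role": "system", "content": system}, *messages]
    rendered = "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in turns)
    return rendered + "<|im_start|>assistant\n"


if __name__ == "__main__":
    print(build_chatml_prompt([{"role": "user", "content": "Reverse a string in Python."}]))
```

Stop generation at the <|im_end|> token that closes the assistant turn. With Transformers, tokenizer.apply_chat_template(..., add_generation_prompt=True) should produce the same layout from the tokenizer's built-in template.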

Intended Use

Best For

  • Local offline coding assistance
  • Algorithm and data structure problem solving
  • Code generation and explanation
  • Educational programming support
  • Code review, refactoring, and debugging

Out of Scope

  • Production code without thorough expert review
  • Security-critical or cryptographic applications
  • Multi-modal tasks (images not supported)
  • Long-context repository analysis (> 2 048 tokens)

Limitations

  • Text-only input: no image or file-upload support
  • 2 048-token context (default, set via BLITZKODE_N_CTX): CPU-friendly, but limits long conversation history
  • Verify all outputs: always review and test generated code
  • Small model: 0.5B–1.5B scale; may produce incorrect code on complex tasks
  • No real-time data: knowledge cutoff follows the Qwen2.5 base model
  • Math reasoning: the MetaMathQA subset is intended to aid basic step-by-step reasoning; this is not a math-specialist model

Environment Variables (Inference Server)

| Variable | Default | Description |
|---|---|---|
| BLITZKODE_GPU_LAYERS | 0 | Number of layers to offload to GPU |
| BLITZKODE_THREADS | system | CPU inference thread count |
| BLITZKODE_N_CTX | 2048 | Context window size (tokens) |
| BLITZKODE_BATCH | 512 | llama.cpp batch size |
| BLITZKODE_PRELOAD_MODEL | false | Load the model at startup instead of on the first request |
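
The actual parsing lives in server.py, which this card does not show. The sketch below is a hypothetical illustration of reading these variables with the documented defaults. Two details are assumptions: the "system" default for BLITZKODE_THREADS is treated as os.cpu_count(), and booleans accept true/1/yes. The ServerSettings and load_settings names are also hypothetical. The resulting values would presumably be passed to llama-cpp-python's Llama(n_gpu_layers=..., n_threads=..., n_ctx=..., n_batch=...).

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

TRUE_VALUES = {"1", "true", "yes"}  # assumption: accepted spellings of "true"


@dataclass
class ServerSettings:
    gpu_layers: int
    threads: int
    n_ctx: int
    batch: int
    preload_model: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read BLITZKODE_* variables, falling back to the defaults in the table above."""
    threads = env.get("BLITZKODE_THREADS")
    return ServerSettings(
        gpu_layers=int(env.get("BLITZKODE_GPU_LAYERS", "0")),
        # Assumption: "system" means one thread per logical CPU from os.cpu_count().
        threads=int(threads) if threads else (os.cpu_count() or 1),
        n_ctx=int(env.get("BLITZKODE_N_CTX", "2048")),
        batch=int(env.get("BLITZKODE_BATCH", "512")),
        preload_model=env.get("BLITZKODE_PRELOAD_MODEL", "false").strip().lower() in TRUE_VALUES,
    )


if __name__ == "__main__":
    print(load_settings())
```

For example, BLITZKODE_THREADS=8 BLITZKODE_PRELOAD_MODEL=true python server.py sets the variables for a single server run from a POSIX shell.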

Project Structure

```text
BlitzKode/
  server.py                   # FastAPI backend (inference + search)
  blitzkode.gguf              # GGUF model artifact (~3 GB, git-ignored)
  frontend/                   # React/Vite web UI
  scripts/
    train_sft.py              # Stage 1: SFT training
    train_reward_sft.py       # Stage 2: Reward-SFT
    train_dpo.py              # Stage 3: DPO
    train_available.py        # Stage 4: LoRA fine-tune (0.5B)
    export_gguf.py            # Merge & convert to GGUF
    push_to_hub.py            # Push adapter to HuggingFace Hub
    build_full_dataset.py     # Dataset builder (algorithmic + HF datasets)
  datasets/
    MANIFEST.md               # Dataset provenance and license info
  checkpoints/
    available-lora-0.5b-full/ # Published LoRA adapter (0.5B)
  tests/
    test_server.py            # HTTP integration tests
  docs/
    PROJECT_OVERVIEW.md       # Architecture and design notes
  README.md                   # Full project documentation
  MODEL_CARD.md               # This file
```

License

MIT; see LICENSE.

You must also comply with the upstream Qwen2.5 license (Apache 2.0 for Qwen2.5-0.5B-Instruct and Qwen2.5-1.5B-Instruct) when redistributing any fine-tuned weights derived from these models.

Training data subsets carry their own licenses:

  • MetaMathQA: CC BY 4.0
  • Custom/local samples: MIT

Contact

Contributions and feedback are welcome!


Citation

```bibtex
@software{blitzkode2025,
  author  = {Sajad},
  title   = {BlitzKode: A Local AI Coding Assistant},
  year    = {2025},
  url     = {https://github.com/neuralbroker/blitzkode}
}
```