Instructions to use neuralbroker/blitzkode with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use neuralbroker/blitzkode with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="neuralbroker/blitzkode",
    filename="blitzkode.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use neuralbroker/blitzkode with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf neuralbroker/blitzkode
# Run inference directly in the terminal:
llama-cli -hf neuralbroker/blitzkode
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf neuralbroker/blitzkode
# Run inference directly in the terminal:
llama-cli -hf neuralbroker/blitzkode
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf neuralbroker/blitzkode
# Run inference directly in the terminal:
./llama-cli -hf neuralbroker/blitzkode
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf neuralbroker/blitzkode
# Run inference directly in the terminal:
./build/bin/llama-cli -hf neuralbroker/blitzkode
Use Docker
docker model run hf.co/neuralbroker/blitzkode
- LM Studio
- Jan
- vLLM
How to use neuralbroker/blitzkode with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "neuralbroker/blitzkode"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "neuralbroker/blitzkode",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
Use Docker
docker model run hf.co/neuralbroker/blitzkode
- Ollama
How to use neuralbroker/blitzkode with Ollama:
ollama run hf.co/neuralbroker/blitzkode
- Unsloth Studio
How to use neuralbroker/blitzkode with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for neuralbroker/blitzkode to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for neuralbroker/blitzkode to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for neuralbroker/blitzkode to start chatting
- Pi
How to use neuralbroker/blitzkode with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf neuralbroker/blitzkode
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "neuralbroker/blitzkode" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use neuralbroker/blitzkode with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf neuralbroker/blitzkode
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default neuralbroker/blitzkode
Run Hermes
hermes
- Docker Model Runner
How to use neuralbroker/blitzkode with Docker Model Runner:
docker model run hf.co/neuralbroker/blitzkode
- Lemonade
How to use neuralbroker/blitzkode with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull neuralbroker/blitzkode
Run and chat with the model
lemonade run user.blitzkode-{{QUANT_TAG}}
List all available models
lemonade list
language:
- en
license: mit
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
- code-generation
- coding-assistant
- gguf
- llama.cpp
- qwen2.5
- python
- javascript
- fine-tuned
- lora
- peft
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
- Qwen/Qwen2.5-0.5B-Instruct
BlitzKode
BlitzKode is a local AI coding assistant fine-tuned from the Qwen2.5 family. It ships as a GGUF model (1.5B, F16, ~3 GB) for fast offline inference with llama.cpp, and as a LoRA adapter (0.5B, ~100 MB) for PEFT-based research and further fine-tuning.
- Creator: Sajad (neuralbroker)
- GitHub: https://github.com/neuralbroker/blitzkode
- GGUF model: neuralbroker/blitzkode
- LoRA adapter: neuralbroker/blitzkode-lora-0.5b
Model Variants
| Variant | Version | Base Model | Format | Size | Runtime |
|---|---|---|---|---|---|
| GGUF (production) | 2.0 | Qwen/Qwen2.5-1.5B-Instruct | GGUF F16 | ~3 GB | llama.cpp / llama-cpp-python |
| LoRA adapter (research) | 2.1 | Qwen/Qwen2.5-0.5B-Instruct | PEFT safetensors | ~100 MB | PEFT + Transformers |
Architecture
| Property | GGUF (1.5B) | LoRA Adapter (0.5B) |
|---|---|---|
| Model type | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
| Parameters | 1.5 B | 0.5 B + adapter weights |
| Quantization | GGUF F16 | bfloat16 / float16 |
| LoRA rank (r) | n/a | 16 |
| LoRA alpha | n/a | 32 |
| LoRA target modules | n/a | q, k, v, o, gate, up, down projections |
| Context window | 2 048 tokens | 2 048 tokens |
| Vocabulary | 151 936 | 151 936 |
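As a rough illustration of what the rank and alpha values in the table imply, the sketch below computes the LoRA scaling factor and the number of trainable parameters LoRA adds to a single linear layer. The 1024 x 1024 layer shape is a hypothetical example, not a dimension taken from the Qwen2.5 configuration.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    LoRA adds two matrices: A with shape (r, in_features) and
    B with shape (out_features, r).
    """
    return r * (in_features + out_features)


if __name__ == "__main__":
    r, alpha = 16, 32  # values from the Architecture table
    print(lora_scaling(alpha, r))        # 2.0
    # Hypothetical 1024 x 1024 projection, for illustration only.
    print(lora_params(1024, 1024, r))    # 32768
```

With r=16 and alpha=32 the adapter update is scaled by 2.0 before it is added to the frozen base weight.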
Training Pipeline
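BlitzKode was produced by four fine-tuning stages followed by a merge-and-export step (Stage 5). Stages 1-3 train the 1.5B model that ships as GGUF; Stage 4 trains the separately published 0.5B LoRA adapter.
Stage 1: SFT (Supervised Fine-Tuning)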
LoRA fine-tuning (r=32, base: Qwen2.5-1.5B-Instruct) on 71 curated algorithmic
coding problems covering arrays, strings, trees, dynamic programming, graphs,
sorting, hash tables, binary search, and more.
- Adapter checkpoint: checkpoints/sft-1.5b-v1/
- Library: PEFT + HuggingFace Transformers
Stage 2: Reward-SFT
Continued SFT with heuristic reward functions to reinforce code correctness, formatting quality, and concise explanation style. This is a standard SFT training loop using scalar reward signals, not full GRPO.
- Adapter checkpoint: checkpoints/grpo-v1/ (label is historical)
- Library: TRL / Transformers
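The project's actual reward functions are not shown in this card. As a purely hypothetical sketch of what a heuristic scalar reward along these lines could look like, the function below scores a response on three checks: it contains a fenced code block, the code parses as Python, and the surrounding prose stays within a word budget. The function name, the equal weighting, and the 120-word budget are all assumptions made for illustration.

```python
import ast
import re

FENCE = "`" * 3  # Markdown code fence, built indirectly to keep this listing readable
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def _parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python (no execution)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def heuristic_reward(response: str, max_words: int = 120) -> float:
    """Score a response in [0, 1] from three hypothetical checks."""
    blocks = CODE_BLOCK.findall(response)
    score = 0.0
    # Formatting: the answer contains at least one fenced code block.
    if blocks:
        score += 1 / 3
    # Correctness proxy: every code block parses as Python.
    if blocks and all(_parses(block) for block in blocks):
        score += 1 / 3
    # Concision: prose outside the code blocks stays under the word budget.
    prose = CODE_BLOCK.sub("", response)
    if len(prose.split()) <= max_words:
        score += 1 / 3
    return round(score, 3)
```

A syntax check is only a weak proxy for correctness; running unit tests against the generated code would be a stronger signal, at higher cost.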
Stage 3: DPO (Direct Preference Optimization)
Preference optimization on handcrafted chosen/rejected pairs to improve answer clarity, reduce verbosity, and penalize hallucinated APIs or filenames.
- Adapter checkpoint: checkpoints/dpo-v1/
- Library: TRL
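For readers unfamiliar with DPO, the following minimal sketch shows the per-pair loss that preference optimization minimizes. It takes summed log-probabilities of the chosen and rejected answers under the policy and under a frozen reference model. The function name and the `beta=0.1` value are assumptions for illustration; the card does not state the hyperparameters used.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities."""
    # How much more likely each answer is under the policy than under the reference.
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy favors the chosen answer more than the reference does: lower loss.
    print(dpo_loss(-10.0, -20.0, -15.0, -15.0))
```

The loss falls as the policy raises the chosen answer's likelihood relative to the rejected one, measured against the reference model.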
Stage 4: Continued LoRA SFT (Published Adapter)
Final LoRA fine-tuning (r=16, base: Qwen2.5-0.5B-Instruct) on 99 samples
drawn from the 199-sample full dataset. Training ran for 50 steps; final loss
reached ~0.48.
- Adapter checkpoint: checkpoints/available-lora-0.5b-full/final (publicly available)
- Library: PEFT + Transformers
Stage 5: Merge & Export (GGUF)
LoRA adapters from Stages 1-3 were merged into the 1.5B base model using
merge_and_unload(), then converted to GGUF F16 format with llama.cpp.
- Script: scripts/export_gguf.py
- Artifact: blitzkode.gguf (~3 GB, git-ignored)
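Conceptually, merging a LoRA adapter folds the low-rank update into the base weight so that no adapter is needed at inference time. The toy sketch below shows that arithmetic for a single layer in plain Python. It is an illustration of the math only, not the contents of scripts/export_gguf.py, and the matrices are made-up values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight for one linear layer."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
    A = [[1.0, 2.0]]              # rank 1, in_features 2
    B = [[0.5], [0.0]]            # out_features 2, rank 1
    print(merge_lora(W, A, B, alpha=2, r=1))  # [[2.0, 2.0], [0.0, 1.0]]
```

After merging, the result is an ordinary dense weight matrix, which is why the merged model can be converted to GGUF like any other checkpoint.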
Training Data
Total: 199 samples across 3 subsets
| Subset | Count | Source | License | Purpose |
|---|---|---|---|---|
| Curated algorithmic problems | 71 | Custom (local) | MIT | Core coding skills: arrays, strings, trees, DP, graphs, sorting, searching |
| MetaMathQA samples | 100 | meta-math/MetaMathQA | CC BY 4.0 | Math reasoning transfer to improve step-by-step problem solving |
| Python/JavaScript patterns | 28 | Custom (local) | MIT | Practical patterns: decorators, context managers, data classes, async, CLI tools |
| Total | 199 | | | |
See datasets/MANIFEST.md for full dataset provenance,
preprocessing notes, and per-sample license details.
Features
- Multi-language code generation: Python, JavaScript, Java, C++, TypeScript, SQL
- Code explanation: clear inline comments and documentation
- Bug fixing: debug and fix common code issues
- Algorithm assistance: data structures and algorithms (LeetCode-style)
- Offline operation: fully local, no internet required at inference time
- Fast CPU inference: GGUF F16 runs on commodity CPUs
- Modern web UI: React/Vite chat interface with SSE streaming
- REST API: FastAPI backend with streaming and optional web-search augmentation
Usage
Production: GGUF with llama.cpp
# Clone and install
git clone https://github.com/neuralbroker/blitzkode
cd blitzkode
pip install -r requirements.txt
# Build the frontend
cd frontend && npm install && npm run build && cd ..
# Start the server (place blitzkode.gguf in repo root first)
python server.py
# Open http://localhost:7860
Research: LoRA Adapter with PEFT
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_repo = "neuralbroker/blitzkode-lora-0.5b"
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
Prompt Format (ChatML)
All variants use the Qwen ChatML template:
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert
in Python, JavaScript, Java, C++, and other languages. Write clean, efficient,
and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
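When a raw prompt string is needed (for example with a plain completion endpoint), the template above can be assembled by hand. The helper below is a minimal sketch; the function name is hypothetical, and it assumes the line breaks shown inside the system prompt are just wrapping, so the text is joined with spaces.

```python
SYSTEM_PROMPT = (
    "You are BlitzKode, an AI coding assistant created by Sajad. You are an expert "
    "in Python, JavaScript, Java, C++, and other languages. Write clean, efficient, "
    "and well-documented code. Keep responses concise and practical."
)


def build_chatml_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Assemble a single-turn ChatML prompt that ends at the assistant header."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_chatml_prompt("Write a function that reverses a string."))
```

With Transformers, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` builds the prompt from the tokenizer's own template and is usually the safer choice.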
Intended Use
Best For
- Local offline coding assistance
- Algorithm and data structure problem solving
- Code generation and explanation
- Educational programming support
- Code review, refactoring, and debugging
Out of Scope
- Production code without thorough expert review
- Security-critical or cryptographic applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis (> 2 048 tokens)
Limitations
- Text-only input: no image or file-upload support
- 2 048-token context: CPU-friendly but limits long conversation history
- Verify all outputs: always review and test generated code
- Small model: 0.5B to 1.5B scale; may produce incorrect code on complex tasks
- No real-time data: knowledge cutoff follows the Qwen2.5 base model
- Math reasoning: MetaMathQA transfer helps basic reasoning; not a math specialist
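One common way to live with a small context window is to drop the oldest turns before each request. The sketch below keeps the system message plus the most recent messages that fit a token budget. The function name, the 512-token reserve for the reply, and the whitespace-based token counter are assumptions; a real client would count tokens with the model's tokenizer.

```python
def trim_history(messages, max_tokens=2048, reserve=512, count_tokens=None):
    """Keep the system message plus the newest turns that fit the budget.

    `reserve` leaves room for the model's reply. The default counter splits
    on whitespace and is only a crude stand-in for a real tokenizer.
    """
    count = count_tokens or (lambda text: len(text.split()))
    budget = max_tokens - reserve
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(count(m["content"]) for m in system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = count(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + kept[::-1]  # restore chronological order
```

Passing the tokenizer's own length function as `count_tokens` makes the budget accurate instead of approximate.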
Environment Variables (Inference Server)
| Variable | Default | Description |
|---|---|---|
| BLITZKODE_GPU_LAYERS | 0 | Number of layers to offload to GPU |
| BLITZKODE_THREADS | system | CPU inference thread count |
| BLITZKODE_N_CTX | 2048 | Context window size |
| BLITZKODE_BATCH | 512 | llama.cpp batch size |
| BLITZKODE_PRELOAD_MODEL | false | Load model at startup vs first request |
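The sketch below shows one way these variables could be read with the defaults from the table. It is not the code in server.py: the `ServerSettings` class and `load_settings` function are hypothetical names, and mapping the "system" thread default to `os.cpu_count()` is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    gpu_layers: int
    threads: int
    n_ctx: int
    batch: int
    preload_model: bool


def load_settings(env=None) -> ServerSettings:
    """Read BLITZKODE_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServerSettings(
        gpu_layers=int(env.get("BLITZKODE_GPU_LAYERS", "0")),
        # Assumption: "system" means the number of CPUs the OS reports.
        threads=int(env.get("BLITZKODE_THREADS", os.cpu_count() or 1)),
        n_ctx=int(env.get("BLITZKODE_N_CTX", "2048")),
        batch=int(env.get("BLITZKODE_BATCH", "512")),
        preload_model=str(env.get("BLITZKODE_PRELOAD_MODEL", "false")).lower()
        in {"1", "true", "yes"},
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict as `env` makes the loader easy to exercise in tests without touching the process environment.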
Project Structure
BlitzKode/
  server.py                     # FastAPI backend (inference + search)
  blitzkode.gguf                # GGUF model artifact (~3 GB, git-ignored)
  frontend/                     # React/Vite web UI
  scripts/
    train_sft.py                # Stage 1: SFT training
    train_reward_sft.py         # Stage 2: Reward-SFT
    train_dpo.py                # Stage 3: DPO
    train_available.py          # Stage 4: LoRA fine-tune (0.5B)
    export_gguf.py              # Merge & convert to GGUF
    push_to_hub.py              # Push adapter to HuggingFace Hub
    build_full_dataset.py       # Dataset builder (algorithmic + HF datasets)
  datasets/
    MANIFEST.md                 # Dataset provenance and license info
  checkpoints/
    available-lora-0.5b-full/   # Published LoRA adapter (0.5B)
  tests/
    test_server.py              # HTTP integration tests
  docs/
    PROJECT_OVERVIEW.md         # Architecture and design notes
  README.md                     # Full project documentation
  MODEL_CARD.md                 # This file
License
MIT; see LICENSE.
You must also comply with the upstream Qwen2.5 license when redistributing any fine-tuned weights derived from it.
Training data subsets carry their own licenses:
- MetaMathQA: CC BY 4.0
- Custom/local samples: MIT
Contact
- GitHub Issues: https://github.com/neuralbroker/blitzkode/issues
- Portfolio: https://neuralbroker.vercel.app
Contributions and feedback are welcome!
Citation
@software{blitzkode2025,
author = {Sajad},
title = {BlitzKode: A Local AI Coding Assistant},
year = {2025},
url = {https://github.com/neuralbroker/blitzkode}
}