---
language:
  - en
license: mit
library_name: llama-cpp-python
pipeline_tag: text-generation
tags:
  - code-generation
  - coding-assistant
  - gguf
  - llama.cpp
  - qwen2.5
  - python
  - javascript
  - fine-tuned
  - lora
  - peft
base_model:
  - Qwen/Qwen2.5-1.5B-Instruct
  - Qwen/Qwen2.5-0.5B-Instruct
---

# BlitzKode

**BlitzKode** is a local AI coding assistant fine-tuned from the Qwen2.5 family. It
ships as a **GGUF model** (1.5B, F16, ~3 GB) for fast offline inference with
llama.cpp, and as a **LoRA adapter** (0.5B, ~100 MB) for PEFT-based research and
further fine-tuning.

> **Creator:** [Sajad (neuralbroker)](https://github.com/neuralbroker)
> **GitHub:** <https://github.com/neuralbroker/blitzkode>
> **GGUF model:** [`neuralbroker/blitzkode`](https://huggingface.co/neuralbroker/blitzkode)
> **LoRA adapter:** [`neuralbroker/blitzkode-lora-0.5b`](https://huggingface.co/neuralbroker/blitzkode-lora-0.5b)

---

## Model Variants

| Variant | Version | Base Model | Format | Size | Runtime |
|---|---|---|---|---|---|
| **GGUF** (production) | 2.0 | `Qwen/Qwen2.5-1.5B-Instruct` | GGUF F16 | ~3 GB | llama.cpp / llama-cpp-python |
| **LoRA adapter** (research) | 2.1 | `Qwen/Qwen2.5-0.5B-Instruct` | PEFT safetensors | ~100 MB | PEFT + Transformers |
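
As a rough sanity check on the GGUF size above: F16 stores each weight in 2 bytes, so the file size scales linearly with the parameter count. The sketch below assumes the nominal 1.5 B parameter count and ignores GGUF metadata and tokenizer overhead; `estimate_size_gb` is a hypothetical helper, not part of the repository.

```python
BYTES_PER_F16_WEIGHT = 2  # IEEE half precision: 16 bits per weight


def estimate_size_gb(num_params: float, bytes_per_weight: int = BYTES_PER_F16_WEIGHT) -> float:
    """Approximate size in decimal gigabytes, counting weights only."""
    return num_params * bytes_per_weight / 1e9


# Nominal parameter count of the GGUF variant (assumption: exactly 1.5e9).
print(f"1.5B @ F16 ~ {estimate_size_gb(1.5e9):.1f} GB")
```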

---

## Architecture

| Property | GGUF (1.5B) | LoRA Adapter (0.5B) |
|---|---|---|
| **Model type** | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
| **Parameters** | 1.5 B | 0.5 B + adapter weights |
| **Quantization** | GGUF F16 | bfloat16 / float16 |
| **LoRA rank (r)** | n/a | 16 |
| **LoRA alpha** | n/a | 32 |
| **LoRA target modules** | n/a | q, k, v, o, gate, up, down projections |
| **Context window** | 2 048 tokens | 2 048 tokens |
| **Vocabulary** | 151 936 | 151 936 |
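
To make the LoRA columns concrete, the sketch below computes the standard LoRA scaling factor (`alpha / r`) and estimates how many parameters a rank-16 adapter adds across the seven target projections. The layer shapes are an assumption recalled from the public Qwen2.5-0.5B configuration and should be checked against the base model's `config.json`; `LoraSpec` and `lora_param_count` are illustrative names, not PEFT APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSpec:
    r: int = 16      # LoRA rank from the table above
    alpha: int = 32  # LoRA alpha from the table above

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.alpha / self.r


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters added to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# ASSUMPTION: shapes recalled from the public Qwen2.5-0.5B config
# (hidden 896, MLP 4864, KV width 128, 24 layers). Verify before relying on them.
HIDDEN, MLP, KV, LAYERS = 896, 4864, 128, 24
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

spec = LoraSpec()
per_layer = sum(lora_param_count(i, o, spec.r) for i, o in PROJECTIONS.values())
total = per_layer * LAYERS
print(f"scaling={spec.scaling}, adapter parameters ~ {total:,}")
```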

---

## Training Pipeline

BlitzKode was produced by a **4-stage fine-tuning pipeline**, followed by a merge-and-export step (Stage 5):

### Stage 1: SFT (Supervised Fine-Tuning)
LoRA fine-tuning (`r=32`, base: Qwen2.5-1.5B-Instruct) on 71 curated algorithmic
coding problems covering arrays, strings, trees, dynamic programming, graphs,
sorting, hash tables, binary search, and more.

- **Adapter checkpoint:** `checkpoints/sft-1.5b-v1/`
- **Library:** PEFT + HuggingFace Transformers

### Stage 2: Reward-SFT
Continued SFT with heuristic reward functions to reinforce code correctness,
formatting quality, and concise explanation style. This is a standard SFT
training loop using scalar reward signals, **not** full GRPO.

- **Adapter checkpoint:** `checkpoints/grpo-v1/` *(the `grpo` directory name is historical)*
- **Library:** TRL / Transformers
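
The card does not spell out the reward functions used in Stage 2, so the following is only a hypothetical sketch of what a heuristic scalar reward could look like. It scores a response on three signals that mirror the goals above: formatting (a fenced code block is present), a correctness proxy (the code parses as Python), and concision (a word limit). The weights, the `max_words` threshold, and both function names are assumptions.

```python
import ast

FENCE = "`" * 3  # Markdown code fence, built indirectly so it can live inside this example


def extract_code(response: str) -> str:
    """Return the contents of the first fenced block, or '' if there is none."""
    parts = response.split(FENCE)
    if len(parts) < 3:
        return ""
    block = parts[1]
    # Drop an optional language tag such as "python" on the fence line.
    first_line, _, rest = block.partition("\n")
    return rest if first_line.strip().isalpha() else block


def heuristic_reward(response: str, max_words: int = 200) -> float:
    """Hypothetical scalar reward in [0, 1]; weights are illustrative only."""
    code = extract_code(response)
    score = 0.0
    if code.strip():
        score += 0.3  # formatting: the answer contains a fenced code block
        try:
            ast.parse(code)
            score += 0.5  # correctness proxy: the code is at least valid Python
        except SyntaxError:
            pass
    if len(response.split()) <= max_words:
        score += 0.2  # concision: the whole response stays short
    return score
```

A parse check is a weak proxy for correctness; running unit tests against the generated code would be a stronger signal.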

### Stage 3: DPO (Direct Preference Optimization)
Preference optimization on handcrafted chosen/rejected pairs to improve answer
clarity, reduce verbosity, and penalize hallucinated APIs or filenames.

- **Adapter checkpoint:** `checkpoints/dpo-v1/`
- **Library:** TRL
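
For readers unfamiliar with DPO, the per-pair objective can be sketched in plain Python. It takes the summed log-probabilities of the chosen and rejected answers under the policy and under a frozen reference model, and penalizes the policy when it does not prefer the chosen answer more strongly than the reference does. `beta=0.1` is an illustrative value, not a setting taken from the training scripts, and `dpo_loss` is a hypothetical helper rather than a TRL function.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow for large |x|.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and reference agree exactly, the loss equals `log(2)`; it falls as the policy shifts probability toward the chosen answer.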

### Stage 4: Continued LoRA SFT (Published Adapter)
Final LoRA fine-tuning (`r=16`, base: **Qwen2.5-0.5B-Instruct**) on 99 samples
drawn from the 199-sample full dataset. Training ran for 50 steps; final loss
reached **~0.48**.

- **Adapter checkpoint:** `checkpoints/available-lora-0.5b-full/final` ✅ *(publicly available)*
- **Library:** PEFT + Transformers

### Stage 5: Merge & Export (GGUF)
The LoRA adapters from Stages 1 to 3 were merged into the 1.5B base model using
`merge_and_unload()`, then converted to GGUF F16 format with llama.cpp.

- **Script:** `scripts/export_gguf.py`
- **Artifact:** `blitzkode.gguf` (~3 GB, git-ignored)
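
Conceptually, merging a standard LoRA adapter folds the low-rank update into the dense weight: `W' = W + (alpha / r) * B @ A`. The toy example below shows that arithmetic on a 2x2 layer using plain Python lists. It is a sketch of the underlying math, not the code in `scripts/export_gguf.py`, and `merge_lora` is a hypothetical name.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two small dense matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A, the dense weight after folding in the adapter."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x in
B = [[0.5], [0.0]]  # out x r
merged = merge_lora(W, A, B, alpha=2.0, r=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is what allows the result to be exported as a single GGUF file.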

---

## Training Data

**Total: 199 samples across 3 subsets**

| Subset | Count | Source | License | Purpose |
|---|---|---|---|---|
| Curated algorithmic problems | 71 | Custom (local) | MIT | Core coding skills: arrays, strings, trees, DP, graphs, sorting, searching |
| MetaMathQA samples | 100 | [`meta-math/MetaMathQA`](https://huggingface.co/datasets/meta-math/MetaMathQA) | CC BY 4.0 | Math reasoning transfer to improve step-by-step problem solving |
| Python/JavaScript patterns | 28 | Custom (local) | MIT | Practical patterns: decorators, context managers, data classes, async, CLI tools |
| **Total** | **199** | | | |

See [`datasets/MANIFEST.md`](datasets/MANIFEST.md) for full dataset provenance,
preprocessing notes, and per-sample license details.

---

## Features

- **Multi-language code generation**: Python, JavaScript, Java, C++, TypeScript, SQL
- **Code explanation**: clear inline comments and documentation
- **Bug fixing**: debug and fix common code issues
- **Algorithm assistance**: data structures and algorithms (LeetCode-style)
- **Offline operation**: fully local, no internet required at inference time
- **Fast CPU inference**: GGUF F16 runs on commodity CPUs
- **Modern web UI**: React/Vite chat interface with SSE streaming
- **REST API**: FastAPI backend with streaming and optional web-search augmentation

---

## Usage

### Production: GGUF with llama.cpp

```bash
# Clone and install
git clone https://github.com/neuralbroker/blitzkode
cd blitzkode
pip install -r requirements.txt

# Build the frontend
cd frontend && npm install && npm run build && cd ..

# Start the server (place blitzkode.gguf in repo root first)
python server.py
# Open http://localhost:7860
```

### Research: LoRA Adapter with PEFT

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_repo  = "neuralbroker/blitzkode-lora-0.5b"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()
```

### Prompt Format (ChatML)

All variants use the Qwen ChatML template:

```
<|im_start|>system
You are BlitzKode, an AI coding assistant created by Sajad. You are an expert
in Python, JavaScript, Java, C++, and other languages. Write clean, efficient,
and well-documented code. Keep responses concise and practical.<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```
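
If you build prompts by hand (for example when calling llama.cpp directly), the template above can be assembled with a small helper. This is a minimal sketch: `build_chatml_prompt` is a hypothetical function name, and it covers only a single system and user turn.

```python
SYSTEM_PROMPT = (
    "You are BlitzKode, an AI coding assistant created by Sajad. You are an expert "
    "in Python, JavaScript, Java, C++, and other languages. Write clean, efficient, "
    "and well-documented code. Keep responses concise and practical."
)


def build_chatml_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Format one system + user turn and leave the assistant turn open."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(build_chatml_prompt("Reverse a linked list in Python."))
```

When using Transformers, the tokenizer's `apply_chat_template` method produces the same structure from a list of messages and is usually the safer choice.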

---

## Intended Use

### Best For
- Local offline coding assistance
- Algorithm and data structure problem solving
- Code generation and explanation
- Educational programming support
- Code review, refactoring, and debugging

### Out of Scope
- Production code without thorough expert review
- Security-critical or cryptographic applications
- Multi-modal tasks (images not supported)
- Long-context repository analysis (> 2 048 tokens)

---

## Limitations

- **Text-only input**: no image or file-upload support
- **2 048-token context**: CPU-friendly but limits long conversation history
- **Verify all outputs**: always review and test generated code
- **Small model**: 0.5B to 1.5B scale; may produce incorrect code on complex tasks
- **No real-time data**: knowledge cutoff follows the Qwen2.5 base model
- **Math reasoning**: MetaMathQA transfer helps basic reasoning; not a math specialist
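
One common way to live within the 2 048-token window is to drop the oldest turns of a conversation before each request. The sketch below keeps the system message and as many recent turns as fit, while reserving room for the reply. It is an illustration only: the token counter is a crude character-based stand-in for the real tokenizer, and `trim_history`, `approx_token_count`, and the 512-token reserve are assumptions rather than the server's actual behavior.

```python
from typing import Callable


def approx_token_count(text: str) -> int:
    # ASSUMPTION: crude stand-in for the real tokenizer (~4 characters per token).
    return max(1, len(text) // 4)


def trim_history(
    messages: list[dict[str, str]],
    n_ctx: int = 2048,
    reserve_for_reply: int = 512,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> list[dict[str, str]]:
    """Keep the system message plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = n_ctx - reserve_for_reply - sum(count_tokens(m["content"]) for m in system)

    kept: list[dict[str, str]] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + kept[::-1]  # restore chronological order
```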

---

## Environment Variables (Inference Server)

| Variable | Default | Description |
|---|---|---|
| `BLITZKODE_GPU_LAYERS` | `0` | Number of layers to offload to GPU |
| `BLITZKODE_THREADS` | system | CPU inference thread count |
| `BLITZKODE_N_CTX` | `2048` | Context window size |
| `BLITZKODE_BATCH` | `512` | llama.cpp batch size |
| `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup vs first request |
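
The table above can be read as a small configuration schema. The sketch below shows one way such variables might be parsed with their documented defaults. It is not the code in `server.py`: `ServerSettings` and `load_settings` are hypothetical names, the `system` thread default is assumed to mean the machine's CPU count, and the accepted truthy spellings for the boolean are illustrative.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    gpu_layers: int
    threads: int
    n_ctx: int
    batch: int
    preload_model: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServerSettings(
        gpu_layers=int(env.get("BLITZKODE_GPU_LAYERS", "0")),
        # ASSUMPTION: the "system" default is interpreted as the CPU count.
        threads=int(env.get("BLITZKODE_THREADS", str(os.cpu_count() or 1))),
        n_ctx=int(env.get("BLITZKODE_N_CTX", "2048")),
        batch=int(env.get("BLITZKODE_BATCH", "512")),
        # ASSUMPTION: these truthy spellings are illustrative.
        preload_model=env.get("BLITZKODE_PRELOAD_MODEL", "false").strip().lower()
        in {"1", "true", "yes", "on"},
    )
```

Passing an explicit mapping, for example `load_settings({"BLITZKODE_N_CTX": "4096"})`, makes the parser easy to test without touching the real environment.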

---

## Project Structure

```text
BlitzKode/
  server.py                   # FastAPI backend (inference + search)
  blitzkode.gguf              # GGUF model artifact (~3 GB, git-ignored)
  frontend/                   # React/Vite web UI
  scripts/
    train_sft.py              # Stage 1: SFT training
    train_reward_sft.py       # Stage 2: Reward-SFT
    train_dpo.py              # Stage 3: DPO
    train_available.py        # Stage 4: LoRA fine-tune (0.5B)
    export_gguf.py            # Merge & convert to GGUF
    push_to_hub.py            # Push adapter to HuggingFace Hub
    build_full_dataset.py     # Dataset builder (algorithmic + HF datasets)
  datasets/
    MANIFEST.md               # Dataset provenance and license info
  checkpoints/
    available-lora-0.5b-full/ # Published LoRA adapter (0.5B)
  tests/
    test_server.py            # HTTP integration tests
  docs/
    PROJECT_OVERVIEW.md       # Architecture and design notes
  README.md                   # Full project documentation
  MODEL_CARD.md               # This file
```

---

## License

**MIT**. See [LICENSE](https://github.com/neuralbroker/blitzkode/blob/main/LICENSE).

You must also comply with the upstream Qwen2.5 license when redistributing any
fine-tuned weights derived from it.

- [Qwen2.5-0.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
- [Qwen2.5-1.5B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)

Training data subsets carry their own licenses:
- MetaMathQA: [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- Custom/local samples: MIT

---

## Contact

- **GitHub Issues:** <https://github.com/neuralbroker/blitzkode/issues>
- **Portfolio:** <https://neuralbroker.vercel.app>

Contributions and feedback are welcome!

---

## Citation

```bibtex
@software{blitzkode2025,
  author  = {Sajad},
  title   = {BlitzKode: A Local AI Coding Assistant},
  year    = {2025},
  url     = {https://github.com/neuralbroker/blitzkode}
}
```