neuralbroker committed
Commit 25fe3e8 · verified · 1 Parent(s): d5a79fa

Update clean backend-only project docs and eval

Files changed (1)
  1. MODEL_CARD.md +37 -19
MODEL_CARD.md CHANGED
@@ -23,9 +23,9 @@ base_model:
23
  # BlitzKode
24
 
25
  **BlitzKode** is a local AI coding assistant fine-tuned from the Qwen2.5 family. It
26
- ships as a **GGUF model** (1.5B, F16, ~3 GB) for fast offline inference with
27
- llama.cpp, and as a **LoRA adapter** (0.5B, ~100 MB) for PEFT-based research and
28
- further fine-tuning.
29
 
30
  > **Creator:** [Sajad (neuralbroker)](https://github.com/neuralbroker)
31
  > **GitHub:** <https://github.com/neuralbroker/blitzkode>
@@ -38,7 +38,7 @@ further fine-tuning.
38
 
39
  | Variant | Version | Base Model | Format | Size | Runtime |
40
  |---|---|---|---|---|---|
41
- | **GGUF** (production) | 2.0 | `Qwen/Qwen2.5-1.5B-Instruct` | GGUF F16 | ~3 GB | llama.cpp / llama-cpp-python |
42
  | **LoRA adapter** (research) | 2.1 | `Qwen/Qwen2.5-0.5B-Instruct` | PEFT safetensors | ~100 MB | PEFT + Transformers |
43
 
44
  ---
@@ -49,7 +49,7 @@ further fine-tuning.
49
  |---|---|---|
50
  | **Model type** | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
51
  | **Parameters** | 1.5 B | 0.5 B + adapter weights |
52
- | **Quantization** | GGUF F16 | bfloat16 / float16 |
53
  | **LoRA rank (r)** | – | 16 |
54
  | **LoRA alpha** | – | 32 |
55
  | **LoRA target modules** | – | q, k, v, o, gate, up, down projections |
@@ -95,10 +95,10 @@ reached **~0.48**.
95
 
96
  ### Stage 5 – Merge & Export (GGUF)
97
  LoRA adapters from Stage 1–3 were merged into the 1.5B base model using
98
- `merge_and_unload()`, then converted to GGUF F16 format with llama.cpp.
99
 
100
  - **Script:** `scripts/export_gguf.py`
101
- - **Artifact:** `blitzkode.gguf` (~3 GB, git-ignored)
102
 
103
  ---
104
 
@@ -126,8 +126,8 @@ preprocessing notes, and per-sample license details.
126
  - **Algorithm assistance** – data structures and algorithms (LeetCode-style)
127
  - **Offline operation** – fully local, no internet required at inference time
128
  - **Fast CPU inference** – GGUF F16 runs on commodity CPUs
129
- - **Modern web UI** – React/Vite chat interface with SSE streaming
130
- - **REST API** – FastAPI backend with streaming and optional web-search augmentation
131
 
132
  ---
133
 
@@ -141,12 +141,9 @@ git clone https://github.com/neuralbroker/blitzkode
141
  cd blitzkode
142
  pip install -r requirements.txt
143
 
144
- # Build the frontend
145
- cd frontend && npm install && npm run build && cd ..
146
-
147
- # Start the server (place blitzkode.gguf in repo root first)
148
  python server.py
149
- # Open http://localhost:7860
150
  ```
151
 
152
  ### Research: LoRA Adapter with PEFT
@@ -199,13 +196,29 @@ and well-documented code. Keep responses concise and practical.<|im_end|>
199
 
200
  ---
201
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
202
  ## Limitations
203
 
204
  - **Text-only input** – no image or file-upload support
205
- - **2 048-token context** – CPU-friendly but limits long conversation history
206
  - **Verify all outputs** – always review and test generated code
207
  - **Small model** – 0.5B–1.5B scale; may produce incorrect code on complex tasks
208
- - **No real-time data** – knowledge cutoff follows the Qwen2.5 base model
 
209
  - **Math reasoning** – MetaMathQA transfer helps basic reasoning; not a math specialist
210
 
211
  ---
@@ -215,9 +228,12 @@ and well-documented code. Keep responses concise and practical.<|im_end|>
215
  | Variable | Default | Description |
216
  |---|---|---|
217
  | `BLITZKODE_GPU_LAYERS` | `0` | Number of layers to offload to GPU |
218
- | `BLITZKODE_THREADS` | system | CPU inference thread count |
 
219
  | `BLITZKODE_N_CTX` | `2048` | Context window size |
220
- | `BLITZKODE_BATCH` | `512` | llama.cpp batch size |
 
 
221
  | `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup vs first request |
222
 
223
  ---
@@ -228,8 +244,8 @@ and well-documented code. Keep responses concise and practical.<|im_end|>
228
  BlitzKode/
229
  server.py # FastAPI backend (inference + search)
230
  blitzkode.gguf # GGUF model artifact (~3 GB, git-ignored)
231
- frontend/ # React/Vite web UI
232
  scripts/
 
233
  train_sft.py # Stage 1: SFT training
234
  train_reward_sft.py # Stage 2: Reward-SFT
235
  train_dpo.py # Stage 3: DPO
@@ -237,6 +253,8 @@ BlitzKode/
237
  export_gguf.py # Merge & convert to GGUF
238
  push_to_hub.py # Push adapter to HuggingFace Hub
239
  build_full_dataset.py # Dataset builder (algorithmic + HF datasets)
 
 
240
  datasets/
241
  MANIFEST.md # Dataset provenance and license info
242
  checkpoints/
 
23
  # BlitzKode
24
 
25
  **BlitzKode** is a local AI coding assistant fine-tuned from the Qwen2.5 family. It
26
+ ships as a **GGUF model** (1.5B, Q8_0, ~1.53 GB) for fast offline inference with
27
+ llama.cpp, and as **LoRA adapters** for PEFT-based research and further
28
+ fine-tuning.
29
 
30
  > **Creator:** [Sajad (neuralbroker)](https://github.com/neuralbroker)
31
  > **GitHub:** <https://github.com/neuralbroker/blitzkode>
 
38
 
39
  | Variant | Version | Base Model | Format | Size | Runtime |
40
  |---|---|---|---|---|---|
41
+ | **GGUF** (production) | 2.0 | `Qwen/Qwen2.5-1.5B-Instruct` | GGUF Q8_0 | ~1.53 GB | llama.cpp / llama-cpp-python |
42
  | **LoRA adapter** (research) | 2.1 | `Qwen/Qwen2.5-0.5B-Instruct` | PEFT safetensors | ~100 MB | PEFT + Transformers |
43
 
44
  ---
 
49
  |---|---|---|
50
  | **Model type** | Transformer (Qwen2) | Transformer (Qwen2) + LoRA |
51
  | **Parameters** | 1.5 B | 0.5 B + adapter weights |
52
+ | **Quantization** | GGUF Q8_0 | bfloat16 / float16 |
53
  | **LoRA rank (r)** | – | 16 |
54
  | **LoRA alpha** | – | 32 |
55
  | **LoRA target modules** | – | q, k, v, o, gate, up, down projections |
 
95
 
96
  ### Stage 5 – Merge & Export (GGUF)
97
  LoRA adapters from Stage 1–3 were merged into the 1.5B base model using
98
+ `merge_and_unload()`, then converted to GGUF Q8_0 format with llama.cpp.
99
 
100
  - **Script:** `scripts/export_gguf.py`
101
+ - **Artifact:** `blitzkode.gguf` (~1.53 GB, git-ignored)
102
 
103
  ---
104
 
 
126
  - **Algorithm assistance** – data structures and algorithms (LeetCode-style)
127
  - **Offline operation** – fully local, no internet required at inference time
128
  - **Fast CPU inference** – GGUF F16 runs on commodity CPUs
129
+ - **API-first serving** – FastAPI backend with REST and SSE streaming endpoints
130
+ - **Optimized local inference** – configurable llama.cpp GPU offload, mmap loading, batching, and prompt cache
131
 
132
  ---
133
 
 
141
  cd blitzkode
142
  pip install -r requirements.txt
143
 
144
+ # Start the API server (place blitzkode.gguf in repo root first)
 
 
 
145
  python server.py
146
+ curl http://localhost:7860/health
147
  ```
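
The `curl` health check above can also be scripted when the server is started from another process. The following is a minimal sketch, assuming only that `/health` answers with HTTP 200 once the server is ready; the diff does not show the response body, so it is not inspected. `wait_for_health` and its parameters are hypothetical names, not part of the repository.

```python
import time
import urllib.request


def wait_for_health(url="http://localhost:7860/health", attempts=10,
                    delay=1.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    `opener` is injectable so the function can be exercised without a
    running server. Assumption: a 200 status means "ready".
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # urllib.error.URLError subclasses OSError, so refused
            # connections and timeouts both land here.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_health()` after launching `python server.py` returns `True` once the endpoint responds and `False` if it never does within the attempt budget.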
148
 
149
  ### Research: LoRA Adapter with PEFT
 
196
 
197
  ---
198
 
199
+ ## Evaluation
200
+
201
+ Latest local GGUF smoke evaluation was run with `python scripts/evaluate_model.py` on CPU (`n_ctx=2048`, `threads=8`, `batch=256`, `gpu_layers=0`). Full machine-readable results are available in [`docs/evaluation_results.json`](docs/evaluation_results.json).
202
+
203
+ | Eval case | Result | Notes |
204
+ |---|---:|---|
205
+ | Python factorial with negative-input handling | ✅ Pass | Correct iterative implementation with negative-input validation. |
206
+ | Iterative binary search | ✅ Pass | Valid loop-based implementation returning index or `-1`. |
207
+ | SQL top users by order count | ✅ Pass | Correct `JOIN`, `GROUP BY`, `ORDER BY`, and `LIMIT 5` structure. |
208
+ | Unknown fictional API uncertainty | ❌ Fail | Raw model hallucinated a plausible signature; the FastAPI backend adds a guard for direct unknown-signature prompts. |
209
+
210
+ Summary: **3 / 4 passed (75%)**. This is a lightweight heuristic regression smoke test, not a benchmark suite. Stronger future evaluation should include executable unit tests and larger coding benchmarks such as HumanEval/MBPP-style tasks.
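
The summary figure is plain arithmetic over the pass/fail column of the table above. A minimal sketch of that calculation follows, assuming each case reduces to a boolean. The schema of `docs/evaluation_results.json` is not shown in this diff, so the dictionary below is illustrative only and `summarize` is a hypothetical helper, not the logic in `scripts/evaluate_model.py`.

```python
def summarize(results):
    """Return (passed, total, percent) for a mapping of case name -> bool."""
    total = len(results)
    passed = sum(1 for ok in results.values() if ok)
    percent = 100.0 * passed / total if total else 0.0
    return passed, total, percent


# Case names and outcomes copied from the evaluation table;
# the dict layout itself is an assumption made for illustration.
results = {
    "Python factorial with negative-input handling": True,
    "Iterative binary search": True,
    "SQL top users by order count": True,
    "Unknown fictional API uncertainty": False,
}

passed, total, percent = summarize(results)
print(f"{passed} / {total} passed ({percent:.0f}%)")  # prints "3 / 4 passed (75%)"
```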
211
+
212
+ ---
213
+
214
  ## Limitations
215
 
216
  - **Text-only input** – no image or file-upload support
217
+ - **2 048-token default context** – CPU-friendly but limits long conversation history
218
  - **Verify all outputs** – always review and test generated code
219
  - **Small model** – 0.5B–1.5B scale; may produce incorrect code on complex tasks
220
+ - **Raw model hallucination risk** – the API server includes guardrails, but direct GGUF prompting can still invent unsupported API details
221
+ - **No real-time data** – knowledge cutoff follows the Qwen2.5 base model unless the optional research endpoint is used
222
  - **Math reasoning** – MetaMathQA transfer helps basic reasoning; not a math specialist
223
 
224
  ---
 
228
  | Variable | Default | Description |
229
  |---|---|---|
230
  | `BLITZKODE_GPU_LAYERS` | `0` | Number of layers to offload to GPU |
231
+ | `BLITZKODE_THREADS` | system | CPU decode thread count |
232
+ | `BLITZKODE_THREADS_BATCH` | system | CPU prompt-processing thread count |
233
  | `BLITZKODE_N_CTX` | `2048` | Context window size |
234
+ | `BLITZKODE_BATCH` | `256` | llama.cpp prompt-processing batch size |
235
+ | `BLITZKODE_UBATCH` | `128` | llama.cpp micro-batch size |
236
+ | `BLITZKODE_PROMPT_CACHE` | `true` | Enable in-memory prompt cache when supported |
237
  | `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup vs first request |
238
 
239
  ---
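
The environment variables in the updated table can be read into one typed configuration object. The sketch below uses the variable names and defaults from that table. Everything else is an assumption: `ServerConfig`, `load_config`, the reading of `system` as the CPU count, and the accepted boolean spellings are hypothetical and may differ from what `server.py` actually does.

```python
import os
from dataclasses import dataclass


def _env_int(env, name, default):
    """Read an integer variable, falling back to `default` when unset or empty."""
    raw = env.get(name)
    return int(raw) if raw not in (None, "") else default


def _env_bool(env, name, default):
    """Read a boolean variable; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None or raw == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ServerConfig:
    gpu_layers: int
    threads: int
    threads_batch: int
    n_ctx: int
    batch: int
    ubatch: int
    prompt_cache: bool
    preload_model: bool


def load_config(env=None):
    """Build a ServerConfig from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: the table's "system" default means the machine's CPU count.
    system_threads = os.cpu_count() or 1
    return ServerConfig(
        gpu_layers=_env_int(env, "BLITZKODE_GPU_LAYERS", 0),
        threads=_env_int(env, "BLITZKODE_THREADS", system_threads),
        threads_batch=_env_int(env, "BLITZKODE_THREADS_BATCH", system_threads),
        n_ctx=_env_int(env, "BLITZKODE_N_CTX", 2048),
        batch=_env_int(env, "BLITZKODE_BATCH", 256),
        ubatch=_env_int(env, "BLITZKODE_UBATCH", 128),
        prompt_cache=_env_bool(env, "BLITZKODE_PROMPT_CACHE", True),
        preload_model=_env_bool(env, "BLITZKODE_PRELOAD_MODEL", False),
    )
```

Passing a plain dictionary, for example `load_config({"BLITZKODE_N_CTX": "4096"})`, overrides a single value while the remaining fields keep the documented defaults.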
 
244
  BlitzKode/
245
  server.py # FastAPI backend (inference + search)
246
  blitzkode.gguf # GGUF model artifact (~3 GB, git-ignored)
 
247
  scripts/
248
+ evaluate_model.py # Lightweight GGUF evaluation harness
249
  train_sft.py # Stage 1: SFT training
250
  train_reward_sft.py # Stage 2: Reward-SFT
251
  train_dpo.py # Stage 3: DPO
 
253
  export_gguf.py # Merge & convert to GGUF
254
  push_to_hub.py # Push adapter to HuggingFace Hub
255
  build_full_dataset.py # Dataset builder (algorithmic + HF datasets)
256
+ docs/
257
+ evaluation_results.json # Latest smoke-eval output
258
  datasets/
259
  MANIFEST.md # Dataset provenance and license info
260
  checkpoints/