neuralbroker committed on
Commit
2414aad
·
verified ·
1 Parent(s): 4f3439e

Update README.md (v2.1 production)

  1. README.md +196 -125
README.md CHANGED
@@ -1,185 +1,256 @@
1
  ---
2
  language:
3
- - en
4
  license: mit
5
  library_name: llama-cpp-python
6
  pipeline_tag: text-generation
7
  tags:
8
- - code-generation
9
- - coding-assistant
10
- - gguf
11
- - llama.cpp
12
- - qwen2.5
13
- - python
14
- - javascript
15
- - fine-tuned
16
  base_model:
17
- - Qwen/Qwen2.5-1.5B-Instruct
18
  ---
19
 
20
  # BlitzKode
21
 
22
- BlitzKode is a local AI coding assistant that runs entirely on your machine. It generates code in Python, JavaScript, Java, C++, and other languages through a web interface or API. The model is fine-tuned from Qwen2.5-1.5B and quantized to GGUF format for fast inference.
23
 
24
  ## Tech Stack
25
 
26
- - Model: Qwen2.5-1.5B (fine-tuned, GGUF format)
27
- - Backend: Python, FastAPI, uvicorn
28
- - Inference: llama.cpp / llama-cpp-python
29
- - Frontend: Vanilla HTML, CSS, JavaScript
30
- - Training: HuggingFace Transformers, PEFT, TRL
31
 
32
  ## Features
33
 
34
- - Local code generation without external API calls
35
- - Real-time streaming responses (token-by-token)
36
- - Web UI with dark theme, conversation history, copy-to-clipboard
37
- - REST API with streaming (SSE) support
38
- - Multi-language support: Python, JavaScript, Java, C++, TypeScript, SQL
39
- - Conversation context across multiple turns
40
- - Configurable via environment variables
41
- - Optional API key authentication
42
- - CPU and GPU inference support
43
- - Docker support
44
 
45
  ## Prerequisites
46
 
47
- - Python 3.9+
48
- - GGUF model file (`blitzkode.gguf`)
49
- - 4GB+ RAM recommended
 
50
 
51
- ## Installation
52
 
53
  ```bash
54
- # Clone the repository
55
- git clone https://github.com/neuralbroker/blitzkode.git
56
- cd blitzkode
57
-
58
- # Install dependencies
59
  pip install -r requirements.txt
60
-
61
- # Ensure model file exists
62
- # Place blitzkode.gguf in the project root, or set BLITZKODE_MODEL_PATH
63
  ```
64
 
65
- ## Usage
66
 
67
- Start the server:
68
 
69
  ```bash
 
70
  python server.py
71
  ```
72
 
73
- Open `http://localhost:7860` in your browser.
74
 
75
- ### Docker
76
 
77
  ```bash
 
78
  docker build -t blitzkode .
79
  docker run -p 7860:7860 -v ./blitzkode.gguf:/app/blitzkode.gguf blitzkode
80
  ```
81
 
82
- ### API Examples
83
 
84
  ```bash
85
- # Generate code
86
- curl -X POST http://localhost:7860/generate -H 'Content-Type: application/json' -d '{
87
- \"prompt\": \"Write a Python function to reverse a string\"
88
- }'
89
-
90
- # Stream tokens
91
- curl -X POST http://localhost:7860/generate/stream -H 'Content-Type: application/json' -d '{
92
- \"prompt\": \"Binary search implementation in Python\"
93
- }'
94
-
95
- # With conversation history
96
- curl -X POST http://localhost:7860/generate -H 'Content-Type: application/json' -d '{
97
- \"prompt\": \"Add error handling to that function\",
98
- \"messages\": [
99
- {\"role\": \"user\", \"content\": \"Write a Python function to reverse a string\"},
100
- {\"role\": \"assistant\", \"content\": \"def reverse_string(s): return s[::-1]\"}
101
- ]
102
- }'
103
-
104
- # Check server health
 
105
  curl http://localhost:7860/health
106
-
107
- # Get API info
108
  curl http://localhost:7860/info
109
  ```
110
 
111
- ### API Parameters
 
 
112
 
113
  | Parameter | Type | Default | Description |
114
- |-----------|------|---------|-------------|
115
- | prompt | string | required | Your question or request |
116
- | messages | array | [] | Conversation history (last 8 messages) |
117
- | temperature | float | 0.5 | Response randomness (0.0-2.0) |
118
- | max_tokens | int | 256 | Maximum tokens to generate |
119
- | top_p | float | 0.95 | Nucleus sampling threshold |
120
- | top_k | int | 20 | Top-k sampling |
121
- | repeat_penalty | float | 1.05 | Repetition penalty |
122
 
123
- ## Project Structure
124
 
125
- ```
126
- blitzkode/
127
- ├── server.py # FastAPI backend, main entry point
128
- ├── blitzkode.gguf # Quantized model file (~3GB)
129
- ├── Dockerfile # Docker container
130
- ├── requirements.txt # Serving dependencies
131
- ├── requirements-training.txt # Training dependencies
132
- ├── LICENSE # MIT License
133
- ├── .env.example # Environment variable template
134
- ├── frontend/
135
- │ └── index.html # Web UI (HTML/CSS/JS)
136
- ├── tests/
137
- │ └── test_server.py # HTTP endpoint tests
138
- ├── scripts/
139
- │ ├── train_sft.py # Supervised fine-tuning (LoRA)
140
- │ ├── train_grpo.py # Reward-based SFT continuation
141
- │ ├── train_dpo.py # Direct Preference Optimization
142
- │ ├── export_gguf.py # Merge checkpoints and export GGUF
143
- │ └── test_inference.py # Direct model inference test
144
- ├── checkpoints/ # Trained LoRA adapter checkpoints
145
- ├── exported/ # Merged model for GGUF export
146
- ├── datasets/
147
- │ └── raw/ # Training datasets
148
- ├── models/ # Base model files
149
- ├── .github/workflows/
150
- │ └── ci.yml # GitHub Actions CI
151
- ├── MODEL_CARD.md # Model documentation
152
- └── README.md # This file
153
- ```
154
 
155
  ## Environment Variables
156
 
157
- | Variable | Default | Description | Example |
158
- |----------|---------|-------------|---------|
159
- | BLITZKODE_PORT | 7860 | Server port | 8080 |
160
- | BLITZKODE_HOST | 0.0.0.0 | Server bind address | 127.0.0.1 |
161
- | BLITZKODE_GPU_LAYERS | 0 | GPU layers (0=CPU only) | 35 |
162
- | BLITZKODE_N_CTX | 2048 | Context window size | 4096 |
163
- | BLITZKODE_THREADS | auto | CPU threads for inference | 8 |
164
- | BLITZKODE_BATCH | 128 | Batch size for processing | 256 |
165
- | BLITZKODE_WORKERS | 2 | Concurrent request workers | 4 |
166
- | BLITZKODE_MODEL_PATH | blitzkode.gguf | Path to model file | /path/to/model.gguf |
167
- | BLITZKODE_FRONTEND_PATH | frontend/index.html | Path to frontend file | ./ui.html |
168
- | BLITZKODE_MAX_PROMPT_LENGTH | 4000 | Max prompt characters | 8000 |
169
- | BLITZKODE_PRELOAD_MODEL | false | Load model on startup | true |
170
- | BLITZKODE_CORS_ORIGINS | * | CORS origins (comma-separated) | http://localhost:3000 |
171
- | BLITZKODE_API_KEY | empty | API key (empty=disabled) | my-secret-key |
172
-
173
- ## Tests
174
 
175
  ```bash
176
- python -m unittest discover -s tests -v
177
  ```
178
 
179
- ## Contributing
180
 
181
- Contributions are welcome. Open an issue first for major changes.
182
 
183
  ## License
184
 
185
- MIT License. See LICENSE file for details.
 
1
  ---
2
  language:
3
+ - en
4
  license: mit
5
  library_name: llama-cpp-python
6
  pipeline_tag: text-generation
7
  tags:
8
+ - code-generation
9
+ - coding-assistant
10
+ - gguf
11
+ - llama.cpp
12
+ - qwen2.5
13
+ - python
14
+ - javascript
15
+ - fine-tuned
16
  base_model:
17
+ - Qwen/Qwen2.5-1.5B-Instruct
18
  ---
19
 
20
  # BlitzKode
21
 
22
+ BlitzKode is a local AI coding assistant powered by a fine-tuned Qwen2.5-1.5B-Instruct model. Model inference runs entirely on your machine with no external API calls. The optional web research features are the exception: they send search queries to DuckDuckGo.
23
 
24
  ## Tech Stack
25
 
26
+ | Layer | Tech |
27
+ |---|---|
28
+ | Base model | Qwen2.5-1.5B-Instruct |
29
+ | Fine-tuning | LoRA (r=16, α=32) via PEFT |
30
+ | Training | HuggingFace Transformers + TRL |
31
+ | Inference | llama-cpp-python (GGUF Q8_0) |
32
+ | Backend | Python 3.11+, FastAPI, uvicorn |
33
+ | Frontend | React 18, Vite, Phosphor Icons |
34
 
35
  ## Features
36
 
37
+ - **Local-first** — inference with the bundled GGUF, no cloud dependency
38
+ - **Real-time streaming** — SSE token-by-token via `/generate/stream`
39
+ - **Web research mode** — DuckDuckGo search → context-augmented generation via `/generate/research`
40
+ - **Web search API** — standalone `/search/web` endpoint for raw results
41
+ - **React chat UI** — streaming, conversation history, copy controls, research-mode toggle
42
+ - **Multi-language** — Python, JavaScript, Java, C++, TypeScript, SQL
43
+ - **API key auth + rate limiting** — production-ready security middleware
44
+ - **Docker** — multi-stage production image with frontend baked in
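
The README does not show how the per-IP rate limiting listed above is implemented. As a minimal sketch only, assuming a sliding one-minute window and the documented default of 30 requests per IP (`BLITZKODE_RATE_LIMIT_MAX`), the hypothetical `SlidingWindowLimiter` class below illustrates one way such a check could work. The class name and its methods are illustrative and not taken from the project.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter; not BlitzKode's actual middleware."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id):
        """Return True if the request is accepted, False if the client is over the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A server would typically call `allow(client_ip)` once per request and answer with HTTP 429 when it returns `False`. Rejected requests are not recorded here, so a client that keeps retrying does not extend its own lockout.
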
 
 
45
 
46
  ## Prerequisites
47
 
48
+ - Python 3.11+
49
+ - Node.js 20+ (for frontend dev/builds only)
50
+ - `blitzkode.gguf` at repo root (or set `BLITZKODE_MODEL_PATH`)
51
+ - 4 GB+ RAM
52
 
53
+ ## Quick Start
54
 
55
  ```bash
56
  pip install -r requirements.txt
57
+ python server.py
58
+ # Open http://localhost:7860
59
  ```
60
 
61
+ ## Frontend Development
62
 
63
+ ```bash
64
+ cd frontend
65
+ npm install
66
+ npm run dev # http://localhost:5173 — proxies /generate and /health to :7860
67
+ ```
68
+
69
+ ## Production Frontend Build
70
 
71
  ```bash
72
+ cd frontend && npm install && npm run build && cd ..
73
  python server.py
74
  ```
75
 
76
+ FastAPI serves `frontend/dist/index.html` and `/assets/*` from the same port.
77
 
78
+ ## Docker
79
 
80
  ```bash
81
+ # CPU
82
  docker build -t blitzkode .
83
  docker run -p 7860:7860 -v ./blitzkode.gguf:/app/blitzkode.gguf blitzkode
84
+
85
+ # GPU (requires the NVIDIA Container Toolkit)
86
+ docker compose --profile gpu up
87
  ```
88
 
89
+ ## API Examples
90
 
91
  ```bash
92
+ # Standard generation (streaming)
93
+ curl -X POST http://localhost:7860/generate/stream \
94
+ -H "Content-Type: application/json" \
95
+ -d '{"prompt":"Write a Python function to reverse a linked list"}'
96
+
97
+ # Non-streaming
98
+ curl -X POST http://localhost:7860/generate \
99
+ -H "Content-Type: application/json" \
100
+ -d '{"prompt":"Binary search in Python","max_tokens":128}'
101
+
102
+ # Web search only
103
+ curl -X POST http://localhost:7860/search/web \
104
+ -H "Content-Type: application/json" \
105
+ -d '{"query":"FastAPI dependency injection","max_results":3}'
106
+
107
+ # Research-augmented generation (search → inject → answer)
108
+ curl -X POST http://localhost:7860/generate/research \
109
+ -H "Content-Type: application/json" \
110
+ -d '{"prompt":"How do I use async generators in Python 3.12?","deep_search":true}'
111
+
112
+ # Health / info
113
  curl http://localhost:7860/health
 
 
114
  curl http://localhost:7860/info
115
  ```
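
The same calls can be made from Python. The sketch below uses only the standard library and mirrors the `curl` examples above. The response schema is not documented in this README, so the streaming helper yields raw `data:` payloads without interpreting them. Sending the API key as a bearer token follows the environment variable table, but the exact header is an assumption, and all function names here are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # default host and port from the examples above


def build_request(path, payload, api_key=None):
    """Build a JSON POST request for a BlitzKode endpoint without sending it."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the optional API key is sent as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=body, headers=headers, method="POST")


def iter_sse_data(lines):
    """Yield the payload of each `data:` line from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            yield value[1:] if value.startswith(" ") else value


def stream(prompt, **params):
    """Send a prompt to /generate/stream and yield raw event payloads as they arrive."""
    request = build_request("/generate/stream", {"prompt": prompt, **params})
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_data(response)
```

With the server running, `for chunk in stream("Binary search in Python", max_tokens=128): print(chunk)` would print each event payload as it arrives.
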
116
 
117
+ ## API Parameters
118
+
119
+ ### Generation (`/generate`, `/generate/stream`)
120
 
121
  | Parameter | Type | Default | Description |
122
+ |---|---|---|---|
123
+ | `prompt` | string | required | User request |
124
+ | `messages` | array | `[]` | Conversation history (max 20) |
125
+ | `temperature` | float | `0.5` | Sampling randomness `0.0–2.0` |
126
+ | `max_tokens` | int | `256` | Max generated tokens (cap 512) |
127
+ | `top_p` | float | `0.95` | Nucleus sampling threshold |
128
+ | `top_k` | int | `20` | Top-k sampling |
129
+ | `repeat_penalty` | float | `1.05` | Repetition penalty |
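
The defaults and limits in the table above can be captured in a small request model. This is a sketch under stated assumptions: the README does not say whether the server clamps out-of-range values or rejects them, so the hypothetical `GenerationParams` class below clamps, and it keeps the most recent 20 messages when more are supplied.

```python
from dataclasses import dataclass, field


@dataclass
class GenerationParams:
    """Hypothetical request model mirroring the documented defaults and limits."""

    prompt: str
    messages: list = field(default_factory=list)
    temperature: float = 0.5
    max_tokens: int = 256
    top_p: float = 0.95
    top_k: int = 20
    repeat_penalty: float = 1.05

    def __post_init__(self):
        if not self.prompt:
            raise ValueError("prompt is required")
        # Keep only the most recent 20 history messages (documented maximum).
        self.messages = list(self.messages)[-20:]
        # Clamp temperature into the documented 0.0 to 2.0 range.
        self.temperature = min(max(float(self.temperature), 0.0), 2.0)
        # Cap generated tokens at the documented limit of 512.
        self.max_tokens = min(max(int(self.max_tokens), 1), 512)
```

For example, `GenerationParams(prompt="x", max_tokens=9999).max_tokens` evaluates to `512`.
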
130
 
131
+ ### Research (`/generate/research`)
132
 
133
+ Same as above, plus:
134
+
135
+ | Parameter | Type | Default | Description |
136
+ |---|---|---|---|
137
+ | `search_query` | string | prompt | Override query for web search |
138
+ | `search_results` | int | `5` | Results to inject |
139
+ | `deep_search` | bool | `false` | Also search documentation/best-practices variants |
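
Research mode is described as search, then inject, then answer. The README does not show the prompt template the server actually uses, so the following is only an illustration of the injection step. The result fields (`title`, `url`, `snippet`), the wording of the template, and the function name `build_research_prompt` are all assumptions.

```python
def build_research_prompt(prompt, results, max_results=5):
    """Prepend numbered search snippets to the user's prompt (hypothetical format)."""
    lines = ["Use the following web search results as context where relevant.", ""]
    for index, result in enumerate(results[:max_results], start=1):
        # Assumed result shape: {"title": ..., "url": ..., "snippet": ...}
        lines.append(f"[{index}] {result['title']} ({result['url']})")
        lines.append(result["snippet"])
        lines.append("")
    lines.append(f"Question: {prompt}")
    return "\n".join(lines)
```

Because the injected snippets count against the 2048-token default context window (`BLITZKODE_N_CTX`), keeping `search_results` small leaves more room for the answer.
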
140
+
141
+ ### Web search (`/search/web`)
142
+
143
+ | Parameter | Type | Default | Description |
144
+ |---|---|---|---|
145
+ | `query` | string | required | Search query |
146
+ | `max_results` | int | `5` | Results to return |
147
+ | `deep` | bool | `false` | Multi-variant deep search |
148
 
149
  ## Environment Variables
150
 
151
+ | Variable | Default | Description |
152
+ |---|---|---|
153
+ | `BLITZKODE_MODEL_PATH` | `blitzkode.gguf` | GGUF model path |
154
+ | `BLITZKODE_FRONTEND_PATH` | `frontend/dist/index.html` | Built frontend |
155
+ | `BLITZKODE_HOST` | `0.0.0.0` | Server bind address |
156
+ | `BLITZKODE_PORT` | `7860` | Server port |
157
+ | `BLITZKODE_GPU_LAYERS` | `0` | GPU layers for llama.cpp |
158
+ | `BLITZKODE_N_CTX` | `2048` | Context window |
159
+ | `BLITZKODE_THREADS` | auto | CPU worker threads |
160
+ | `BLITZKODE_BATCH` | `128` | Batch size |
161
+ | `BLITZKODE_MAX_PROMPT_LENGTH` | `4000` | Max prompt chars |
162
+ | `BLITZKODE_PRELOAD_MODEL` | `false` | Load model at startup |
163
+ | `BLITZKODE_CORS_ORIGINS` | `http://localhost:7860` | CORS origins |
164
+ | `BLITZKODE_API_KEY` | empty | Optional bearer token |
165
+ | `BLITZKODE_WEB_SEARCH` | `true` | Enable web search endpoints |
166
+ | `BLITZKODE_SEARCH_TIMEOUT` | `8` | Search HTTP timeout (s) |
167
+ | `BLITZKODE_MAX_SEARCH_RESULTS` | `5` | Max search results |
168
+ | `BLITZKODE_RATE_LIMIT` | `true` | Enable per-IP rate limiting |
169
+ | `BLITZKODE_RATE_LIMIT_MAX` | `30` | Requests per IP per minute |
170
+ | `BLITZKODE_MAX_REQUEST_BYTES` | `50000` | Request body size limit |
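
To make the defaults above concrete, here is a minimal sketch of reading a subset of these variables in Python. It is not the project's `server.py`: the function name `load_config`, the accepted truthy strings, and the meaning of `auto` for `BLITZKODE_THREADS` are assumptions.

```python
import os


def _as_bool(value):
    """Interpret common truthy strings; anything else is False."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def load_config(env=None):
    """Read a subset of BLITZKODE_* settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    threads = env.get("BLITZKODE_THREADS", "auto")
    return {
        "model_path": env.get("BLITZKODE_MODEL_PATH", "blitzkode.gguf"),
        "host": env.get("BLITZKODE_HOST", "0.0.0.0"),
        "port": int(env.get("BLITZKODE_PORT", "7860")),
        "gpu_layers": int(env.get("BLITZKODE_GPU_LAYERS", "0")),
        "n_ctx": int(env.get("BLITZKODE_N_CTX", "2048")),
        # Assumption: "auto" means "use every CPU the OS reports".
        "threads": (os.cpu_count() or 1) if threads == "auto" else int(threads),
        "preload_model": _as_bool(env.get("BLITZKODE_PRELOAD_MODEL", "false")),
        # Assumption: an empty value means authentication is disabled.
        "api_key": env.get("BLITZKODE_API_KEY", "") or None,
        "web_search": _as_bool(env.get("BLITZKODE_WEB_SEARCH", "true")),
        "rate_limit_max": int(env.get("BLITZKODE_RATE_LIMIT_MAX", "30")),
    }
```

Passing a plain dictionary, as in `load_config({"BLITZKODE_PORT": "8080"})`, makes the parsing easy to test without touching the real environment.
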
171
+
172
+ ## Training Pipeline
173
+
174
+ BlitzKode was fine-tuned through a staged pipeline on an RTX 4060 (8 GB VRAM):
175
+
176
+ | Stage | Script | Details |
177
+ |---|---|---|
178
+ | SFT v1 | `train_sft.py` | LoRA r=32 on 24 curated coding examples |
179
+ | Reward-SFT | `train_reward_sft.py` | Reward-heuristic continuation |
180
+ | DPO | `train_dpo.py` | 10 chosen/rejected preference pairs |
181
+ | SFT v2 | `train_available.py` | LoRA r=16, 100 steps, 99 samples (1.5B) |
182
+ | Export | `export_production.py` | Merge → GGUF Q8_0 via llama.cpp |
183
+
184
+ ### Re-train from scratch
185
 
186
  ```bash
187
+ pip install -r requirements-training.txt
188
+
189
+ # Build dataset
190
+ python scripts/build_full_dataset.py
191
+
192
+ # Train 1.5B LoRA (100 steps, ~5 min on RTX 4060)
193
+ python scripts/train_available.py \
194
+ --model Qwen/Qwen2.5-1.5B-Instruct \
195
+ --quantization none \
196
+ --dataset datasets/raw/blitzkode_full_training.json \
197
+ --max-steps 100 --seq-len 384 --batch-size 1 --grad-accum 8
198
+
199
+ # Export: merge + GGUF
200
+ python scripts/export_production.py
201
  ```
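
The flags in the training command imply a small training budget, which the arithmetic below works out. It assumes `--max-steps` counts optimizer steps, a single GPU, and the 99-sample dataset listed in the pipeline table. The helper `training_budget` is hypothetical and is not part of the repository's scripts.

```python
def training_budget(max_steps, batch_size, grad_accum, seq_len, num_samples):
    """Derive effective batch size and related totals for a single-device run."""
    effective_batch = batch_size * grad_accum      # samples per optimizer step
    samples_seen = effective_batch * max_steps     # counting repeats across epochs
    return {
        "effective_batch": effective_batch,
        "samples_seen": samples_seen,
        "approx_epochs": samples_seen / num_samples,
        "max_tokens_per_step": effective_batch * seq_len,  # upper bound; shorter samples use fewer
    }


budget = training_budget(max_steps=100, batch_size=1, grad_accum=8, seq_len=384, num_samples=99)
print(budget)
```

Under these assumptions the effective batch size is 8, so 100 steps process 800 samples, or roughly eight passes over the 99-sample dataset, with at most 3072 tokens per optimizer step.
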
202
 
203
+ ### Push to HuggingFace
204
 
205
+ ```bash
206
+ export HF_TOKEN=hf_XXXX # get from https://huggingface.co/settings/tokens
207
+ python scripts/push_all_to_hub.py
208
+ ```
209
+
210
+ This uploads:
211
+ - `checkpoints/blitzkode-1.5b-lora/final` → `neuralbroker/blitzkode-1.5b-lora`
212
+ - `checkpoints/available-lora-0.5b-full/final` → `neuralbroker/blitzkode-lora-0.5b`
213
+ - `blitzkode.gguf` → `neuralbroker/blitzkode`
214
+
215
+ ## Project Structure
216
+
217
+ ```text
218
+ BlitzKode/
219
+ server.py FastAPI backend
220
+ blitzkode.gguf Local GGUF model (ignored by git)
221
+ frontend/ React/Vite web UI
222
+ src/App.jsx Chat UI with streaming + research toggle
223
+ src/index.css
224
+ vite.config.js
225
+ scripts/
226
+ train_available.py Resource-aware LoRA training
227
+ build_full_dataset.py Dataset builder
228
+ export_production.py Merge LoRA → GGUF
229
+ push_to_hub.py Single-adapter HF push
230
+ push_all_to_hub.py Push all artifacts in one command
231
+ test_inference.py Adapter smoke test
232
+ healthcheck.sh Docker/Compose health probe
233
+ tests/test_server.py 20 backend endpoint tests (all passing)
234
+ datasets/MANIFEST.md Dataset provenance
235
+ docs/PROJECT_OVERVIEW.md Architecture & roadmap
236
+ Dockerfile Multi-stage production image
237
+ docker-compose.yml CPU + GPU service definitions
238
+ requirements.txt Serving dependencies
239
+ requirements-training.txt Training dependencies (pinned)
240
+ ```
241
+
242
+ ## CI
243
+
244
+ ```bash
245
+ python -m pytest tests/ -v # 20 tests, all pass
246
+ python -m ruff check . # lint
247
+ npm --prefix frontend run build # frontend build
248
+ ```
249
 
250
  ## License
251
 
252
+ MIT. See `LICENSE`. When redistributing model weights, also comply with the [upstream Qwen2.5 license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
253
+
254
+ ---
255
+
256
+ *Created by [Sajad (neuralbroker)](https://github.com/neuralbroker)*