---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- code
- coding-assistant
- lora
- mlx
- apple-silicon
- qwen2.5
datasets:
- flwrlabs/code-alpaca-20k
- m-a-p/Code-Feedback
library_name: mlx-lm
pipeline_tag: text-generation
---
**Developed by Samiya Kashif, Kashif Salahuddin, Rohan Bhangale & Robert Rojek**
## 1. Executive Summary
**Minimalism** is a specialized coding assistant built as a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-Coder-0.5B-Instruct base model. Unlike generic coding assistants, Minimalism implements a "runnable-first" philosophy: when users request code, responses are structured with clear **Solution**, **Usage**, and **Sanity test** sections, ensuring developers receive immediately executable code with minimal friction.
### What Minimalism Is
- **A LoRA adapter** trained on the code-alpaca-20k and Code-Feedback datasets
- **OpenAI-compatible API** for local inference
- **Lightweight distribution** (~12MB adapter vs. multi-GB full models)
- **Production-engineered** with automated pipelines, evaluation, and publishing
## Why Minimalism
Minimalism is built for a simple, practical goal: **deliver the same outcome with fewer lines of code**.
Most coding assistants tend to “over-achieve” by producing large, multi-step solutions—even when a smaller, clearer implementation would do. That extra code isn’t free: it increases review effort, maintenance cost, and the surface area where defects can hide.
**Too much code, too fast:** Teams everywhere are seeing a huge jump in lines of code (LOC). Developers, from interns to seniors, are suddenly writing **5 to 7 times more** code than before. At first this looks like higher productivity; in reality, it often means more bugs.
There’s a long-standing rule in software engineering:
> “The more lines of code you have, the higher your probability of introducing bugs.”
And AI-generated code tends to be **verbose and repetitive**, which inflates LOC without adding real value.
Minimalism is designed for teams that value **minimalism, clarity, and correctness** over volume.
### What makes Minimalism different
* **Minimal LoC by default**
Minimalism is optimized to **minimize lines of code while preserving behavior**—it prefers the smallest correct solution that meets the user’s objective.
* **Internal governance behavior**
The model follows a lightweight internal “governance layer” in its response style: avoid unnecessary scaffolding, avoid over-abstraction, keep code focused, and don’t introduce additional complexity that doesn’t improve the result. The governance layer sits between the user request and the model’s final output to enforce **minimalism as a constraint**. It evaluates candidate solutions by measuring **lines of code** and selects the smallest implementation that still satisfies the original requirements. If a shorter variant fails, it automatically falls back to the next-smallest passing candidate, ensuring fewer lines **without sacrificing correctness**.
* **Practical, runnable output**
When you ask for code, Minimalism is tuned toward “runnable-first” answers—clear implementation, a minimal usage example, and a quick sanity check when appropriate.
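The selection behavior described above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual implementation; the function names (`loc`, `select_minimal_passing`, `passes`) are hypothetical:

```python
def loc(code: str) -> int:
    """Count non-blank, non-comment lines."""
    return sum(
        1 for line in code.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

def select_minimal_passing(candidates, passes):
    """Pick the shortest candidate that satisfies the checks;
    if a shorter variant fails, fall back to the next-smallest."""
    for code in sorted(candidates, key=loc):
        if passes(code):
            return code
    return None

# Two candidate implementations of the same task: absolute value.
verbose = """
def absolute(x):
    if x >= 0:
        return x
    else:
        return -x
"""
minimal = "def absolute(x):\n    return abs(x)\n"

def passes(code):
    ns = {}
    exec(code, ns)                   # smoke-test the candidate
    return ns["absolute"](-3) == 3   # acceptance check

chosen = select_minimal_passing([verbose, minimal], passes)
print(loc(chosen))  # → 2: the smaller passing candidate wins
```

If the 2-line candidate had failed the acceptance check, the 5-line variant would have been selected instead, which is the fallback behavior the governance layer enforces.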
### Early validation
Minimalism was evaluated in a small developer study comparing it with popular coding models on a shared set of tasks. In this pilot, Minimalism showed a **clear reduction in lines of code (up to ~30%)** while producing solutions that **executed correctly and achieved the same intended outcomes** under the evaluation harness.
> Note: Results depend on task selection, constraints, and how “equivalence” is measured. We recommend validating on your own codebase and standards.
### Why It Exists
Developers need coding assistance that:
1. Provides **runnable code immediately** without extensive explanation
2. Runs **locally** without cloud dependencies
3. Maintains **small footprint** for fast iteration
4. Offers **structured, predictable responses** for automation
### Who It's For
- **Individual developers** working on personal projects
- **Small teams** needing local, private coding assistance
- **Educators** teaching programming with consistent code examples
- **Researchers** experimenting with LoRA fine-tuning on MLX
## 🔧 Technical Architecture
### Method 1 Pipeline (9 Steps)
```
1. Receive Request
↓
2. Derive Requirements + Tests
↓
3. Generate N Candidates
↓
4. Normalize Code
↓
5. Score by LoC
↓
6. Apply Quality Gates (G1-G5)
↓
7. Select Minimal Passing
↓
8. Optional Reduction Loop
↓
9. Output + Audit
```
### Quality Gates
- **G1 Compile**: Python syntax validation
- **G2 Constraints**: Dependency checking
- **G3 Execution**: Sandbox smoke test (2s timeout)
- **G4 Tests**: Acceptance test validation
- **G5 Safety**: Dangerous operation detection
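The compile and execution gates can be approximated with the standard library alone. This is a minimal sketch of the idea behind G1 and G3, not the project's actual gate code:

```python
import subprocess
import sys

def gate_compile(code: str) -> bool:
    """G1: reject candidates that fail Python syntax validation."""
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def gate_execute(code: str, timeout: float = 2.0) -> bool:
    """G3: smoke-test in an isolated subprocess with a 2s timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

print(gate_compile("def f(:"))      # → False (syntax error)
print(gate_execute("print('ok')"))  # → True
```

Running candidates in a subprocess rather than in-process gives the sandbox isolation described below: a hung or crashing candidate cannot take down the pipeline.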
### Key Design Principles
1. **Text-based analysis** (no AST parsing required)
2. **Fail-fast validation** (stop on first gate failure)
3. **Sandbox isolation** (subprocess with timeout)
4. **Complete audit trail** (every decision logged)
5. **Pluggable architecture** (easy to extend)
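Text-based normalization (step 4) can be illustrated without any AST machinery, consistent with principle 1. A hypothetical `normalize` helper might drop blank lines, full-line comments, and trailing whitespace before LoC scoring:

```python
def normalize(code: str) -> str:
    """Text-based normalization (no AST): drop blank lines,
    full-line comments, and trailing whitespace before scoring."""
    kept = []
    for line in code.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            kept.append(line.rstrip())
    return "\n".join(kept)

raw = "def add(a, b):\n\n    # sum two numbers\n    return a + b   \n"
print(normalize(raw))  # two lines survive, so the candidate scores LoC=2
```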

---
## ✅ Acceptance Criteria Verification
### Required Command ✅
```bash
python3 -m askbuddyx_gov.cli \
    --prompt "Write a Python function that parses a JSON string and returns an empty dict on error" \
    --n 3 \
    --reduce-iter 1
```
**Results:**
- ✅ Produces output code file
- ✅ Shows full step-by-step sequence
- ✅ Selects minimal passing candidate
- ✅ Generates audit.json with per-candidate results
### Pipeline Execution ✅
- ✅ All 9 steps execute in sequence
- ✅ Proper logging at each step
- ✅ 3 candidates generated (LoC: 19, 6, 2)
- ✅ All candidates validated through gates
- ✅ Minimal candidate selected (LoC=2)
- ✅ Complete audit trail saved
### Quality Metrics ✅
- ✅ Selected candidate passes compile gate
- ✅ Selected candidate has minimal LoC
- ✅ Audit contains gate results for all candidates
- ✅ All data properly structured
---
## Quick Start
### Option 1: Use with MLX
Install MLX and load the model with adapter:
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate
# Load base model with Minimalism adapter
model, tokenizer = load(
    "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    adapter_path="salakash/Minimalism",
)
# Generate code
prompt = "Write a Python function to calculate factorial"
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```
### Option 2: Use with Transformers
```bash
pip install transformers torch peft
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    trust_remote_code=True,
)
# Load adapter
model = PeftModel.from_pretrained(base_model, "salakash/Minimalism")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
# Generate
messages = [{"role": "user", "content": "Write a Python function to add two numbers"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Option 3: Web UI with MLX
Start an OpenAI-compatible server:
```bash
# Install mlx-lm if not already installed
pip install mlx-lm
# Start server with adapter
mlx_lm.server \
    --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
    --adapter-path salakash/Minimalism \
    --port 8080
```
Then use with any OpenAI-compatible client:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    "messages": [
      {"role": "user", "content": "Write a Python function to reverse a string"}
    ],
    "max_tokens": 512
  }'
```
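The same request can be issued from Python with only the standard library. This sketch assumes the server above is running on port 8080; the send step is left commented out so the snippet is safe to run without it:

```python
import json
import urllib.request

# Same request body as the curl example above.
payload = {
    "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    "messages": [
        {"role": "user", "content": "Write a Python function to reverse a string"}
    ],
    "max_tokens": 512,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With the server running, uncomment to send:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```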
Or use with any OpenAI-compatible web UI like:
- [Open WebUI](https://github.com/open-webui/open-webui)
- [LibreChat](https://github.com/danny-avila/LibreChat)
- [ChatGPT-Next-Web](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web)

Configure the UI to point to `http://localhost:8080` as the API endpoint.

### Option 4: Hugging Face Inference API
Use directly via Hugging Face's Inference API (requires HF token):
```python
import requests
API_URL = "https://api-inference.huggingface.co/models/salakash/Minimalism"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Write a Python function to check if a number is prime",
    "parameters": {"max_new_tokens": 256},
})
print(output)
```
## Response Format
Minimalism provides structured, runnable-first responses:
- **Solution**: The main implementation code
- **Usage**: A minimal runnable example
- **Sanity test**: A tiny test snippet (when appropriate)
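For illustration, a hypothetical response to "Write a Python function to calculate factorial" would be structured like this (section headers shown as comments):

```python
# Solution
def factorial(n: int) -> int:
    return 1 if n <= 1 else n * factorial(n - 1)

# Usage
print(factorial(5))  # → 120

# Sanity test
assert factorial(0) == 1
assert factorial(5) == 120
```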
## Comparison
Minimalism achieved the same objective in **~8–10 lines of code**, while a standard LLM typically produced **22–26 lines** for the equivalent solution.
## Documentation
For comprehensive technical details, see:
- **[PYTHON_DEVELOPMENT_GUIDE.md](PYTHON_DEVELOPMENT_GUIDE.md)**: Complete Python guide covering all concepts, libraries, and techniques used in the project
- **[ARCHITECTURE.md](ARCHITECTURE.md)**: Complete system architecture, building blocks, epics & stories, technical stack, and design decisions
- **[HUGGINGFACE_UPLOAD_GUIDE.md](HUGGINGFACE_UPLOAD_GUIDE.md)**: Step-by-step guide for uploading to HuggingFace Hub
- **[MODEL_CARD.md](MODEL_CARD.md)**: Model details, training configuration, and usage guidelines
- **[QUICK_RUN_GUIDE.md](QUICK_RUN_GUIDE.md)**: Quick start guide for getting up and running
## Base Model & Dataset
- **Base Model**: [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)
- **MLX Weights**: [mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit](https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit)
- **Dataset**: [flwrlabs/code-alpaca-20k](https://huggingface.co/datasets/flwrlabs/code-alpaca-20k)
- **Dataset**: [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback)
## License
This project publishes only adapter artifacts and configuration. The base model and dataset have their own licenses:
- Base Model: Apache-2.0 (Qwen/Qwen2.5-Coder-0.5B-Instruct)
- Dataset: Apache-2.0 (flwrlabs/code-alpaca-20k)
See `LICENSE-THIRD-PARTY.md` for complete attribution.
## Acknowledgments
- Qwen team for the excellent base model
- MLX community for the Apple Silicon optimizations
- flwrlabs for the code-alpaca-20k dataset
- Multimodal Art Projection for the m-a-p/Code-Feedback dataset