---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- code
- coding-assistant
- lora
- mlx
- apple-silicon
- qwen2.5
datasets:
- flwrlabs/code-alpaca-20k
- m-a-p/Code-Feedback
library_name: mlx-lm
pipeline_tag: text-generation
---
**Developed by Samiya Kashif, Kashif Salahuddin, Rohan Bhangale & Robert Rojek**
## Executive Summary

**Minimalism** is a specialized coding assistant built as a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-Coder-0.5B-Instruct base model. Unlike generic coding assistants, Minimalism implements a "runnable-first" philosophy: when users request code, responses are structured with clear **Solution**, **Usage**, and **Sanity test** sections, ensuring developers receive immediately executable code with minimal friction.

### What Minimalism Is

- **A LoRA adapter** trained on the code-alpaca-20k and Code-Feedback datasets
- **OpenAI-compatible API** for local inference
- **Lightweight distribution** (~12MB adapter vs. multi-GB full models)
- **Production-engineered** with automated pipelines, evaluation, and publishing

## Why Minimalism

Minimalism is built for a simple, practical goal: **deliver the same outcome with fewer lines of code**.

Most coding assistants tend to “over-achieve” by producing large, multi-step solutions—even when a smaller, clearer implementation would do. That extra code isn’t free: it increases review effort, maintenance cost, and the surface area where defects can hide. 

**Too much code, too fast.** Teams everywhere are seeing a large jump in lines of code (LOC). Developers, from interns to seniors, are suddenly writing five to seven times more code than before. At first this looks like higher productivity; in practice it often means more bugs.

There’s a long-standing rule in software engineering:

> “The more lines of code you have, the higher your probability of introducing bugs.”

AI-generated code in particular tends to be **verbose and repetitive**, which can inflate LOC without adding real value.

Minimalism is designed for teams that value **minimalism, clarity, and correctness** over volume.


### What makes Minimalism different

* **Minimal LoC by default**
  Minimalism is optimized to **minimize lines of code while preserving behavior**—it prefers the smallest correct solution that meets the user’s objective. 

* **Internal governance behavior**
  The model follows a lightweight internal “governance layer” in its response style: avoid unnecessary scaffolding, avoid over-abstraction, keep code focused, and don’t introduce additional complexity that doesn’t improve the result. The governance layer sits between the user request and the model’s final output to enforce **minimalism as a constraint**. It evaluates candidate solutions by measuring **lines of code** and selects the smallest implementation that still satisfies the original requirements. If a shorter variant fails, it automatically falls back to the next-smallest passing candidate, ensuring fewer lines **without sacrificing correctness**.

* **Practical, runnable output**
  When you ask for code, Minimalism is tuned toward “runnable-first” answers—clear implementation, a minimal usage example, and a quick sanity check when appropriate.
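As a rough illustration, the LoC-based selection with fallback described above could look like the following (a minimal sketch; `loc` and `select_minimal` are hypothetical names for this illustration, not the project's actual API):

```python
def loc(code: str) -> int:
    # Count non-empty, non-comment lines as "lines of code"
    return sum(
        1
        for line in code.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def select_minimal(candidates, passes_gates):
    # Try candidates from smallest to largest; return the first one
    # that still passes all quality gates (fallback on failure).
    for code in sorted(candidates, key=loc):
        if passes_gates(code):
            return code
    return None  # no candidate survived the gates


# Example: the one-liner wins if it passes; otherwise fall back
candidates = [
    "def add(a, b):\n    result = a + b\n    return result",
    "def add(a, b): return a + b",
]
best = select_minimal(candidates, passes_gates=lambda c: True)
```

The key property is that shorter code never wins on size alone: a candidate is only selected if it also clears every gate.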




### Early validation

Minimalism was evaluated in a small developer study comparing it with popular coding models on a shared set of tasks. In this pilot, Minimalism showed a **clear reduction in lines of code (up to ~30%)** while producing solutions that **executed correctly and achieved the same intended outcomes** under the evaluation harness.

> Note: Results depend on task selection, constraints, and how “equivalence” is measured. We recommend validating on your own codebase and standards.



### Why It Exists

Developers need coding assistance that:
1. Provides **runnable code immediately** without extensive explanation
2. Runs **locally** without cloud dependencies
3. Maintains **small footprint** for fast iteration
4. Offers **structured, predictable responses** for automation

### Who It's For

- **Individual developers** working on personal projects
- **Small teams** needing local, private coding assistance
- **Educators** teaching programming with consistent code examples
- **Researchers** experimenting with LoRA fine-tuning on MLX

## 🔧 Technical Architecture

### Method 1 Pipeline (9 Steps)

```
1. Receive Request

2. Derive Requirements + Tests

3. Generate N Candidates

4. Normalize Code

5. Score by LoC

6. Apply Quality Gates (G1-G5)

7. Select Minimal Passing

8. Optional Reduction Loop

9. Output + Audit
```

### Quality Gates

- **G1 Compile**: Python syntax validation
- **G2 Constraints**: Dependency checking
- **G3 Execution**: Sandbox smoke test (2s timeout)
- **G4 Tests**: Acceptance test validation
- **G5 Safety**: Dangerous operation detection
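A minimal sketch of how the G1 and G3 checks might work (illustrative only; the function names below are assumptions, not the project's actual implementation):

```python
import os
import subprocess
import sys
import tempfile


def g1_compile(code: str) -> bool:
    # G1: does the candidate parse as valid Python?
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def g3_execute(code: str, timeout: float = 2.0) -> bool:
    # G3: smoke-test the candidate in an isolated subprocess,
    # killing it if it exceeds the timeout.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)
```

Running each gate in order and stopping at the first failure gives the fail-fast behavior described below.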

### Key Design Principles

1. **Text-based analysis** (no AST parsing required)
2. **Fail-fast validation** (stop on first gate failure)
3. **Sandbox isolation** (subprocess with timeout)
4. **Complete audit trail** (every decision logged)
5. **Pluggable architecture** (easy to extend)

![image](https://cdn-uploads.huggingface.co/production/uploads/6903f5738b82cf1035f9a011/fHRjpLh5Wy5s5sB26y3U9.png)
---

## ✅ Acceptance Criteria Verification

### Required Command ✅
```bash
python3 -m askbuddyx_gov.cli \
  --prompt "Write a Python function that parses a JSON string and returns an empty dict on error" \
  --n 3 \
  --reduce-iter 1
```

**Results:**
- ✅ Produces output code file
- ✅ Shows full step-by-step sequence
- ✅ Selects minimal passing candidate
- ✅ Generates audit.json with per-candidate results

### Pipeline Execution ✅
- ✅ All 9 steps execute in sequence
- ✅ Proper logging at each step
- ✅ 3 candidates generated (LoC: 19, 6, 2)
- ✅ All candidates validated through gates
- ✅ Minimal candidate selected (LoC=2)
- ✅ Complete audit trail saved
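The resulting `audit.json` might be shaped roughly like this (field names are illustrative assumptions, not the project's exact schema; the LoC values match the run above):

```json
{
  "prompt": "Write a Python function that parses a JSON string and returns an empty dict on error",
  "candidates": [
    {"id": 1, "loc": 19, "gates": {"G1": true, "G2": true, "G3": true, "G4": true, "G5": true}, "passed": true},
    {"id": 2, "loc": 6, "gates": {"G1": true, "G2": true, "G3": true, "G4": true, "G5": true}, "passed": true},
    {"id": 3, "loc": 2, "gates": {"G1": true, "G2": true, "G3": true, "G4": true, "G5": true}, "passed": true}
  ],
  "selected_id": 3,
  "selected_loc": 2
}
```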

### Quality Metrics ✅
- ✅ Selected candidate passes compile gate
- ✅ Selected candidate has minimal LoC
- ✅ Audit contains gate results for all candidates
- ✅ All data properly structured

---


## Quick Start

### Option 1: Use with MLX 

Install MLX and load the model with adapter:

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load base model with Minimalism adapter
model, tokenizer = load(
    "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    adapter_path="salakash/Minimalism"
)

# Generate code
prompt = "Write a Python function to calculate factorial"
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```

### Option 2: Use with Transformers

```bash
pip install transformers torch
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    trust_remote_code=True
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "salakash/Minimalism")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Generate
messages = [{"role": "user", "content": "Write a Python function to add two numbers"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Option 3: Web UI with MLX

Start an OpenAI-compatible server:

```bash
# Install mlx-lm if not already installed
pip install mlx-lm

# Start server with adapter
mlx_lm.server \
  --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
  --adapter-path salakash/Minimalism \
  --port 8080
```

Then use with any OpenAI-compatible client:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    "messages": [
      {"role": "user", "content": "Write a Python function to reverse a string"}
    ],
    "max_tokens": 512
  }'
```

Or use with any OpenAI-compatible web UI like:
- [Open WebUI](https://github.com/open-webui/open-webui)
- [LibreChat](https://github.com/danny-avila/LibreChat)
- [ChatGPT-Next-Web](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web)

Configure the UI to point to `http://localhost:8080` as the API endpoint.


![image](https://cdn-uploads.huggingface.co/production/uploads/6903f5738b82cf1035f9a011/69yp9ZE2JPQJ9ZZYB2i9C.png)

### Option 4: Hugging Face Inference API

Use directly via Hugging Face's Inference API (requires HF token):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/salakash/Minimalism"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Write a Python function to check if a number is prime",
    "parameters": {"max_new_tokens": 256}
})
print(output)
```

## Response Format

Minimalism provides structured, runnable-first responses:

- **Solution**: The main implementation code
- **Usage**: A minimal runnable example
- **Sanity test**: A tiny test snippet (when appropriate)
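For example, a request for a factorial function might come back shaped like this (an illustrative sketch of the format, not a verbatim model response):

```python
# Solution
def factorial(n: int) -> int:
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


# Usage
print(factorial(5))

# Sanity test
assert factorial(0) == 1
assert factorial(5) == 120
```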

## Comparison
Minimalism achieved the same objective in **~8–10 lines of code**, while a standard LLM typically produced **22–26 lines** for an equivalent solution.

### Minimalism

![alt text](image-1.png)

### Standard Coding Agent

![alt text](image.png)

## Documentation

For comprehensive technical details, see:
- **[PYTHON_DEVELOPMENT_GUIDE.md](PYTHON_DEVELOPMENT_GUIDE.md)**: Complete Python guide covering all concepts, libraries, and techniques used in the project
- **[ARCHITECTURE.md](ARCHITECTURE.md)**: Complete system architecture, building blocks, epics & stories, technical stack, and design decisions
- **[HUGGINGFACE_UPLOAD_GUIDE.md](HUGGINGFACE_UPLOAD_GUIDE.md)**: Step-by-step guide for uploading to HuggingFace Hub
- **[MODEL_CARD.md](MODEL_CARD.md)**: Model details, training configuration, and usage guidelines
- **[QUICK_RUN_GUIDE.md](QUICK_RUN_GUIDE.md)**: Quick start guide for getting up and running

## Base Model & Dataset

- **Base Model**: [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)
- **MLX Weights**: [mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit](https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit)
- **Dataset**: [flwrlabs/code-alpaca-20k](https://huggingface.co/datasets/flwrlabs/code-alpaca-20k)
- **Dataset**: [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback)

## License

This project publishes only adapter artifacts and configuration. The base model and dataset have their own licenses:

- Base Model: Apache-2.0 (Qwen/Qwen2.5-Coder-0.5B-Instruct)
- Dataset: Apache-2.0 (flwrlabs/code-alpaca-20k)
- Dataset: m-a-p/Code-Feedback (see its dataset card for license terms)

See `LICENSE-THIRD-PARTY.md` for complete attribution.

## Acknowledgments

- Qwen team for the excellent base model
- MLX community for the Apple Silicon optimizations
- flwrlabs for the code-alpaca-20k dataset
- Multimodal Art Projection (M-A-P) for the m-a-p/Code-Feedback dataset