| language: | |
| - en | |
| license: apache-2.0 | |
| base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct | |
| tags: | |
| - code | |
| - coding-assistant | |
| - lora | |
| - mlx | |
| - apple-silicon | |
| - qwen2.5 | |
| datasets: | |
| - flwrlabs/code-alpaca-20k | |
| - m-a-p/Code-Feedback | |
| library_name: mlx-lm | |
| pipeline_tag: text-generation | |
| **Developed by Samiya Kashif, Kashif Salahuddin, Rohan Bhangale, and Robert Rojek** | |
| ## 1. Executive Summary | |
| **Minimalism** is a specialized coding assistant built as a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-Coder-0.5B-Instruct base model. Unlike generic coding assistants, Minimalism implements a "runnable-first" philosophy: when users request code, responses are structured with clear **Solution**, **Usage**, and **Sanity test** sections, ensuring developers receive immediately executable code with minimal friction. | |
| ### What Minimalism Is | |
| - **A LoRA adapter** trained on the code-alpaca-20k dataset | |
| - **OpenAI-compatible API** for local inference | |
| - **Lightweight distribution** (~12MB adapter vs. multi-GB full models) | |
| - **Production-engineered** with automated pipelines, evaluation, and publishing | |
| ## Why Minimalism | |
| Minimalism is built for a simple, practical goal: **deliver the same outcome with fewer lines of code**. | |
| Most coding assistants tend to “over-achieve” by producing large, multi-step solutions—even when a smaller, clearer implementation would do. That extra code isn’t free: it increases review effort, maintenance cost, and the surface area where defects can hide. | |
| **Too much code, too fast.** Many teams report a sharp jump in lines of code (LOC): developers at every level, from interns to seniors, are suddenly writing **5 to 7 times more** than before. At first this looks like higher productivity. In practice, it often means more bugs. | |
| There’s a long-standing rule in software engineering: | |
| > “The more lines of code you have, the higher your probability of introducing bugs.” | |
| AI-generated code also tends to be **verbose and repetitive**, which can inflate LOC without adding real value. | |
| Minimalism is designed for teams that value **minimalism, clarity, and correctness** over volume. | |
| ### What makes Minimalism different | |
| * **Minimal LoC by default** | |
| Minimalism is optimized to **minimize lines of code while preserving behavior**—it prefers the smallest correct solution that meets the user’s objective. | |
| * **Internal governance behavior** | |
| Minimalism applies a lightweight “governance layer” that sits between the user request and the model’s final output and enforces **minimalism as a constraint**: no unnecessary scaffolding, no over-abstraction, and no added complexity that doesn’t improve the result. The layer scores candidate solutions by **lines of code** and selects the smallest implementation that still satisfies the original requirements. If a shorter variant fails, it falls back to the next-smallest passing candidate, so the output gets shorter **without sacrificing correctness**. | |
| * **Practical, runnable output** | |
| When you ask for code, Minimalism is tuned toward “runnable-first” answers—clear implementation, a minimal usage example, and a quick sanity check when appropriate. | |
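The select-then-fall-back behavior of the governance layer can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `count_loc`, `select_minimal_passing`, and the `passes` callback are hypothetical names, and LoC is approximated here as the number of non-blank lines.

```python
from typing import Callable, Iterable, Optional


def count_loc(code: str) -> int:
    """Approximate LoC as the number of non-blank lines (an assumption)."""
    return sum(1 for line in code.splitlines() if line.strip())


def select_minimal_passing(
    candidates: Iterable[str], passes: Callable[[str], bool]
) -> Optional[str]:
    """Try candidates from shortest to longest; return the first that passes."""
    for code in sorted(candidates, key=count_loc):
        if passes(code):
            return code
    return None  # no candidate satisfied the requirements


# Toy candidates for "return x + 1".
candidates = [
    "def f(x):\n    y = x\n    y += 1\n    return y\n",  # 4 lines, correct
    "def f(x):\n    return x + 1\n",  # 2 lines, correct
    "f = lambda x: x + 2\n",  # 1 line, wrong result
]


def passes(code: str) -> bool:
    """Toy acceptance check: f(1) must equal 2."""
    namespace: dict = {}
    exec(code, namespace)  # fine for a demo; untrusted code belongs in a subprocess
    return namespace["f"](1) == 2


best = select_minimal_passing(candidates, passes)
print(best)
```

The one-line candidate is tried first and rejected, so the two-line version is selected: the shortest candidate wins only if it is also correct.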
| ### Early validation | |
| Minimalism was evaluated in a small developer study comparing it with popular coding models on a shared set of tasks. In this pilot, Minimalism showed a **clear reduction in lines of code (up to ~30%)** while producing solutions that **executed correctly and achieved the same intended outcomes** under the evaluation harness. | |
| > Note: Results depend on task selection, constraints, and how “equivalence” is measured. We recommend validating on your own codebase and standards. | |
| ### Why It Exists | |
| Developers need coding assistance that: | |
| 1. Provides **runnable code immediately** without extensive explanation | |
| 2. Runs **locally** without cloud dependencies | |
| 3. Maintains **small footprint** for fast iteration | |
| 4. Offers **structured, predictable responses** for automation | |
| ### Who It's For | |
| - **Individual developers** working on personal projects | |
| - **Small teams** needing local, private coding assistance | |
| - **Educators** teaching programming with consistent code examples | |
| - **Researchers** experimenting with LoRA fine-tuning on MLX | |
| ## 🔧 Technical Architecture | |
| ### Method 1 Pipeline (9 Steps) | |
| ``` | |
| 1. Receive Request | |
| ↓ | |
| 2. Derive Requirements + Tests | |
| ↓ | |
| 3. Generate N Candidates | |
| ↓ | |
| 4. Normalize Code | |
| ↓ | |
| 5. Score by LoC | |
| ↓ | |
| 6. Apply Quality Gates (G1-G5) | |
| ↓ | |
| 7. Select Minimal Passing | |
| ↓ | |
| 8. Optional Reduction Loop | |
| ↓ | |
| 9. Output + Audit | |
| ``` | |
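Steps 4 and 5 (normalize, then score by LoC) could look roughly like the sketch below. The normalization rules are assumptions chosen for illustration: blank lines and full-line `#` comments are dropped and trailing whitespace is stripped. The function names `normalize` and `loc` are hypothetical, and the project may count lines differently.

```python
def normalize(code: str) -> list[str]:
    """Return the code lines that count toward LoC under the assumed rules."""
    kept = []
    for raw in code.splitlines():
        line = raw.rstrip()
        # Skip blank lines and lines that contain only a comment.
        if not line or line.lstrip().startswith("#"):
            continue
        kept.append(line)
    return kept


def loc(code: str) -> int:
    """Score a candidate by its normalized line count."""
    return len(normalize(code))


source = "# helper\n\ndef f(x):\n    # add one\n    return x + 1   \n"
print(normalize(source))
print(loc(source))
```

Because this works on text rather than a syntax tree, it does not recognize docstrings or `#` characters inside multi-line strings; that is a known limitation of the sketch.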
| ### Quality Gates | |
| - **G1 Compile**: Python syntax validation | |
| - **G2 Constraints**: Dependency checking | |
| - **G3 Execution**: Sandbox smoke test (2s timeout) | |
| - **G4 Tests**: Acceptance test validation | |
| - **G5 Safety**: Dangerous operation detection | |
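As an illustration of G1 and G3, the sketch below validates syntax with Python's built-in `compile()` and runs a smoke test in a child interpreter with a 2-second timeout. The function names `gate_compile` and `gate_execute` are hypothetical, and the project's actual gates may check more than this.

```python
import subprocess
import sys


def gate_compile(code: str) -> bool:
    """G1 sketch: accept the code only if it compiles as Python."""
    try:
        compile(code, "<candidate>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def gate_execute(code: str, timeout_s: float = 2.0) -> bool:
    """G3 sketch: run the code in a child interpreter; fail on error or timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


print(gate_compile("def f(:"))        # invalid syntax
print(gate_execute("print('ok')"))    # exits cleanly
```

A child process with a timeout bounds runaway execution, but it is not a security boundary by itself, which is presumably why a separate safety gate (G5) exists.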
| ### Key Design Principles | |
| 1. **Text-based analysis** (no AST parsing, as required) | |
| 2. **Fail-fast validation** (stop on first gate failure) | |
| 3. **Sandbox isolation** (subprocess with timeout) | |
| 4. **Complete audit trail** (every decision logged) | |
| 5. **Pluggable architecture** (easy to extend) | |
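Fail-fast validation, pluggable gates, and the audit trail fit together naturally, as the sketch below suggests. Everything here is an assumption made for illustration: the `run_gates` helper, the gate names, and the record keys are hypothetical and do not reflect the actual `audit.json` schema.

```python
import json
from typing import Callable

Gate = tuple[str, Callable[[str], bool]]


def run_gates(code: str, gates: list[Gate]) -> dict:
    """Run gates in order, stop at the first failure, and record every result."""
    record = {"candidate_loc": len(code.splitlines()), "gates": {}, "passed": True}
    for name, gate in gates:
        ok = gate(code)
        record["gates"][name] = ok
        if not ok:
            record["passed"] = False
            break  # fail fast: later gates are not run
    return record


# Gates are plain (name, callable) pairs, so adding one is a one-line change.
gates = [
    ("G1_compile", lambda code: True),
    ("G2_constraints", lambda code: "import os" not in code),
    ("G3_execution", lambda code: True),
]

record = run_gates("import os\nprint(os.getcwd())\n", gates)
print(json.dumps(record, indent=2))
```

In this run the candidate fails the toy constraints gate, so the execution gate never runs and the record shows exactly where validation stopped.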
|  | |
| --- | |
| ## ✅ Acceptance Criteria Verification | |
| ### Required Command ✅ | |
| ```bash | |
| python3 -m askbuddyx_gov.cli \ | |
| --prompt "Write a Python function that parses a JSON string and returns an empty dict on error" \ | |
| --n 3 \ | |
| --reduce-iter 1 | |
| ``` | |
| **Results:** | |
| - ✅ Produces output code file | |
| - ✅ Shows full step-by-step sequence | |
| - ✅ Selects minimal passing candidate | |
| - ✅ Generates audit.json with per-candidate results | |
| ### Pipeline Execution ✅ | |
| - ✅ All 9 steps execute in sequence | |
| - ✅ Proper logging at each step | |
| - ✅ 3 candidates generated (LoC: 19, 6, 2) | |
| - ✅ All candidates validated through gates | |
| - ✅ Minimal candidate selected (LoC=2) | |
| - ✅ Complete audit trail saved | |
| ### Quality Metrics ✅ | |
| - ✅ Selected candidate passes compile gate | |
| - ✅ Selected candidate has minimal LoC | |
| - ✅ Audit contains gate results for all candidates | |
| - ✅ All data properly structured | |
| --- | |
| ## Quick Start | |
| ### Option 1: Use with MLX | |
| Install `mlx-lm`, then load the base model with the Minimalism adapter: | |
| ```bash | |
| pip install mlx-lm | |
| ``` | |
| ```python | |
| from mlx_lm import load, generate | |
| # Load base model with Minimalism adapter | |
| model, tokenizer = load( | |
|     "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit", | |
|     adapter_path="salakash/Minimalism", | |
| ) | |
| # Generate code | |
| prompt = "Write a Python function to calculate factorial" | |
| response = generate(model, tokenizer, prompt=prompt, max_tokens=512) | |
| print(response) | |
| ``` | |
| ### Option 2: Use with Transformers | |
| ```bash | |
| pip install transformers torch peft | |
| ``` | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| from peft import PeftModel | |
| # Load base model | |
| base_model = AutoModelForCausalLM.from_pretrained( | |
|     "Qwen/Qwen2.5-Coder-0.5B-Instruct", | |
|     trust_remote_code=True, | |
| ) | |
| # Load adapter | |
| model = PeftModel.from_pretrained(base_model, "salakash/Minimalism") | |
| tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct") | |
| # Generate | |
| messages = [{"role": "user", "content": "Write a Python function to add two numbers"}] | |
| text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) | |
| inputs = tokenizer(text, return_tensors="pt") | |
| outputs = model.generate(**inputs, max_new_tokens=256) | |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) | |
| ``` | |
| ### Option 3: Web UI with MLX | |
| Start an OpenAI-compatible server: | |
| ```bash | |
| # Install mlx-lm if not already installed | |
| pip install mlx-lm | |
| # Start server with adapter | |
| mlx_lm.server \ | |
|   --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \ | |
|   --adapter-path salakash/Minimalism \ | |
|   --port 8080 | |
| ``` | |
| Then use with any OpenAI-compatible client: | |
| ```bash | |
| curl http://localhost:8080/v1/chat/completions \ | |
|   -H "Content-Type: application/json" \ | |
|   -d '{ | |
|     "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit", | |
|     "messages": [ | |
|       {"role": "user", "content": "Write a Python function to reverse a string"} | |
|     ], | |
|     "max_tokens": 512 | |
|   }' | |
| ``` | |
| Or use with any OpenAI-compatible web UI like: | |
| - [Open WebUI](https://github.com/open-webui/open-webui) | |
| - [LibreChat](https://github.com/danny-avila/LibreChat) | |
| - [ChatGPT-Next-Web](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web) | |
| Configure the UI to use `http://localhost:8080` as the API endpoint. Some clients expect the base URL to include the `/v1` suffix, i.e. `http://localhost:8080/v1`. | |
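The curl request above can also be sent from Python using only the standard library. This is a sketch, assuming the server from this section is running on port 8080 and returns the usual OpenAI chat-completion shape (`choices[0].message.content`); the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

MODEL = "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit"


def build_chat_request(
    prompt: str, base_url: str = "http://localhost:8080", max_tokens: int = 512
) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Write a Python function to reverse a string"))` prints the model's reply.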
|  | |
| ### Option 4: Hugging Face Inference API | |
| Use directly via Hugging Face's Inference API (requires HF token): | |
| ```python | |
| import requests | |
| API_URL = "https://api-inference.huggingface.co/models/salakash/Minimalism" | |
| headers = {"Authorization": "Bearer YOUR_HF_TOKEN"} | |
| def query(payload): | |
|     """POST the payload to the Inference API and return the decoded JSON.""" | |
|     response = requests.post(API_URL, headers=headers, json=payload) | |
|     return response.json() | |
| output = query({ | |
|     "inputs": "Write a Python function to check if a number is prime", | |
|     "parameters": {"max_new_tokens": 256}, | |
| }) | |
| print(output) | |
| ``` | |
| ## Response Format | |
| Minimalism provides structured, runnable-first responses: | |
| - **Solution**: The main implementation code | |
| - **Usage**: A minimal runnable example | |
| - **Sanity test**: A tiny test snippet (when appropriate) | |
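Because the sections are predictable, a response can be split programmatically for automation. The sketch below assumes each section starts with a line containing only its label, optionally decorated with Markdown markers such as `###`, `**`, or a trailing colon. The exact label formatting the model emits is not specified here, so treat the pattern and the `split_sections` helper as hypothetical.

```python
import re

# A label line such as "### Solution", "**Usage**", or "Sanity test:".
SECTION_RE = re.compile(
    r"^[#*\t ]*(Solution|Usage|Sanity test)[*:\t ]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(response: str) -> dict[str, str]:
    """Map each lowercased section label to the text that follows it."""
    matches = list(SECTION_RE.finditer(response))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next label line or the end of the response.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        sections[match.group(1).lower()] = response[match.end():end].strip()
    return sections


example = (
    "### Solution\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "### Usage\n"
    "print(add(1, 2))\n"
    "\n"
    "### Sanity test\n"
    "assert add(1, 2) == 3\n"
)
for label, body in split_sections(example).items():
    print(f"[{label}]\n{body}\n")
```

Sections missing from a response are simply absent from the returned dict, which matters because the sanity test is only included when appropriate.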
| ## Comparison | |
| Minimalism achieved the same objective in **~8–10 lines of code**, while a standard LLM typically produced **22–26 lines** for the equivalent solution. | |
| ### Minimalism | |
|  | |
| ### Standard Coding Agent | |
|  | |
| ## Documentation | |
| For comprehensive technical details, see: | |
| - **[PYTHON_DEVELOPMENT_GUIDE.md](PYTHON_DEVELOPMENT_GUIDE.md)**: Complete Python guide covering all concepts, libraries, and techniques used in the project | |
| - **[ARCHITECTURE.md](ARCHITECTURE.md)**: Complete system architecture, building blocks, epics & stories, technical stack, and design decisions | |
| - **[HUGGINGFACE_UPLOAD_GUIDE.md](HUGGINGFACE_UPLOAD_GUIDE.md)**: Step-by-step guide for uploading to HuggingFace Hub | |
| - **[MODEL_CARD.md](MODEL_CARD.md)**: Model details, training configuration, and usage guidelines | |
| - **[QUICK_RUN_GUIDE.md](QUICK_RUN_GUIDE.md)**: Quick start guide for getting up and running | |
| ## Base Model & Dataset | |
| - **Base Model**: [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) | |
| - **MLX Weights**: [mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit](https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit) | |
| - **Dataset**: [flwrlabs/code-alpaca-20k](https://huggingface.co/datasets/flwrlabs/code-alpaca-20k) | |
| - **Dataset**: [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback) | |
| ## License | |
| This project publishes only adapter artifacts and configuration. The base model and dataset have their own licenses: | |
| - Base Model: Apache-2.0 (Qwen/Qwen2.5-Coder-0.5B-Instruct) | |
| - Dataset: Apache-2.0 (flwrlabs/code-alpaca-20k) | |
| See `LICENSE-THIRD-PARTY.md` for complete attribution. | |
| ## Acknowledgments | |
| - Qwen team for the excellent base model | |
| - MLX community for the Apple Silicon optimizations | |
| - flwrlabs for the code-alpaca-20k dataset | |
| - Multimodal Art Projection (M-A-P) for the Code-Feedback dataset | |