---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- code
- coding-assistant
- lora
- mlx
- apple-silicon
- qwen2.5
datasets:
- flwrlabs/code-alpaca-20k
- m-a-p/Code-Feedback
library_name: mlx-lm
pipeline_tag: text-generation
---

**Developed By Kashif Salahuddin & Samiya Kashif**

## 1. Executive Summary

**AskBuddyX** is a specialized coding assistant (MVP version) built as a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-Coder-0.5B-Instruct base model. Unlike generic coding assistants, AskBuddyX implements a "runnable-first" philosophy: when users request code, responses are structured with clear **Solution**, **Usage**, and **Sanity test** sections, ensuring developers receive immediately executable code with minimal friction. It follows the philosophy of **minimizing lines of code while preserving behavior**.

### What AskBuddyX Is

- **A LoRA adapter** trained on the code-alpaca-20k dataset
- **OpenAI-compatible API** for local inference
- **Lightweight distribution** (~12 MB adapter vs. multi-GB full models)
- **Production-engineered** with automated pipelines, evaluation, and publishing

## Why AskBuddyX

AskBuddyX is built for a simple, practical goal: **deliver the same outcome with fewer lines of code**.

Most coding assistants tend to "over-achieve" by producing large, multi-step solutions—even when a smaller, clearer implementation would do. That extra code isn't free: it increases review effort, maintenance cost, and the surface area where defects can hide.

**Too Much Code, Too Fast**

Teams everywhere are seeing a huge jump in the number of lines of code (LOC). Developers—from interns to seniors—are suddenly writing **5 to 7 times more code** than before. At first, it looks like higher productivity. In reality, it often means more bugs.

There is a long-standing rule in software engineering:

> "The more lines of code you have, the higher your probability of introducing bugs."

The industry's oldest truth still stands: the more code you have, the more things can go wrong. And AI-generated code tends to be **verbose and repetitive**, which can inflate LOC without adding real value.

AskBuddyX is designed for teams that value **minimalism, clarity, and correctness** over volume.

### What makes AskBuddyX different

* **Minimal LoC by default**
  AskBuddyX is optimized to **minimize lines of code while preserving behavior**—it prefers the smallest correct solution that meets the user's objective.

* **Internal governance behavior**
  The model follows a lightweight internal "governance layer" in its response style: avoid unnecessary scaffolding, avoid over-abstraction, keep code focused, and don't introduce additional complexity that doesn't improve the result. The governance layer sits between the user request and the model's final output to enforce **minimalism as a constraint**. It evaluates candidate solutions by measuring **lines of code** and selects the smallest implementation that still satisfies the original requirements. If a shorter variant fails, it automatically falls back to the next-smallest passing candidate, ensuring fewer lines **without sacrificing correctness** (see the sketch after this list).

* **Practical, runnable output**
  When you ask for code, AskBuddyX is tuned toward "runnable-first" answers—clear implementation, a minimal usage example, and a quick sanity check when appropriate.
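The selection behavior described above can be pictured with a minimal sketch. The helper names (`count_loc`, `select_minimal_candidate`, `passes_gates`) are hypothetical and are not part of the published adapter; the sketch only illustrates the idea of choosing the smallest candidate that still passes validation, falling back to the next-smallest when a shorter variant fails.

```python
# Illustrative sketch only: helper names are hypothetical placeholders,
# not APIs shipped with the AskBuddyX adapter.

def count_loc(code: str) -> int:
    """Count non-empty, non-comment lines (text-based, no AST parsing)."""
    return sum(
        1 for line in code.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

def select_minimal_candidate(candidates: list[str], passes_gates) -> str | None:
    """Pick the shortest candidate that still passes validation.

    `passes_gates` stands in for the quality checks (compilation,
    constraints, execution, tests, safety) applied to one candidate.
    Candidates are tried smallest-first, so a failing short variant
    falls back to the next-smallest passing one; brevity never
    overrides correctness.
    """
    for candidate in sorted(candidates, key=count_loc):
        if passes_gates(candidate):
            return candidate
    return None  # no candidate satisfied the checks
```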
### Early validation

AskBuddyX was evaluated in a small developer study comparing it with popular coding models on a shared set of tasks. In this pilot, AskBuddyX showed a **clear reduction in lines of code (up to ~30%)** while producing solutions that **executed correctly and achieved the same intended outcomes** under the evaluation harness.

> Note: Results depend on task selection, constraints, and how "equivalence" is measured. We recommend validating on your own codebase and standards.

### Why It Exists

Developers need coding assistance that:

1. Provides **runnable code immediately** without extensive explanation
2. Runs **locally** without cloud dependencies
3. Maintains a **small footprint** for fast iteration
4. Offers **structured, predictable responses** for automation

### Who It's For

- **Individual developers** working on personal projects
- **Small teams** needing local, private coding assistance
- **Educators** teaching programming with consistent code examples
- **Researchers** experimenting with LoRA fine-tuning on MLX

### Overview

The present MVP provides a **System and Method for Execution-Validated Minimal Code Generation with Multi-Gate Quality Assurance**.

### Key Innovations

**1. Multi-Gate Quality Assurance Pipeline (G1-G5)** (a minimal sketch follows the Technical Advantages list below)
- **G1 (Compilation Gate)**: Syntax and compilation validation
- **G2 (Constraint Gate)**: Requirement satisfaction verification
- **G3 (Execution Gate)**: Runtime correctness validation
- **G4 (Test Gate)**: Derived test case validation
- **G5 (Safety Gate)**: Security and safety verification

**2. Correctness-First Minimization**
- Generates N candidate implementations
- Validates all candidates through the quality gates
- Selects the minimal-LoC solution among valid candidates
- Ensures correctness is never sacrificed for brevity

**3. Iterative Reduction with Bounded Repair**
- Attempts safe code reduction over M iterations
- Re-validates after each reduction attempt
- Converges to a local minimum while maintaining correctness
- Prevents infinite loops through iteration bounds

**4. Text-Based Code Analysis**
- Language-agnostic string manipulation
- No AST parsing dependencies
- Portable across programming languages
- Efficient line counting and normalization

**5. Sandboxed Execution Environment**
- Subprocess isolation with timeout controls
- Resource limitation mechanisms
- Security boundary enforcement
- Prevents malicious code execution

**6. Comprehensive Audit Logging**
- JSON-structured audit trails
- Complete candidate history tracking
- Gate result documentation
- Regulatory compliance support

![image](https://cdn-uploads.huggingface.co/production/uploads/6903f5738b82cf1035f9a011/wpLNZjRomjUc4f7ClZ_is.png)

### Technical Advantages

1. **Guaranteed Correctness**: Multi-gate validation ensures only correct code is selected
2. **Minimal Complexity**: Systematic LoC minimization reduces maintenance burden
3. **Security**: Sandboxed execution prevents risks from malicious code
4. **Traceability**: Complete audit logs enable production deployment decisions
5. **Portability**: Text-based analysis works across programming languages
6. **Efficiency**: Parallel candidate evaluation with early rejection
7. **Convergence**: Bounded iterations prevent infinite reduction loops
8. **Transparency**: Full visibility into the generation and validation process
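As a concrete illustration of the Compilation (G1) and Execution (G3) gates and the sandboxed execution described above, here is a minimal sketch for Python candidates. The function names, the gate granularity, and the fixed timeout are assumptions made for illustration; this is not the shipped pipeline.

```python
# Illustrative sketch of two gates for Python candidates; the function
# names and gate granularity are assumptions, not the shipped pipeline.
import os
import subprocess
import sys
import tempfile

def g1_compiles(code: str) -> bool:
    """G1 (Compilation Gate): reject candidates that do not even parse."""
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def g3_executes(code: str, timeout_s: float = 5.0) -> bool:
    """G3 (Execution Gate): run the candidate in an isolated subprocess
    with a timeout, so hanging or runaway code cannot block the pipeline."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout_s,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

# A candidate must clear every gate before it can be selected.
candidate = "print(sum(range(5)))"
print(g1_compiles(candidate) and g3_executes(candidate))  # expected: True
```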
## Quick Start

### Option 1: Use with MLX

Install MLX and load the model with the adapter:

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load base model with the AskBuddyX adapter
model, tokenizer = load(
    "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    adapter_path="salakash/AskBuddyX"
)

# Generate code
prompt = "Write a Python function to calculate factorial"
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```

### Option 2: Use with Transformers

```bash
pip install transformers torch peft
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    trust_remote_code=True
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "salakash/AskBuddyX")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Generate
messages = [{"role": "user", "content": "Write a Python function to add two numbers"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Option 3: Web UI with MLX

Start an OpenAI-compatible server:

```bash
# Install mlx-lm if not already installed
pip install mlx-lm

# Start server with adapter
mlx_lm.server \
  --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
  --adapter-path salakash/AskBuddyX \
  --port 8080
```

Then use it with any OpenAI-compatible client:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    "messages": [
      {"role": "user", "content": "Write a Python function to reverse a string"}
    ],
    "max_tokens": 512
  }'
```

Or use any OpenAI-compatible web UI, such as:

- [Open WebUI](https://github.com/open-webui/open-webui)
- [LibreChat](https://github.com/danny-avila/LibreChat)
- [ChatGPT-Next-Web](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web)

Configure the UI to point to `http://localhost:8080` as the API endpoint.

### Option 4: Hugging Face Inference API

Use the model directly via Hugging Face's Inference API (requires an HF token):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/salakash/AskBuddyX"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Write a Python function to check if a number is prime",
    "parameters": {"max_new_tokens": 256}
})
print(output)
```

## Response Format

AskBuddyX provides structured, runnable-first responses:

- **Solution**: The main implementation code
- **Usage**: A minimal runnable example
- **Sanity test**: A tiny test snippet (when appropriate)
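For example, a runnable-first response to "Write a Python function to calculate factorial" could be organized like the snippet below. This is a hand-written illustration of the format, not a captured model output:

```python
# Hand-written illustration of the runnable-first format
# (not a captured model output).

# Solution
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n."""
    return 1 if n <= 1 else n * factorial(n - 1)

# Usage
print(factorial(5))  # 120

# Sanity test
assert factorial(0) == 1 and factorial(5) == 120
```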
## Comparison

AskBuddyX achieved the same objective in **~8–10 lines of code**, while a standard LLM typically produced **22–26 lines** for the equivalent solution.

### AskBuddyX

![image](https://cdn-uploads.huggingface.co/production/uploads/6903f5738b82cf1035f9a011/vDk8E83n43c7FRti-QE5j.png)

### Standard Coding Agent

![image](https://cdn-uploads.huggingface.co/production/uploads/6903f5738b82cf1035f9a011/5hl6r2GDSjkXh1e5dYlom.png)

## Base Model & Dataset

- **Base Model**: [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)
- **MLX Weights**: [mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit](https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit)
- **Dataset**: [flwrlabs/code-alpaca-20k](https://huggingface.co/datasets/flwrlabs/code-alpaca-20k)
- **Dataset**: [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback)

## License

This project publishes only adapter artifacts and configuration. The base model and dataset have their own licenses:

- Base Model: Apache-2.0 (Qwen/Qwen2.5-Coder-0.5B-Instruct)
- Dataset: Apache-2.0 (flwrlabs/code-alpaca-20k)

See `LICENSE-THIRD-PARTY.md` for complete attribution.

## Acknowledgments

- Qwen team for the excellent base model
- MLX community for the Apple Silicon optimizations
- flwrlabs for the code-alpaca-20k dataset
- Multimodal Art Projection (M-A-P) for m-a-p/Code-Feedback