|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct |
|
|
tags: |
|
|
- code |
|
|
- coding-assistant |
|
|
- lora |
|
|
- mlx |
|
|
- apple-silicon |
|
|
- qwen2.5 |
|
|
datasets: |
|
|
- flwrlabs/code-alpaca-20k |
|
|
- m-a-p/Code-Feedback |
|
|
library_name: mlx-lm |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
**Developed By Kashif Salahuddin & Samiya Kashif** |
|
|
## Executive Summary
|
|
|
|
|
**AskBuddyX** is a specialized coding assistant (MVP version) built as a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-Coder-0.5B-Instruct base model. Unlike generic coding assistants, AskBuddyX implements a "runnable-first" philosophy: when users request code, responses are structured with clear **Solution**, **Usage**, and **Sanity test** sections, ensuring developers receive immediately executable code with minimal friction. It follows a single guiding philosophy: **minimize lines of code while preserving behavior**.
|
|
|
|
|
### What AskBuddyX Is |
|
|
|
|
|
- **A LoRA adapter** trained on the code-alpaca-20k and Code-Feedback datasets
|
|
- **OpenAI-compatible API** for local inference |
|
|
- **Lightweight distribution** (~12MB adapter vs. multi-GB full models) |
|
|
- **Production-engineered** with automated pipelines, evaluation, and publishing |
|
|
|
|
|
## Why AskBuddyX |
|
|
|
|
|
AskBuddyX is built for a simple, practical goal: **deliver the same outcome with fewer lines of code**. |
|
|
|
|
|
Most coding assistants tend to “over-achieve” by producing large, multi-step solutions—even when a smaller, clearer implementation would do. That extra code isn’t free: it increases review effort, maintenance cost, and the surface area where defects can hide. |
|
|
|
|
|
**Too Much Code, Too Fast.** Teams everywhere are seeing a huge jump in lines of code (LOC). Developers, from interns to seniors, are suddenly writing **5 to 7 times more code** than before. At first, it looks like higher productivity. In reality, it often means more bugs.
|
|
|
|
|
There’s a long-standing rule in software engineering: |
|
|
|
|
|
> “The more lines of code you have, the higher your probability of introducing bugs.” |
|
|
|
|
|
AI-generated code, meanwhile, tends to be **verbose and repetitive**, which can inflate LOC without adding real value.
|
|
|
|
|
AskBuddyX is designed for teams that value **minimalism, clarity, and correctness** over volume. |
|
|
|
|
|
|
|
|
### What makes AskBuddyX different |
|
|
|
|
|
* **Minimal LoC by default** |
|
|
AskBuddyX is optimized to **minimize lines of code while preserving behavior**—it prefers the smallest correct solution that meets the user’s objective. |
|
|
|
|
|
* **Internal governance behavior** |
|
|
The model follows a lightweight internal “governance layer” in its response style: avoid unnecessary scaffolding, avoid over-abstraction, keep code focused, and don’t introduce additional complexity that doesn’t improve the result. The governance layer sits between the user request and the model’s final output to enforce **minimalism as a constraint**. It evaluates candidate solutions by measuring **lines of code** and selects the smallest implementation that still satisfies the original requirements. If a shorter variant fails, it automatically falls back to the next-smallest passing candidate, ensuring fewer lines **without sacrificing correctness**. |
|
|
|
|
|
* **Practical, runnable output** |
|
|
When you ask for code, AskBuddyX is tuned toward “runnable-first” answers—clear implementation, a minimal usage example, and a quick sanity check when appropriate. |
|
|
|
|
|
### Early validation |
|
|
|
|
|
AskBuddyX was evaluated in a small developer study comparing it with popular coding models on a shared set of tasks. In this pilot, AskBuddyX showed a **clear reduction in lines of code (up to ~30%)** while producing solutions that **executed correctly and achieved the same intended outcomes** under the evaluation harness. |
|
|
|
|
|
> Note: Results depend on task selection, constraints, and how “equivalence” is measured. We recommend validating on your own codebase and standards. |
|
|
|
|
|
|
|
|
|
|
|
### Why It Exists |
|
|
|
|
|
Developers need coding assistance that: |
|
|
1. Provides **runnable code immediately** without extensive explanation |
|
|
2. Runs **locally** without cloud dependencies |
|
|
3. Maintains **small footprint** for fast iteration |
|
|
4. Offers **structured, predictable responses** for automation |
|
|
|
|
|
### Who It's For |
|
|
|
|
|
- **Individual developers** working on personal projects
|
|
- **Small teams** needing local, private coding assistance |
|
|
- **Educators** teaching programming with consistent code examples |
|
|
- **Researchers** experimenting with LoRA fine-tuning on MLX |
|
|
|
|
|
|
|
|
### Overview |
|
|
|
|
|
The present MVP provides a **System and Method for Execution-Validated Minimal Code Generation with Multi-Gate Quality Assurance**.
|
|
|
|
|
### Key Innovations |
|
|
|
|
|
**1. Multi-Gate Quality Assurance Pipeline (G1-G5)** |
|
|
- **G1 (Compilation Gate)**: Syntax and compilation validation |
|
|
- **G2 (Constraint Gate)**: Requirement satisfaction verification |
|
|
- **G3 (Execution Gate)**: Runtime correctness validation |
|
|
- **G4 (Test Gate)**: Derived test case validation |
|
|
- **G5 (Safety Gate)**: Security and safety verification |
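
To make the flow concrete, here is a minimal sketch of how a G1-to-G5 pipeline could be wired together; the `Gate` signature and the example gate are illustrative stand-ins, not the shipped implementation:

```python
from typing import Callable, Dict, Tuple

# A gate inspects candidate code and returns (passed, detail).
Gate = Callable[[str], Tuple[bool, str]]

def g1_compiles(code: str) -> Tuple[bool, str]:
    # G1: syntax/compilation check using Python's own compiler.
    try:
        compile(code, "<candidate>", "exec")
        return True, "syntax ok"
    except SyntaxError as exc:
        return False, f"syntax error: {exc}"

def run_gates(code: str, gates: Dict[str, Gate]) -> Dict[str, str]:
    # Run gates in order (G1..G5), rejecting early at the first failure.
    results: Dict[str, str] = {}
    for name, gate in gates.items():
        passed, detail = gate(code)
        results[name] = detail
        if not passed:
            results["verdict"] = f"rejected at {name}"
            return results
    results["verdict"] = "accepted"
    return results

print(run_gates("def add(a, b): return a + b", {"G1": g1_compiles}))
```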
|
|
|
|
|
**2. Correctness-First Minimization** |
|
|
- Generates N candidate implementations |
|
|
- Validates all candidates through quality gates |
|
|
- Selects minimal LoC solution among valid candidates |
|
|
- Ensures correctness is never sacrificed for brevity |
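
A sketch of the selection step under these rules, assuming a hypothetical `passes_all_gates` predicate (for example, one built from the `run_gates` sketch above):

```python
def loc(code: str) -> int:
    # Lines of code: non-blank lines only.
    return sum(1 for line in code.splitlines() if line.strip())

def select_minimal(candidates, passes_all_gates):
    # Walk candidates from fewest to most lines and return the first
    # one that clears every gate; shorter-but-broken variants are
    # skipped, so brevity never trumps correctness.
    for code in sorted(candidates, key=loc):
        if passes_all_gates(code):
            return code
    return None  # no candidate survived the gates
```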
|
|
|
|
|
**3. Iterative Reduction with Bounded Repair** |
|
|
- Attempts safe code reduction through M iterations |
|
|
- Re-validates after each reduction attempt |
|
|
- Converges to local minimum while maintaining correctness |
|
|
- Prevents infinite loops through iteration bounds |
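
A sketch of the bounded reduction loop, with hypothetical `propose_reduction` (returns a shorter variant or `None`) and `is_valid` (re-runs the quality gates) helpers:

```python
def reduce_bounded(code, propose_reduction, is_valid, max_iters=5):
    # Iterate at most max_iters times, so reduction always terminates.
    best = code
    for _ in range(max_iters):
        candidate = propose_reduction(best)
        if candidate is None:
            break                  # no further safe reduction found
        if is_valid(candidate):    # re-validate after every attempt
            best = candidate       # keep the shorter, still-correct code
        else:
            break                  # reduction broke something; stop at the local minimum
    return best
```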
|
|
|
|
|
**4. Text-Based Code Analysis** |
|
|
- Language-agnostic string manipulation |
|
|
- No AST parsing dependencies |
|
|
- Portable across programming languages |
|
|
- Efficient line-counting and normalization |
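
A sketch of the kind of language-agnostic, string-based normalization this describes; the comment prefixes shown are illustrative:

```python
def normalized_loc(code: str, comment_prefixes=("#", "//", "--")) -> int:
    # Count lines that carry code: pure string operations, no AST,
    # so the same routine works across many languages.
    count = 0
    for line in code.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                               # skip blank lines
        if stripped.startswith(comment_prefixes):  # skip full-line comments
            continue
        count += 1
    return count

print(normalized_loc("x = 1\n\n# a comment\ny = 2\n"))  # -> 2
```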
|
|
|
|
|
**5. Sandboxed Execution Environment** |
|
|
- Subprocess isolation with timeout controls |
|
|
- Resource limitation mechanisms |
|
|
- Security boundary enforcement |
|
|
- Prevents malicious code execution |
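
A minimal sketch of subprocess isolation with a timeout; a production sandbox would add stricter resource, filesystem, and network controls:

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout_s: float = 5.0):
    # Execute the candidate in a separate, isolated interpreter
    # process and kill it if it exceeds the time budget.
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return proc.returncode == 0, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "", f"timed out after {timeout_s}s"

ok, out, err = run_sandboxed("print('hello from the sandbox')")
print(ok, out.strip())
```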
|
|
|
|
|
**6. Comprehensive Audit Logging** |
|
|
- JSON-structured audit trails |
|
|
- Complete candidate history tracking |
|
|
- Gate result documentation |
|
|
- Regulatory compliance support |
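
A sketch of one JSON-structured audit entry; the field names are illustrative, not the actual schema:

```python
import json
import time

def audit_record(prompt, candidates, gate_results, selected_index):
    # One self-describing entry: every candidate, its size, its gate
    # outcomes, and which one was finally selected.
    return json.dumps({
        "timestamp": time.time(),
        "prompt": prompt,
        "candidates": [
            {
                "index": i,
                "loc": sum(1 for l in c.splitlines() if l.strip()),
                "gates": gate_results[i],
            }
            for i, c in enumerate(candidates)
        ],
        "selected": selected_index,
    }, indent=2)
```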
|
|
|
|
|
 |
|
|
|
|
|
### Technical Advantages |
|
|
|
|
|
1. **Guaranteed Correctness**: Multi-gate validation ensures only correct code is selected |
|
|
2. **Minimal Complexity**: Systematic LoC minimization reduces maintenance burden |
|
|
3. **Security**: Sandboxed execution prevents malicious code risks |
|
|
4. **Traceability**: Complete audit logs enable production deployment decisions |
|
|
5. **Portability**: Text-based analysis works across programming languages |
|
|
6. **Efficiency**: Parallel candidate evaluation with early rejection |
|
|
7. **Convergence**: Bounded iterations prevent infinite reduction loops |
|
|
8. **Transparency**: Full visibility into generation and validation process |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
### Option 1: Use with MLX |
|
|
|
|
|
Install mlx-lm and load the base model with the adapter:
|
|
|
|
|
```bash |
|
|
pip install mlx-lm |
|
|
``` |
|
|
|
|
|
```python |
|
|
from mlx_lm import load, generate |
|
|
|
|
|
# Load base model with AskBuddyX adapter |
|
|
model, tokenizer = load( |
|
|
"mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit", |
|
|
adapter_path="salakash/AskBuddyX" |
|
|
) |
|
|
|
|
|
# Generate code |
|
|
prompt = "Write a Python function to calculate factorial" |
|
|
response = generate(model, tokenizer, prompt=prompt, max_tokens=512) |
|
|
print(response) |
|
|
``` |
|
|
|
|
|
### Option 2: Use with Transformers |
|
|
|
|
|
```bash |
|
|
pip install transformers torch peft
|
|
``` |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
from peft import PeftModel |
|
|
|
|
|
# Load base model |
|
|
base_model = AutoModelForCausalLM.from_pretrained( |
|
|
"Qwen/Qwen2.5-Coder-0.5B-Instruct", |
|
|
trust_remote_code=True |
|
|
) |
|
|
|
|
|
# Load adapter |
|
|
model = PeftModel.from_pretrained(base_model, "salakash/AskBuddyX") |
|
|
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct") |
|
|
|
|
|
# Generate |
|
|
messages = [{"role": "user", "content": "Write a Python function to add two numbers"}] |
|
|
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
|
inputs = tokenizer(text, return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_new_tokens=256) |
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
|
|
|
|
|
### Option 3: Web UI with MLX |
|
|
|
|
|
Start an OpenAI-compatible server: |
|
|
|
|
|
```bash |
|
|
# Install mlx-lm if not already installed |
|
|
pip install mlx-lm |
|
|
|
|
|
# Start server with adapter |
|
|
mlx_lm.server \ |
|
|
--model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \ |
|
|
--adapter-path salakash/AskBuddyX \ |
|
|
--port 8080 |
|
|
``` |
|
|
|
|
|
Then use with any OpenAI-compatible client: |
|
|
|
|
|
```bash |
|
|
curl http://localhost:8080/v1/chat/completions \ |
|
|
-H "Content-Type: application/json" \ |
|
|
-d '{ |
|
|
"model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit", |
|
|
"messages": [ |
|
|
{"role": "user", "content": "Write a Python function to reverse a string"} |
|
|
], |
|
|
"max_tokens": 512 |
|
|
}' |
|
|
``` |
|
|
|
|
|
Or use with any OpenAI-compatible web UI like: |
|
|
- [Open WebUI](https://github.com/open-webui/open-webui) |
|
|
- [LibreChat](https://github.com/danny-avila/LibreChat) |
|
|
- [ChatGPT-Next-Web](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web) |
|
|
|
|
|
Configure the UI to point to `http://localhost:8080` as the API endpoint. |
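
Because the server speaks the OpenAI API, the official `openai` Python client also works (a sketch; assumes `pip install openai`):

```python
from openai import OpenAI

# Point the client at the local mlx_lm.server; the API key is unused locally.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
    messages=[{"role": "user", "content": "Write a Python function to reverse a string"}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```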
|
|
|
|
|
### Option 4: Hugging Face Inference API |
|
|
|
|
|
Use directly via Hugging Face's Inference API (requires an HF token):
|
|
|
|
|
```python |
|
|
import requests |
|
|
|
|
|
API_URL = "https://api-inference.huggingface.co/models/salakash/AskBuddyX" |
|
|
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"} |
|
|
|
|
|
def query(payload): |
|
|
response = requests.post(API_URL, headers=headers, json=payload) |
|
|
return response.json() |
|
|
|
|
|
output = query({ |
|
|
"inputs": "Write a Python function to check if a number is prime", |
|
|
"parameters": {"max_new_tokens": 256} |
|
|
}) |
|
|
print(output) |
|
|
``` |
|
|
|
|
|
## Response Format |
|
|
|
|
|
AskBuddyX provides structured, runnable-first responses: |
|
|
|
|
|
- **Solution**: The main implementation code |
|
|
- **Usage**: A minimal runnable example |
|
|
- **Sanity test**: A tiny test snippet (when appropriate) |
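
For illustration, a factorial request would come back shaped roughly like this (illustrative shape, not verbatim model output):

```python
# Solution
def factorial(n: int) -> int:
    return 1 if n <= 1 else n * factorial(n - 1)

# Usage
print(factorial(5))  # 120

# Sanity test
assert factorial(0) == 1 and factorial(5) == 120
```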
|
|
|
|
|
## Comparison |
|
|
AskBuddyX achieved the same objective in **~8–10 lines of code**, while a standard LLM typically produced **22–26 lines** for the equivalent solution.
|
|
|
|
|
### AskBuddyX |
|
|
|
|
|
|
|
|
 |
|
|
|
|
|
 |
|
|
|
|
|
### Standard Coding Agent |
|
|
|
|
|
|
|
|
 |
|
|
|
|
|
 |
|
|
|
|
|
## Base Model & Dataset |
|
|
|
|
|
- **Base Model**: [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) |
|
|
- **MLX Weights**: [mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit](https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit) |
|
|
- **Dataset**: [flwrlabs/code-alpaca-20k](https://huggingface.co/datasets/flwrlabs/code-alpaca-20k) |
|
|
- **Dataset**: [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback) |
|
|
|
|
|
## License |
|
|
|
|
|
This project publishes only adapter artifacts and configuration. The base model and datasets have their own licenses:
|
|
|
|
|
- Base Model: Apache-2.0 (Qwen/Qwen2.5-Coder-0.5B-Instruct) |
|
|
- Dataset: Apache-2.0 (flwrlabs/code-alpaca-20k)

- Dataset: m-a-p/Code-Feedback (see its dataset card for license terms)
|
|
|
|
|
See `LICENSE-THIRD-PARTY.md` for complete attribution. |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
- Qwen team for the excellent base model
|
|
- MLX community for the Apple Silicon optimizations |
|
|
- flwrlabs for the code-alpaca-20k dataset |
|
|
- Multimodal Art Projection (m-a-p) for the Code-Feedback dataset