	---
	language:
	- en
	license: apache-2.0
	base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
	tags:
	- code
	- coding-assistant
	- lora
	- mlx
	- apple-silicon
	- qwen2.5
	datasets:
	- flwrlabs/code-alpaca-20k
	- m-a-p/Code-Feedback
	library_name: mlx-lm
	pipeline_tag: text-generation
	---
	Developed by Kashif Salahuddin & Samiya Kashif

	## Executive Summary

	AskBuddyX is a specialized coding assistant (MVP version) built as a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-Coder-0.5B-Instruct base model. Unlike generic coding assistants, AskBuddyX follows a "runnable-first" philosophy: when users request code, responses are structured into clear Solution, Usage, and Sanity test sections, so developers receive immediately executable code with minimal friction. It also aims to minimize lines of code while preserving behavior.

	### What AskBuddyX Is

	- A LoRA adapter trained on the code-alpaca-20k dataset
	- OpenAI-compatible API for local inference
	- Lightweight distribution (~12MB adapter vs. multi-GB full models)
	- Production-engineered with automated pipelines, evaluation, and publishing

	## Why AskBuddyX

	AskBuddyX is built for a simple, practical goal: deliver the same outcome with fewer lines of code.

	Most coding assistants tend to “over-achieve” by producing large, multi-step solutions—even when a smaller, clearer implementation would do. That extra code isn’t free: it increases review effort, maintenance cost, and the surface area where defects can hide.

	### Too Much Code, Too Fast

	Teams everywhere are seeing a sharp jump in lines of code (LOC). Developers, from interns to seniors, are suddenly writing 5 to 7 times more than before. At first this looks like higher productivity. In practice, it often means more bugs.

	There’s a long-standing rule in software engineering:

	> “The more lines of code you have, the higher your probability of introducing bugs.”

	AI-generated code also tends to be verbose and repetitive, which can inflate LOC without adding real value.

	AskBuddyX is designed for teams that value minimalism, clarity, and correctness over volume.


	### What makes AskBuddyX different

	* Minimal LoC by default: AskBuddyX is optimized to minimize lines of code while preserving behavior. It prefers the smallest correct solution that meets the user’s objective.

	* Internal governance behavior: The model follows a lightweight internal “governance layer” in its response style: avoid unnecessary scaffolding, avoid over-abstraction, keep code focused, and don’t introduce complexity that doesn’t improve the result. The governance layer sits between the user request and the model’s final output and enforces minimalism as a constraint. It measures the lines of code of each candidate solution and selects the smallest implementation that still satisfies the original requirements. If a shorter variant fails validation, it falls back to the next-smallest passing candidate, so the line count drops without sacrificing correctness.

	* Practical, runnable output: When you ask for code, AskBuddyX is tuned toward “runnable-first” answers: a clear implementation, a minimal usage example, and a quick sanity check when appropriate.
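
The selection rule described under the governance behavior (measure lines of code, take the smallest candidate that still passes, fall back to the next-smallest otherwise) can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual implementation: `count_loc`, `select_minimal`, `passes_factorial_check`, and the sample candidates are all hypothetical names and data, and the validator uses a plain `exec` rather than a sandbox.

```python
from typing import Callable, Iterable, Optional


def count_loc(source: str) -> int:
    """Count non-blank lines that are not pure '#' comments (text-based, no AST)."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def select_minimal(
    candidates: Iterable[str], passes: Callable[[str], bool]
) -> Optional[str]:
    """Return the shortest candidate that passes validation, or None."""
    # Try candidates from fewest to most lines. The first one that passes wins,
    # so a failing short candidate falls through to the next-smallest one.
    for candidate in sorted(candidates, key=count_loc):
        if passes(candidate):
            return candidate
    return None


def passes_factorial_check(source: str) -> bool:
    """Hypothetical validator: the candidate must define a correct factorial()."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only; not sandboxed
        return namespace["factorial"](5) == 120
    except Exception:
        return False


CANDIDATES = [
    # Shortest, but wrong: the loop starts at 0, so the product becomes 0.
    "def factorial(n):\n    r = 1\n    for i in range(n): r *= i\n    return r",
    # One line longer and correct.
    "def factorial(n):\n    r = 1\n    for i in range(2, n + 1):\n        r *= i\n    return r",
    # Longest, also correct.
    "def factorial(n):\n    if n < 0:\n        raise ValueError(n)\n    r = 1\n"
    "    for i in range(2, n + 1):\n        r *= i\n    return r",
]

best = select_minimal(CANDIDATES, passes_factorial_check)
print(best)
```

Sorting first keeps the fallback implicit: the shortest candidate is rejected by the validator, and the loop moves on to the next one without any extra bookkeeping.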

	### Early validation

	AskBuddyX was evaluated in a small developer study comparing it with popular coding models on a shared set of tasks. In this pilot, AskBuddyX showed a clear reduction in lines of code (up to ~30%) while producing solutions that executed correctly and achieved the same intended outcomes under the evaluation harness.

	> Note: Results depend on task selection, constraints, and how “equivalence” is measured. We recommend validating on your own codebase and standards.



	### Why It Exists

	Developers need coding assistance that:
	1. Provides runnable code immediately without extensive explanation
	2. Runs locally without cloud dependencies
	3. Maintains a small footprint for fast iteration
	4. Offers structured, predictable responses for automation

	### Who It's For

	- Individual developers working on personal projects
	- Small teams needing local, private coding assistance
	- Educators teaching programming with consistent code examples
	- Researchers experimenting with LoRA fine-tuning on MLX


	### Overview

	The current MVP implements execution-validated minimal code generation with multi-gate quality assurance.

	### Key Innovations

	1. Multi-Gate Quality Assurance Pipeline (G1-G5)
	- G1 (Compilation Gate): Syntax and compilation validation
	- G2 (Constraint Gate): Requirement satisfaction verification
	- G3 (Execution Gate): Runtime correctness validation
	- G4 (Test Gate): Derived test case validation
	- G5 (Safety Gate): Security and safety verification

	2. Correctness-First Minimization
	- Generates N candidate implementations
	- Validates all candidates through quality gates
	- Selects minimal LoC solution among valid candidates
	- Ensures correctness is never sacrificed for brevity

	3. Iterative Reduction with Bounded Repair
	- Attempts safe code reduction through M iterations
	- Re-validates after each reduction attempt
	- Converges to local minimum while maintaining correctness
	- Prevents infinite loops through iteration bounds

	4. Text-Based Code Analysis
	- Language-agnostic string manipulation
	- No AST parsing dependencies
	- Portable across programming languages
	- Efficient line-counting and normalization

	5. Sandboxed Execution Environment
	- Subprocess isolation with timeout controls
	- Resource limitation mechanisms
	- Security boundary enforcement
	- Prevents malicious code execution

	6. Comprehensive Audit Logging
	- JSON-structured audit trails
	- Complete candidate history tracking
	- Gate result documentation
	- Regulatory compliance support
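
As a rough illustration of how gates, subprocess isolation, and a JSON audit record might fit together, the sketch below chains two of the gates listed above (compilation and execution) and stops at the first failure. Everything here is an assumption made for the example: the function names, the gate labels, and the record layout are hypothetical, and a subprocess with a timeout is only a weak form of isolation, not a full sandbox.

```python
import json
import subprocess
import sys
from typing import Callable, List, Tuple

GateResult = Tuple[bool, str]  # (passed, detail)


def compile_gate(source: str) -> GateResult:
    """G1-style check: does the candidate parse as Python?"""
    try:
        compile(source, "<candidate>", "exec")
        return True, "ok"
    except SyntaxError as exc:
        return False, f"syntax error: {exc.msg}"


def execution_gate(source: str, timeout_s: float = 5.0) -> GateResult:
    """G3-style check: run the candidate in a separate process with a timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated interpreter mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    return proc.returncode == 0, f"exit code {proc.returncode}"


GATES: List[Tuple[str, Callable[[str], GateResult]]] = [
    ("G1_compile", compile_gate),
    ("G3_execute", execution_gate),
]


def run_gates(source: str) -> dict:
    """Run gates in order, stop at the first failure, return an audit record."""
    record = {"passed": True, "gates": []}
    for name, gate in GATES:
        passed, detail = gate(source)
        record["gates"].append({"gate": name, "passed": passed, "detail": detail})
        if not passed:
            record["passed"] = False
            break  # early rejection: later gates are skipped
    return record


good = run_gates("assert sum([1, 2, 3]) == 6")
bad = run_gates("def broken(:\n    pass")
print(json.dumps(good, indent=2))
print(json.dumps(bad, indent=2))
```

Because each gate appends its result before the pipeline decides whether to continue, the record doubles as the audit trail: a rejected candidate shows exactly which gate stopped it.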

	![image](https://cdn-uploads.huggingface.co/production/uploads/6903f5738b82cf1035f9a011/wpLNZjRomjUc4f7ClZ_is.png)

	### Technical Advantages

	1. Validated Correctness: Only candidates that pass every gate are eligible for selection
	2. Minimal Complexity: Systematic LoC minimization reduces maintenance burden
	3. Security: Sandboxed execution limits the risk of running malicious code
	4. Traceability: Complete audit logs enable production deployment decisions
	5. Portability: Text-based analysis works across programming languages
	6. Efficiency: Parallel candidate evaluation with early rejection
	7. Convergence: Bounded iterations prevent infinite reduction loops
	8. Transparency: Full visibility into generation and validation process
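
The bounded reduction loop mentioned under Convergence can be illustrated with a simple text-based reducer: try deleting one line at a time, keep the deletion only if the code still validates, and stop after a fixed number of passes or when nothing more can be removed. This is a minimal sketch under assumptions, not the project's reducer: `reduce_source`, `still_adds`, and the sample input are hypothetical, and the validator uses an unsandboxed `exec` purely for brevity.

```python
from typing import Callable


def reduce_source(
    source: str, is_valid: Callable[[str], bool], max_iterations: int = 10
) -> str:
    """Greedily drop single lines while the validator still accepts the result."""
    lines = source.splitlines()
    for _ in range(max_iterations):  # hard bound on reduction passes
        changed = False
        i = 0
        while i < len(lines):
            trial = lines[:i] + lines[i + 1:]
            if is_valid("\n".join(trial)):
                lines = trial  # deletion kept; index i now holds the next line
                changed = True
            else:
                i += 1
        if not changed:
            break  # local minimum: no single-line deletion is valid
    return "\n".join(lines)


def still_adds(source: str) -> bool:
    """Hypothetical validator: the code must define add() with add(2, 3) == 5."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only; not sandboxed
        return namespace["add"](2, 3) == 5
    except Exception:
        return False


VERBOSE = """\
# Adds two numbers.
def add(a, b):
    result = a + b
    print_debug = False
    return result
"""

print(reduce_source(VERBOSE, still_adds))
```

The result is a local minimum only: the reducer removes the comment and the unused assignment, but it cannot discover rewrites such as returning `a + b` directly, since it never edits a line.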

	## Quick Start

	### Option 1: Use with MLX

	Install MLX and load the model with adapter:

	```bash
	pip install mlx-lm
	```

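	```python
	from huggingface_hub import snapshot_download
	from mlx_lm import load, generate

	# Download the adapter files first; mlx-lm typically reads adapter_path as a local directory
	adapter_dir = snapshot_download("salakash/AskBuddyX")

	# Load base model with AskBuddyX adapter
	model, tokenizer = load(
	    "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
	    adapter_path=adapter_dir,
	)

	# Wrap the request in the chat template expected by the instruct model
	messages = [{"role": "user", "content": "Write a Python function to calculate factorial"}]
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	# Generate code
	response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
	print(response)
	```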

	### Option 2: Use with Transformers

	```bash
	pip install transformers torch peft
	```

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel

	# Load base model
	base_model = AutoModelForCausalLM.from_pretrained(
	    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
	    trust_remote_code=True,
	)

	# Load adapter
	model = PeftModel.from_pretrained(base_model, "salakash/AskBuddyX")
	tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

	# Generate
	messages = [{"role": "user", "content": "Write a Python function to add two numbers"}]
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt")
	outputs = model.generate(**inputs, max_new_tokens=256)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	### Option 3: Web UI with MLX

	Start an OpenAI-compatible server:

	```bash
	# Install mlx-lm if not already installed
	pip install mlx-lm

	# Start server with adapter
	mlx_lm.server \
	    --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
	    --adapter-path salakash/AskBuddyX \
	    --port 8080
	```

	Then use with any OpenAI-compatible client:

	```bash
	curl http://localhost:8080/v1/chat/completions \
	    -H "Content-Type: application/json" \
	    -d '{
	        "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
	        "messages": [
	            {"role": "user", "content": "Write a Python function to reverse a string"}
	        ],
	        "max_tokens": 512
	    }'
	```

	Or use with any OpenAI-compatible web UI like:
	- [Open WebUI](https://github.com/open-webui/open-webui)
	- [LibreChat](https://github.com/danny-avila/LibreChat)
	- [ChatGPT-Next-Web](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web)

	Configure the UI to use `http://localhost:8080` as the API endpoint. Clients that ask for an OpenAI base URL usually expect `http://localhost:8080/v1`.
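
For scripts that should not depend on a UI, the same request as the curl example can be sent from Python with only the standard library. The sketch below assumes the server from this section is running on port 8080 and that the response follows the usual OpenAI chat-completions shape (`choices[0].message.content`); `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # matches --port 8080 in the server command
MODEL = "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit"


def build_chat_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style layout: the reply sits under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Write a Python function to reverse a string"))` should print the model's reply. Building the request separately from sending it keeps the payload easy to inspect or log.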

	### Option 4: Hugging Face Inference API

	Use directly via Hugging Face's Inference API (requires HF token):

	```python
	import requests

	API_URL = "https://api-inference.huggingface.co/models/salakash/AskBuddyX"
	headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

	def query(payload):
	    # POST the prompt and generation parameters, then decode the JSON reply
	    response = requests.post(API_URL, headers=headers, json=payload)
	    return response.json()

	output = query({
	    "inputs": "Write a Python function to check if a number is prime",
	    "parameters": {"max_new_tokens": 256},
	})
	print(output)
	```

	## Response Format

	AskBuddyX provides structured, runnable-first responses:

	- Solution: The main implementation code
	- Usage: A minimal runnable example
	- Sanity test: A tiny test snippet (when appropriate)
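
Because the three sections are predictable, a response can be split programmatically, for example to run the sanity test automatically. The sketch below is a guess at one workable approach: the exact heading markup the model emits is not specified here, so the parser assumes each section starts with a line such as `### Solution`, `**Usage**`, or `Sanity test:`. `split_sections` and the sample text are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumed heading shapes: optional '#' marks or '**', a section name, optional colon.
HEADING = re.compile(
    r"^\s*(?:#+\s*|\*\*)?(Solution|Usage|Sanity test)\s*:?\s*(?:\*\*)?\s*:?\s*$",
    re.IGNORECASE,
)


def split_sections(response: str) -> Dict[str, str]:
    """Map each recognized section name to the text that follows its heading."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in response.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1).capitalize()  # normalize heading case
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        # Text before the first heading is ignored.
    return {name: "\n".join(body).strip() for name, body in sections.items()}


SAMPLE = """\
### Solution
def add(a, b):
    return a + b

### Usage
print(add(2, 3))

### Sanity test
assert add(2, 3) == 5
"""

parts = split_sections(SAMPLE)
print(parts["Sanity test"])
```

If the model's real output wraps each section in fenced code, the fence lines would need to be stripped from each section body before executing it.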

	## Comparison
	AskBuddyX achieved the same objective in ~8–10 lines of code, while a standard LLM typically produced 22–26 lines for the equivalent solution.

	### AskBuddyX


	![image](https://cdn-uploads.huggingface.co/production/uploads/6903f5738b82cf1035f9a011/vDk8E83n43c7FRti-QE5j.png)

	![alt text](image-1.png)

	### Standard Coding Agent


	![image](https://cdn-uploads.huggingface.co/production/uploads/6903f5738b82cf1035f9a011/5hl6r2GDSjkXh1e5dYlom.png)

	![alt text](image.png)

	## Base Model & Dataset

	- Base Model: [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)
	- MLX Weights: [mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit](https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit)
	- Dataset: [flwrlabs/code-alpaca-20k](https://huggingface.co/datasets/flwrlabs/code-alpaca-20k)
	- Dataset: [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback)

	## License

	This project publishes only adapter artifacts and configuration. The base model and dataset have their own licenses:

	- Base Model: Apache-2.0 (Qwen/Qwen2.5-Coder-0.5B-Instruct)
	- Dataset: Apache-2.0 (flwrlabs/code-alpaca-20k)

	See `LICENSE-THIRD-PARTY.md` for complete attribution.

	## Acknowledgments

	- Qwen team for the excellent base model
	- MLX community for the Apple Silicon optimizations
	- flwrlabs for the code-alpaca-20k dataset
	- Multimodal Art Projection for m-a-p/Code-Feedback