Instructions to use alex2110/qwen2.5-0.5b-code-bcp-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="alex2110/qwen2.5-0.5b-code-bcp-v2",
	filename="qwen2.5-0.5b-instruct.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
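The call above returns an OpenAI-style dictionary rather than printing anything. Below is a minimal sketch of pulling the reply text out of that dictionary, assuming the usual `choices[0]["message"]["content"]` layout. The helper name `reply_text` and the stub response are illustrative and not part of llama-cpp-python.

```python
def reply_text(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices", [])
    if not choices:
        return ""
    return choices[0]["message"]["content"].strip()


# Stub with the same shape as a chat completion (not real model output).
stub = {"choices": [{"message": {"role": "assistant", "content": " Paris. "}}]}
print(reply_text(stub))
```

In practice the result of `llm.create_chat_completion(...)` would be passed to the helper in place of the stub.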


llama.cpp

How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Use Docker

docker model run hf.co/alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M


vLLM

How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "alex2110/qwen2.5-0.5b-code-bcp-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alex2110/qwen2.5-0.5b-code-bcp-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
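The same request can be made from Python. This is a sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000/v1`. The function names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

MODEL_ID = "alex2110/qwen2.5-0.5b-code-bcp-v2"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000/v1"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send the request and return the reply text. Needs a running server."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example call (uncomment once the server is up):
# print(chat("What is the capital of France?"))
```

For a llama-server instance, the base URL would be `http://localhost:8080/v1`, the address used in the Pi and Hermes configurations in this document.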

Use Docker

docker model run hf.co/alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Ollama
How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with Ollama:
```
ollama run hf.co/alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M
```

Unsloth Studio

How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for alex2110/qwen2.5-0.5b-code-bcp-v2 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for alex2110/qwen2.5-0.5b-code-bcp-v2 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for alex2110/qwen2.5-0.5b-code-bcp-v2 to start chatting

Pi

How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with Docker Model Runner:
```
docker model run hf.co/alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M
```

Lemonade

How to use alex2110/qwen2.5-0.5b-code-bcp-v2 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull alex2110/qwen2.5-0.5b-code-bcp-v2:Q4_K_M

Run and chat with the model

lemonade run user.qwen2.5-0.5b-code-bcp-v2-Q4_K_M

List all available models

lemonade list

🚀 Qwen2.5-0.5B-Code-BCP-V2

📝 Overview

This model is a fine-tuned version of Qwen2.5-0.5B-Instruct, specialized for real-time code refactoring, logging injection, and algorithmic optimization. It is designed to power VSCode extensions where low latency and local execution are critical.

Compared to the base model, BCP-V2 more readily chooses linear-time, O(n), solutions and follows developer-centric instructions strictly, without unnecessary conversational filler ("Zero-Yapping").

Key Capabilities:

Optimization: Identifying and refactoring nested loops into Hash Map lookups.
Structured Logging: Injecting custom-formatted logs (e.g., [MONITOR] templates).
Logic Transformation: Converting recursive functions to iterative patterns.
IDE Ready: Distributed in GGUF format for direct integration with Ollama or llama.cpp.
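To make the "Logic Transformation" capability concrete, here is a hand-written sketch of the kind of recursive-to-iterative rewrite described above. It is an illustration only, not output from the model, and the function names are hypothetical.

```python
def flatten_recursive(items):
    """Flatten arbitrarily nested lists using recursion."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(flatten_recursive(item))
        else:
            out.append(item)
    return out


def flatten_iterative(items):
    """Same result, with an explicit stack of iterators instead of recursion."""
    out = []
    stack = [iter(items)]
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                # Descend into the nested list, then resume the parent later.
                stack.append(iter(item))
                break
            out.append(item)
        else:
            # Current iterator is exhausted: go back up one level.
            stack.pop()
    return out


data = [1, [2, [3]], 4]
print(flatten_recursive(data) == flatten_iterative(data))
```

The iterative version avoids Python's recursion limit on deeply nested input, which is the usual motivation for this transformation.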

📊 Training Details

Base Model: Qwen2.5-0.5B-Instruct (4-bit quantized)
Framework: Unsloth
Dataset: iamtarun/python_code_instructions_18k_alpaca
Method: LoRA (Low-Rank Adaptation)
Steps: 600 (~4,800 examples processed)
Batch Size: 8 (2 per device × 4 accumulation steps)
Scheduler: Cosine learning rate decay
Optimizer: AdamW 8-bit
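The step and batch figures above fit together as follows. This short check assumes a single training device. The dataset size of roughly 18,000 rows is inferred only from the "18k" in the dataset name, so the epoch fraction is a rough estimate.

```python
per_device_batch = 2
grad_accum_steps = 4
steps = 600

# Effective batch size per optimizer step (single device assumed).
effective_batch = per_device_batch * grad_accum_steps

# Total examples seen during training.
examples_processed = steps * effective_batch

# Assumption: ~18,000 rows, taken from the "18k" in the dataset name.
approx_dataset_size = 18_000
approx_epochs = examples_processed / approx_dataset_size

print(effective_batch, examples_processed, round(approx_epochs, 2))
```

Under these assumptions the run covers well under one pass over the dataset.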

📈 Evaluation: V1 vs. V2 Comparison

During development, we analyzed the impact of training duration on algorithmic reasoning.

Feature	Base Model (0.5B)	BCP-V1 (150 steps)	BCP-V2 (600 steps)
Response Speed	Instant	Instant	Instant
Instruction Adherence	Medium	High	Strict
Algorithmic Reasoning	Low	Low	High (O(n) intent)
Explanations (Yapping)	High	Low	Minimal (Zero-Yapping)

Notable Improvements in V2:

Test Case (Hash Map): While V1 failed to optimize nested loops, V2 correctly identified the need for a lookup dictionary to improve performance from $O(n^2)$ to $O(n)$.
Test Case (Logging): V2 handles complex string interpolation (e.g., using **locals()) while maintaining strict template formatting.
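The hash map test case refers to a refactor of the following general shape. This is a hand-written illustration, not the actual test prompt or model output, and the function names are hypothetical.

```python
def two_sum_quadratic(nums, target):
    """Nested loops: compares every pair, O(n^2) time."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return (i, j)
    return None


def two_sum_linear(nums, target):
    """Single pass with a lookup dictionary: O(n) time on average."""
    seen = {}  # value -> index where it was first needed as a complement
    for j, value in enumerate(nums):
        i = seen.get(target - value)
        if i is not None:
            return (i, j)
        seen[value] = j
    return None


nums = [2, 7, 11, 15]
print(two_sum_quadratic(nums, 9), two_sum_linear(nums, 9))
```

The linear version trades O(n) extra memory for the dictionary against the removal of the inner loop. When several pairs match, the two functions may return different valid pairs.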

💻 Usage

Prompt Format:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{code_snippet}

### Response:
{code_snippet_refactored}
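A small helper can fill this template programmatically. The sketch below assumes the `### Response:` section is left empty so that the model completes it. The names `PROMPT_TEMPLATE` and `build_prompt` are illustrative.

```python
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{code_snippet}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str, code_snippet: str) -> str:
    """Fill the prompt template, leaving the response section open."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        # Strip only blank lines so the snippet's indentation is preserved.
        code_snippet=code_snippet.strip("\n"),
    )


prompt = build_prompt(
    "Convert this recursive function to an iterative one.",
    "def fact(n):\n    return 1 if n <= 1 else n * fact(n - 1)\n",
)
print(prompt)
```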

Running with Ollama:

Download the .gguf file from this repository.
Create a Modelfile containing these two lines:
FROM ./qwen2.5-0.5b-instruct.Q4_K_M.gguf
TEMPLATE "{{ .Prompt }}"
Run: ollama create bcp-v2 -f Modelfile
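Because the Modelfile template passes the prompt through unchanged, the client has to send the full prompt in the format shown under "Prompt Format". Below is a sketch of calling the created model through Ollama's local REST API. It assumes Ollama's default address `http://localhost:11434` and the model name `bcp-v2` from the step above. The function names are illustrative.

```python
import json
import urllib.request


def build_generate_request(prompt, model="bcp-v2", host="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text. Needs Ollama running."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.load(resp)["response"]


# Example call (uncomment once `ollama create bcp-v2 -f Modelfile` has run):
# print(generate("### Instruction:\n...\n\n### Input:\n...\n\n### Response:\n"))
```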

⚠️ Limitations

As a 0.5B parameter model, BCP-V2 is highly efficient but may occasionally produce minor syntax errors in very complex logic. It is best used for refactoring snippets of up to 50 lines and as a high-speed coding assistant.

🤝 Collaboration

This model was developed as part of a project to create an intelligent local-first VSCode extension chatbot.

Lead Fine-tuning Engineer: Alex (alex2110)

GGUF

Model size: 0.5B params
Architecture: qwen2
Hardware compatibility: 4-bit

Model tree for alex2110/qwen2.5-0.5b-code-bcp-v2

Base model: Qwen/Qwen2.5-0.5B
Finetuned: Qwen/Qwen2.5-0.5B-Instruct
Finetuned: unsloth/Qwen2.5-0.5B-Instruct
Quantized (21): this model