Libraries

How to use masato25/Qwen2.5-Coder-3B-Arbor-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("masato25/Qwen2.5-Coder-3B-Arbor-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use masato25/Qwen2.5-Coder-3B-Arbor-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "masato25/Qwen2.5-Coder-3B-Arbor-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "masato25/Qwen2.5-Coder-3B-Arbor-4bit"
        }
      ]
    }
  }
}
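If ~/.pi/agent/models.json already lists other providers, pasting the block above over the file would drop them. The following is a minimal sketch of merging the entry instead, assuming the file is plain JSON with a top-level "providers" object as shown above. The helper name add_mlx_provider is hypothetical and not part of Pi.

```python
import json
from pathlib import Path

MODEL_ID = "masato25/Qwen2.5-Coder-3B-Arbor-4bit"

# Provider entry copied from the configuration shown above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": MODEL_ID}],
}


def add_mlx_provider(config_path: Path) -> dict:
    """Add or replace the "mlx-lm" provider, keeping any other providers."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumption: providers live under a top-level "providers" object.
    config.setdefault("providers", {})["mlx-lm"] = MLX_PROVIDER
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

To apply it to the real file, call add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json"). Back up the file first if it contains settings you care about.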

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use masato25/Qwen2.5-Coder-3B-Arbor-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "masato25/Qwen2.5-Coder-3B-Arbor-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default masato25/Qwen2.5-Coder-3B-Arbor-4bit

Run Hermes

hermes

OpenClaw

How to use masato25/Qwen2.5-Coder-3B-Arbor-4bit with OpenClaw:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "masato25/Qwen2.5-Coder-3B-Arbor-4bit"

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "masato25/Qwen2.5-Coder-3B-Arbor-4bit" \
  --custom-provider-id mlx-lm \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

MLX LM

How to use masato25/Qwen2.5-Coder-3B-Arbor-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "masato25/Qwen2.5-Coder-3B-Arbor-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "masato25/Qwen2.5-Coder-3B-Arbor-4bit"
# From a second terminal, call the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 by default):
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "masato25/Qwen2.5-Coder-3B-Arbor-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
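The same request can be made from Python with only the standard library. This is a sketch rather than an official client. It assumes the server is reachable on port 8080 (the port used in the agent configurations above) and that the response follows the OpenAI chat completions shape. The function names build_chat_request and chat are illustrative.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on port 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "masato25/Qwen2.5-Coder-3B-Arbor-4bit"


def build_chat_request(user_message: str) -> urllib.request.Request:
    """Build the POST request for /chat/completions without sending it."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message: str, timeout: float = 120.0) -> str:
    """Send the request and return the reply text."""
    request = build_chat_request(user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response with choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, print(chat("Hello")) should print the model's reply. Building the request separately from sending it makes it easy to inspect the payload before any network call.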

Qwen2.5-Coder-3B-Arbor-4bit

masato25/Qwen2.5-Coder-3B-Arbor-4bit is an Apple MLX 4-bit quantized derivative of Qwen/Qwen2.5-Coder-3B.

This conversion is intended for local inference on Apple Silicon using MLX / MLX-LM. It preserves the upstream model architecture and tokenizer files, while quantizing supported linear weights to 4-bit for a smaller memory footprint.

Attribution and license

Base model: Qwen/Qwen2.5-Coder-3B
Upstream license: Qwen Research License Agreement (qwen-research)
Upstream license file: https://huggingface.co/Qwen/Qwen2.5-Coder-3B/blob/main/LICENSE
This repository is a derivative conversion/quantization and is not the original Qwen release.

A NOTICE file is included in this repository. By using, copying, modifying, redistributing, deploying, or making this derivative model available to others, you are responsible for complying with all applicable upstream terms, including the Qwen Research License Agreement and any additional terms, notices, access requirements, usage instructions, export-control, sanctions, or other legal requirements that apply to the upstream model.

If the upstream license, notices, or model-page terms are updated, those upstream terms may impose additional or different obligations. Please review the upstream model page and license before use or redistribution.

No affiliation, sponsorship, endorsement, or trademark grant

This repository is independently prepared and published by the repository owner. It is not affiliated with, sponsored by, approved by, or endorsed by Alibaba Cloud, Qwen, or their affiliates unless they explicitly state otherwise.

The names "Qwen", "Alibaba Cloud", and related marks are used here only for reasonable descriptive attribution and identification of the upstream base model. No trademark license or other rights in those marks are granted by this repository.

Conversion details

Source: Qwen/Qwen2.5-Coder-3B
Format: MLX / MLX-LM
Quantization: 4-bit affine weight quantization
Group size: 64
Upstream revision: 09d9bc5d376b0cfa0100a0694ea7de7232525803
Conversion command:

python -m mlx_lm.convert \
  --hf-path Qwen/Qwen2.5-Coder-3B \
  --mlx-path Qwen2.5-Coder-3B-Arbor-4bit \
  -q \
  --q-bits 4 \
  --q-group-size 64
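To make "4-bit affine quantization with group size 64" concrete, here is a pure-Python sketch of the general technique: each group of 64 weights shares one scale and one bias, and every weight is stored as an integer code from 0 to 15. This illustrates the idea only and is not the MLX implementation. The function names are hypothetical, and the 16-bit storage for scale and bias is an assumption.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to unsigned integer codes."""
    levels = (1 << bits) - 1              # 15 steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    if scale == 0.0:                      # constant group: any scale works
        scale = 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo               # lo acts as the bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from codes, scale, and bias."""
    return [c * scale + bias for c in codes]


def bits_per_weight(bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Effective storage cost per weight, including per-group metadata."""
    return bits + (scale_bits + bias_bits) / group_size


# One group of 64 evenly spaced example weights in [-0.5, 0.5].
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

Under these assumptions the reconstruction error of any weight is at most half a quantization step (scale / 2), and bits_per_weight() comes to 4.5 bits, since the shared scale and bias add 32 bits per 64 weights.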

Usage

Install MLX-LM:

pip install -U mlx-lm

Example generation:

python -m mlx_lm.generate \
  --model masato25/Qwen2.5-Coder-3B-Arbor-4bit \
  --prompt "Write a Python function that checks whether a string is a palindrome." \
  --max-tokens 256 \
  --temp 0.0

Note: the upstream model is a base/pretrained code language model, not an instruction/chat-tuned model. The upstream model card does not recommend using base language models directly for conversations.
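Because this is a base model, a completion-style prompt (the beginning of the code you want continued) tends to be a better fit than a chat message. The sketch below shows one way to do that through the Python API without applying a chat template. The helper names build_completion_prompt and complete are illustrative, and sampling settings are left at the library defaults.

```python
MODEL_ID = "masato25/Qwen2.5-Coder-3B-Arbor-4bit"


def build_completion_prompt(signature: str, docstring: str) -> str:
    """Frame a task as the start of a Python function for the model to continue."""
    return f'{signature}\n    """{docstring}"""\n'


def complete(prompt: str, max_tokens: int = 256) -> str:
    """Continue raw text with the quantized base model (requires Apple Silicon)."""
    # Imported lazily so the prompt helper works without MLX installed.
    from mlx_lm import load, generate

    model, tokenizer = load(MODEL_ID)
    # No chat template is applied: the model simply continues the text.
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


prompt = build_completion_prompt(
    "def is_palindrome(s: str) -> bool:",
    "Return True if s reads the same forwards and backwards.",
)
```

Calling complete(prompt) downloads the model on first use and returns the generated continuation, which should be reviewed and tested before use like any other model output.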

Intended use

This quantized derivative is intended for experimentation, research/evaluation, prototyping, and Apple Silicon local inference where permitted by the upstream Qwen Research License Agreement.

You should evaluate whether your intended use is permitted under the upstream license and related terms. This repository does not expand, waive, or modify any upstream restrictions.

Limitations and safety

Quantization can change output quality, numerical behavior, robustness, and safety characteristics compared with the original model. This repository does not claim improved accuracy, safety, bias mitigation, alignment, or suitability for any particular purpose over the upstream Qwen model.

Model outputs may be inaccurate, unsafe, biased, offensive, incomplete, vulnerable, or otherwise unsuitable for your use case. Do not rely on the model as the sole source of truth for code correctness, security, medical, legal, financial, safety-critical, or other high-stakes decisions. Review and test generated code before use.

Disclaimer

This derivative model is provided as-is and without warranties or conditions of any kind, express or implied, including without limitation warranties of merchantability, fitness for a particular purpose, title, non-infringement, accuracy, availability, or error-free operation.

To the maximum extent permitted by applicable law, the repository owner is not liable for any direct, indirect, incidental, special, consequential, exemplary, punitive, or other damages arising from or related to use of this repository, the derivative model, generated code, or other model outputs.

Nothing in this README is legal advice. You are responsible for reviewing and complying with the applicable license terms and laws.

Downloads last month: 106

Safetensors
Model size: 0.5B params
Tensor type: BF16, U32

Model tree for masato25/Qwen2.5-Coder-3B-Arbor-4bit
Base model: Qwen/Qwen2.5-3B
Finetuned: Qwen/Qwen2.5-Coder-3B
Quantized: this model