How to use from
llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
Use Docker
docker model run hf.co/TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
Quick Links

Qwen2.5-Coder-32B-Python-Specialist

Model Description

Qwen2.5-Coder-32B-Python-Specialist is an instruction-tuned version of the standard Qwen2.5-Coder-32B base model. This model has been specifically fine-tuned on a high-quality blend of Python and generalized coding instruction datasets to enhance its proficiency in formatting compliance, multi-turn coding problem solving, and Python-specific tasks.

Note: The model retains the original safety filters and alignment of the Qwen2.5 base model.

The model was fine-tuned using a distilled, high-quality combination of the CodeFeedback-Filtered-Instruction and python_code_instructions_18k_alpaca datasets, running over 20,000 highly diverse programming scenarios.

By aggressively targeting the Attention layers during fine-tuning (while leaving the complex MLP structures frozen), this model achieves state-of-the-art formatting compliance and instruction following without compromising the encyclopedic coding knowledge of the 32B base model.

Model Details

  • Base Model: unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit
  • Parameters: 32 Billion
  • Context Length: Up to 32K (optimized at 512 for dense instruction tuning)
  • Training Strategy: LoRA (Attention Modules Only: q_proj, k_proj, v_proj, o_proj)
  • Dataset: 20,000 samples (CodeFeedback + Python Alpaca)
  • Quantization: Available in 16-bit safetensors and 4-bit GGUF (q4_k_m)

Usage

Ollama / LM Studio (GGUF)

You can seamlessly run the GGUF version locally using Ollama:

ollama run hf.co/TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist")
model = AutoModelForCausalLM.from_pretrained(
    "TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist",
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a python script to parse a CSV file."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])

Training Data & Methodology

The model was fine-tuned utilizing Unsloth for rapid multi-processing data ingestion and memory-efficient LoRA scaling. The dataset consisted of heavily curated coding problems, heavily indexing on Python, converted into standard ShareGPT conversational format.

To enhance instruction following without catastrophic forgetting, we targeted only the Attention matrices. The model was trained with a learning rate of 2e-4, achieving a remarkably low final loss of 0.45 without overfitting.

Downloads last month
923
Safetensors
Model size
33B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist

Quantizations
2 models