Qwen2.5-Coder-32B-Python-Specialist
Model Description
Qwen2.5-Coder-32B-Python-Specialist is an instruction-tuned model built on Qwen2.5-Coder-32B (the exact checkpoint is listed under Model Details). It was fine-tuned on a blend of Python and general coding instruction datasets to improve formatting compliance, multi-turn coding problem solving, and Python-specific tasks.
Note: The model retains the original safety filters and alignment of the Qwen2.5 base model.
The model was fine-tuned on a filtered combination of the CodeFeedback-Filtered-Instruction and python_code_instructions_18k_alpaca datasets, covering about 20,000 programming samples.
LoRA adapters were applied only to the attention layers during fine-tuning, while the MLP blocks stayed frozen. The goal is stronger formatting compliance and instruction following without compromising the coding knowledge of the 32B base model.
Model Details
- Base Model: unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit
- Parameters: 32 Billion
- Context Length: Up to 32K tokens (fine-tuned with 512-token sequences for dense instruction tuning)
- Training Strategy: LoRA (Attention Modules Only: `q_proj`, `k_proj`, `v_proj`, `o_proj`)
- Dataset: 20,000 samples (CodeFeedback + Python Alpaca)
- Quantization: Available in 16-bit safetensors and 4-bit GGUF (`q4_k_m`)
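To give a feel for what attention-only LoRA means in practice, the sketch below estimates how many trainable parameters the adapters add. Every number in it is an assumption for illustration: the layer dimensions should be checked against the model's `config.json`, and the LoRA rank is hypothetical because this card does not state it.

```python
# Rough count of trainable LoRA parameters when only the attention
# projections (q_proj, k_proj, v_proj, o_proj) receive adapters.
# All numeric values below are ASSUMPTIONS for illustration; verify them
# against the model's config.json and the actual training configuration.

HIDDEN_SIZE = 5120   # assumed hidden size
NUM_LAYERS = 64      # assumed number of transformer layers
NUM_HEADS = 40       # assumed number of query heads
NUM_KV_HEADS = 8     # assumed number of key/value heads (grouped-query attention)
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 128 under the assumptions above
LORA_RANK = 16       # hypothetical; the card does not state the rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def attention_lora_params(rank: int = LORA_RANK) -> int:
    """Sum the adapter sizes of the four attention projections over all layers."""
    kv_dim = NUM_KV_HEADS * HEAD_DIM
    shapes = {
        "q_proj": (HIDDEN_SIZE, NUM_HEADS * HEAD_DIM),
        "k_proj": (HIDDEN_SIZE, kv_dim),
        "v_proj": (HIDDEN_SIZE, kv_dim),
        "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN_SIZE),
    }
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in shapes.values())
    return per_layer * NUM_LAYERS


total = attention_lora_params()
print(f"Trainable LoRA parameters: {total:,}")
```

Under these assumed values the adapters hold roughly 33.6 million parameters, about 0.1% of a 32-billion-parameter model. A different rank scales the figure linearly.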
Usage
Ollama / LM Studio (GGUF)
You can run the GGUF version locally with Ollama:

```shell
ollama run hf.co/TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M
```
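Once the model has been pulled, it can also be queried from Python through Ollama's local HTTP chat endpoint. The following is a minimal sketch using only the standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`); the helper names `build_chat_request` and `chat` are hypothetical, introduced here for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "hf.co/TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist:Q4_K_M"


def build_chat_request(prompt: str, url: str = OLLAMA_CHAT_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat request for one user prompt."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]
```

With the Ollama server running, `print(chat("Write a Python function that reverses a string."))` would print the model's reply. Building the request separately from sending it keeps the payload easy to inspect.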
Transformers
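```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TobiasLogic/Qwen2.5-Coder-32B-Python-Specialist"

# Load the tokenizer and the model; device_map="auto" spreads the weights
# across the available devices, and torch_dtype="auto" keeps the checkpoint's
# stored precision instead of upcasting to float32.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python script to parse a CSV file."}
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)

# Drop the prompt tokens so only the newly generated reply is printed.
reply_ids = generated_ids[0][model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```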
Training Data & Methodology
The model was fine-tuned with Unsloth, using multi-process data loading and memory-efficient LoRA training. The dataset consisted of curated coding problems, weighted heavily toward Python, converted into the standard ShareGPT conversational format.
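The conversion to ShareGPT format might look like the sketch below. It is illustrative only and is not the actual preprocessing script. It assumes Alpaca-style records with `instruction`, `input`, and `output` fields, and the function name `alpaca_to_sharegpt` is hypothetical.

```python
from typing import Dict, List


def alpaca_to_sharegpt(record: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """Turn one Alpaca-style record into a two-turn ShareGPT conversation."""
    instruction = record["instruction"].strip()
    extra_input = record.get("input", "").strip()
    # Append the optional input to the instruction when it is present.
    human_turn = f"{instruction}\n\n{extra_input}" if extra_input else instruction
    return {
        "conversations": [
            {"from": "human", "value": human_turn},
            {"from": "gpt", "value": record["output"].strip()},
        ]
    }


# Made-up record, for illustration only.
example = {
    "instruction": "Write a function that returns the sum of a list.",
    "input": "[1, 2, 3]",
    "output": "def total(xs):\n    return sum(xs)",
}
converted = alpaca_to_sharegpt(example)
print(converted["conversations"][0]["value"])
```

Each record becomes a `conversations` list with one `human` turn and one `gpt` turn, which a chat template can then render for training.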
To improve instruction following without catastrophic forgetting, we targeted only the attention matrices. The model was trained with a learning rate of 2e-4 and reached a final loss of 0.45 without signs of overfitting.