Model Summary
Qwen2.5-7B-ChatCoder is a linearly merged language model that combines the instruction-following strength of Qwen2.5-7B-Instruct with the code generation capabilities of Qwen2.5-Coder-7B-Instruct.
The merge uses an 85% instruct / 15% coder weight split, tuned to preserve the chat and reasoning behaviour of the instruct model while absorbing coding knowledge from the coder model. The result is a model that handles both natural conversation and code generation in a single set of weights.
| Property | Value |
|---|---|
| Parameters | 7.6B |
| Architecture | Qwen2ForCausalLM |
| Context length | 128K tokens |
| Merge method | Linear |
| Instruct weight | 0.85 |
| Coder weight | 0.15 |
| dtype | bfloat16 |
| Vocabulary size | 152,064 |
Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ragunath-ravi/Qwen2.5-7B-ChatCoder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,  # older transformers releases use torch_dtype= instead
    device_map="auto",
)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to do binary search."},
]

# Render the conversation into the model's chat format as a plain string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding: the sampling parameters are unset because do_sample=False
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        temperature=None,
        top_p=None,
        repetition_penalty=1.1,
        eos_token_id=[151645, 151643],
        pad_token_id=151645,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    output[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)
print(response)
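To make the prompt format and the stop tokens in the Quick Start less opaque, here is a minimal sketch of the ChatML layout that `apply_chat_template` produces for these messages. It assumes the merged model inherits the chat template of Qwen2.5-7B-Instruct. `render_chatml` is a hypothetical helper written for illustration, not part of any library, and it omits the tool-calling and default-system-prompt branches of the real template. In the Qwen2.5 tokenizer, ID 151645 is `<|im_end|>` and ID 151643 is `<|endoftext|>`, which is why both appear in `eos_token_id`.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML prompt layout (illustrative helper, not a library API)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to do binary search."},
]
prompt = render_chatml(messages)
print(prompt)
```

For real inference, prefer `tokenizer.apply_chat_template`, which reads the template shipped with the model rather than a hand-written approximation.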
Run with 4-bit Quantization (~5 GB VRAM)
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# NF4 quantization with double quantization (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "ragunath-ravi/Qwen2.5-7B-ChatCoder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
Hardware Requirements
| Precision | VRAM Required | Recommended GPU |
|---|---|---|
| bfloat16 (full) | ~16 GB | RTX 3090 / A100 / H100 |
| 8-bit (bitsandbytes) | ~8 GB | RTX 3080 / 4080 |
| 4-bit NF4 (bitsandbytes) | ~5 GB | RTX 3060 / 4060 |
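As a rough cross-check of the table, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. The sketch below uses the 7.6B figure from the Model Summary and assumes 2, 1, and 0.5 bytes per parameter for bfloat16, 8-bit, and 4-bit storage. `weight_memory_gb` is a hypothetical helper. The estimate ignores activations, the KV cache, quantization constants, and any layers kept in higher precision, so real usage lands above these numbers.

```python
PARAMS = 7.6e9  # parameter count from the Model Summary table

# Assumed storage cost per parameter for each precision
BYTES_PER_PARAM = {"bfloat16": 2.0, "int8": 1.0, "nf4": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

The gap between these lower bounds and the table's figures is the headroom left for runtime overhead, which grows with context length and batch size.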
What This Model Is Good At
- Instruction following: multi-turn chat, answering questions, reasoning
- Code generation: Python, JavaScript, C++, Java, SQL, Bash
- Code explanation: walk through what a piece of code does
- Debugging: find and fix bugs with explanation
- Mixed tasks: "explain this code and rewrite it to be more efficient"
- Math reasoning: step-by-step problem solving
Merge Details
This model was created using mergekit with a linear merge strategy.
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: 0.85
  - model: Qwen/Qwen2.5-Coder-7B-Instruct
    parameters:
      weight: 0.15
merge_method: linear
dtype: bfloat16
Why linear merge? Linear merging computes a direct weighted average of all model weights, with no pruning or masking. It is a stable way to combine an instruct-tuned model with a base or specialist model, and it avoids the weight corruption that DARE-TIES can introduce when density < 1.0.
Why 85/15 split? The Qwen2.5-Coder-7B at this path is a base model (not instruct-tuned). At higher coder weights (tested at 0.4), the base model behaviour dominates and breaks instruction following. At 0.15, the coding knowledge is absorbed while the instruct fine-tuning remains intact.
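The weighted average behind a linear merge can be illustrated on toy data. This is a minimal sketch using plain Python lists in place of real tensors. `linear_merge` and the toy parameter values are hypothetical, and the code is not mergekit's implementation. It divides by the sum of the weights, which changes nothing here because 0.85 + 0.15 = 1.

```python
def linear_merge(state_dicts, weights):
    """Weighted average of matching parameters, normalized by the weight sum."""
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        length = len(state_dicts[0][name])
        merged[name] = [
            sum(w * sd[name][i] for sd, w in zip(state_dicts, weights)) / total
            for i in range(length)
        ]
    return merged


# Toy stand-ins for the two parent checkpoints (values are made up)
instruct = {"layer.weight": [1.0, 2.0, -4.0]}
coder = {"layer.weight": [3.0, 2.0, 0.0]}

merged = linear_merge([instruct, coder], [0.85, 0.15])
print(merged)
```

Each merged value sits 15% of the way from the instruct parameter toward the coder parameter, so raising the coder weight moves every parameter further from the instruct model.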
Limitations
- Capabilities are bounded by the two parent models; the merge involved no additional training
- Very long code generation (>500 lines) may degrade in quality
- Not fine-tuned for agent/tool-use tasks
- May occasionally produce confident but incorrect code, so always test generated code
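One lightweight way to act on the last point is to compare generated code against a slow but obviously correct reference on random inputs. The sketch below does this for the binary search prompt from the Quick Start. `binary_search` is a hand-written stand-in for model output, not an actual generation, and `check_binary_search` is a hypothetical helper.

```python
import random


def binary_search(items, target):
    """Stand-in for generated code: index of target in sorted items, or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def check_binary_search(candidate, trials=200, seed=0):
    """Check candidate against a linear membership test on random sorted lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = sorted(rng.sample(range(100), rng.randint(0, 20)))
        target = rng.randrange(100)
        index = candidate(items, target)
        if target in items:
            # A hit must point at an element equal to the target
            if index == -1 or items[index] != target:
                return False
        elif index != -1:
            # A miss must be reported as -1
            return False
    return True


print(check_binary_search(binary_search))
```

Randomized checks like this catch off-by-one and boundary errors cheaply, but they do not replace a proper test suite for code that goes into production.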
Citation
If you use this model, please cite the original Qwen2.5 models:
@misc{qwen2.5,
    title = {Qwen2.5: A Party of Foundation Models},
    author = {Qwen Team},
    year = {2024},
    url = {https://qwenlm.github.io/blog/qwen2.5/}
}
Created By
Merged and released by ragunath-ravi.
Built with mergekit by Arcee AI.