How to use ragunath-ravi/Qwen2.5-7B-ChatCoder with libraries and local apps.

Libraries

How to use ragunath-ravi/Qwen2.5-7B-ChatCoder with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ragunath-ravi/Qwen2.5-7B-ChatCoder")
messages = [
    {"role": "user", "content": "Who are you?"},
]
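result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # the last message is the model's reply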

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ragunath-ravi/Qwen2.5-7B-ChatCoder")
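model = AutoModelForCausalLM.from_pretrained(
    "ragunath-ravi/Qwen2.5-7B-ChatCoder",
    dtype="auto",        # load in the checkpoint's dtype (bfloat16)
    device_map="auto",   # place layers on the available GPU(s); requires accelerate
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping the prompt and special tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))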

Local Apps

vLLM

How to use ragunath-ravi/Qwen2.5-7B-ChatCoder with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ragunath-ravi/Qwen2.5-7B-ChatCoder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ragunath-ravi/Qwen2.5-7B-ChatCoder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
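The same request can be sent from Python. The sketch below uses only the standard library (`urllib.request`, `json`) rather than an OpenAI client package, and reads the reply from `choices[0].message.content` as in the OpenAI chat-completions response format. `build_chat_request`, `extract_reply` and `send` are hypothetical helper names, and the default `base_url` assumes the vLLM server above is listening on `localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "ragunath-ravi/Qwen2.5-7B-ChatCoder"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000",
                       max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Return the assistant text from a chat-completions response body."""
    return response["choices"][0]["message"]["content"]


def send(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request to a running server and return the assistant's reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(send(build_chat_request("What is the capital of France?")))` prints the model's answer. Passing `base_url="http://localhost:30000"` targets the SGLang server instead, since both expose the same endpoint.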

Use Docker

docker model run hf.co/ragunath-ravi/Qwen2.5-7B-ChatCoder

SGLang

How to use ragunath-ravi/Qwen2.5-7B-ChatCoder with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ragunath-ravi/Qwen2.5-7B-ChatCoder" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ragunath-ravi/Qwen2.5-7B-ChatCoder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ragunath-ravi/Qwen2.5-7B-ChatCoder" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ragunath-ravi/Qwen2.5-7B-ChatCoder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
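Both servers also accept `"stream": true` in the request body. The OpenAI-compatible endpoint then answers with Server-Sent Events: lines of the form `data: {json}`, each carrying a text fragment in `choices[0].delta.content`, terminated by `data: [DONE]`. A minimal standard-library sketch for reassembling such a stream; `collect_stream` is a hypothetical helper, and it skips chunks that carry only a role or a finish reason.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate the text deltas from an OpenAI-style SSE chat-completions stream."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first delta may carry only the role; later ones carry text.
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

When reading a live response from `urllib.request.urlopen`, the response yields byte lines, so decode them first, e.g. `collect_stream(line.decode("utf-8") for line in resp)`.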


Qwen2.5-7B-ChatCoder

A merged model combining Qwen2.5-7B-Instruct and Qwen2.5-Coder-7B-Instruct

Model Summary

Qwen2.5-7B-ChatCoder is a linearly merged language model that combines the instruction-following strength of Qwen2.5-7B-Instruct with the code generation capabilities of Qwen2.5-Coder-7B-Instruct.

The merge uses an 85% instruct / 15% coder weight split, chosen to keep the chat and reasoning behaviour of the instruct model largely intact while absorbing coding knowledge from the coder model. The result is a single set of weights intended to handle both natural conversation and code generation.

Property	Value
Parameters	7.6B
Architecture	Qwen2ForCausalLM
Context length	32K tokens native; up to 128K with YaRN rope scaling (as for Qwen2.5-7B-Instruct)
Merge method	Linear
Instruct weight	0.85
Coder weight	0.15
dtype	bfloat16
Vocabulary size	152,064

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ragunath-ravi/Qwen2.5-7B-ChatCoder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user",   "content": "Write a Python function to do binary search."},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

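with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,          # greedy decoding
        temperature=None,         # clear sampling defaults inherited from generation_config
        top_p=None,
        top_k=None,
        repetition_penalty=1.1,
        eos_token_id=[151645, 151643],  # <|im_end|> (end of turn) and <|endoftext|>
        pad_token_id=151645,
    )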

response = tokenizer.decode(
    output[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)
print(response)
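Qwen2.5 chat models use the ChatML prompt layout, and `apply_chat_template(..., tokenize=False, add_generation_prompt=True)` above returns a string of that form. The sketch below approximates it for plain text messages, to show what the model actually receives and why `<|im_end|>` (id 151645) is passed as a stop token. It is a simplification, not the real template: `render_chatml` is a hypothetical helper, and the Qwen2.5-Instruct template additionally inserts a default system prompt when none is given and handles tool calls.

```python
def render_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Approximate the ChatML prompt string used by Qwen2.5 chat models."""
    out = []
    for msg in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        out.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant;
        # the reply is expected to end with the next <|im_end|>.
        out.append("<|im_start|>assistant\n")
    return "".join(out)
```

For simple messages that include a system turn, such as the Quick Start example, comparing this output with the tokenizer's own rendering is a quick sanity check, assuming the merge ships the Qwen2.5-Instruct tokenizer.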

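Run with 4-bit Quantization (~5 GB VRAM)

This requires the bitsandbytes package (pip install bitsandbytes). Once loaded, the model is used for generation exactly as in the Quick Start.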

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "ragunath-ravi/Qwen2.5-7B-ChatCoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

Hardware Requirements

Precision	VRAM Required	Recommended GPU
bfloat16 (full)	~16 GB	RTX 3090 / A100 / H100
8-bit (bitsandbytes)	~8 GB	RTX 3080 / 4080
4-bit NF4 (bitsandbytes)	~5 GB	RTX 3060 / 4060
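The VRAM figures above are dominated by the weights, whose size is roughly parameters × bits per parameter ÷ 8 bytes. A back-of-the-envelope sketch using the 7.6B parameter count from the Model Summary; `weight_memory_gb` is a hypothetical helper. It ignores the KV cache, activations, layers that bitsandbytes leaves unquantized (such as the embeddings) and CUDA context overhead, which is why the table's numbers are somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 7.6e9  # parameter count from the Model Summary table

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit NF4", 4)]:
    # Prints the raw weight footprint for each precision in the table
    print(f"{label:>10}: ~{weight_memory_gb(PARAMS, bits):.1f} GB of weights")
```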

What This Model Is Good At

Instruction following — multi-turn chat, answering questions, reasoning
Code generation — Python, JavaScript, C++, Java, SQL, Bash
Code explanation — walk through what a piece of code does
Debugging — find and fix bugs with explanation
Mixed tasks — "explain this code and rewrite it to be more efficient"
Math reasoning — step-by-step problem solving

Merge Details

This model was created using mergekit with a linear merge strategy.

models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: 0.85
  - model: Qwen/Qwen2.5-Coder-7B-Instruct
    parameters:
      weight: 0.15

merge_method: linear
dtype: bfloat16
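Conceptually, `merge_method: linear` replaces each tensor that the two checkpoints share with the weighted average of the two. The sketch below shows that arithmetic on toy tensors stored as flat Python lists; `linear_merge` is a hypothetical helper, not mergekit's code. It divides by the sum of the weights, assuming mergekit's default `normalize` behaviour for linear merges, which is a no-op here because 0.85 + 0.15 = 1.

```python
from typing import Dict, List

Tensor = List[float]  # toy stand-in for a flattened weight tensor


def linear_merge(models: List[Dict[str, Tensor]], weights: List[float],
                 normalize: bool = True) -> Dict[str, Tensor]:
    """Element-wise weighted average of same-named tensors across models."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights) if normalize else 1.0
    merged: Dict[str, Tensor] = {}
    for name in models[0]:
        tensors = [m[name] for m in models]
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors)) / total
            for i in range(len(tensors[0]))
        ]
    return merged


instruct = {"layer.weight": [1.0, 2.0, -1.0]}
coder = {"layer.weight": [3.0, 4.0, 1.0]}
print(linear_merge([instruct, coder], [0.85, 0.15]))
```

In the real merge the same arithmetic runs over every tensor of the two checkpoints; both share the Qwen2 architecture and the 152,064-token vocabulary, so every tensor has a matching counterpart.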

Why linear merge? Linear merging computes an element-wise weighted average of the corresponding weights of each model, with no pruning or masking. This makes it a conservative, predictable choice for combining an instruct-tuned model with a specialist model, and it avoids the random dropping and rescaling of delta weights that DARE-TIES applies when density < 1.0.

Why 85/15 split? At higher coder weights (tested at 0.4), the coder model's behaviour dominated and broke instruction following. At 0.15, the coding knowledge is absorbed while the instruct fine-tuning remains intact.
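Rewriting the linear merge as an interpolation makes the split concrete. With θ_I the Qwen2.5-7B-Instruct weights and θ_C the Qwen2.5-Coder-7B-Instruct weights:

```latex
\theta_{\text{merged}} = 0.85\,\theta_I + 0.15\,\theta_C = \theta_I + 0.15\,(\theta_C - \theta_I)
```

Each merged weight therefore moves 15% of the way from the instruct model toward the coder model. At the tested coder weight of 0.4 (with the weights still summing to 1), it would move 40% of the way.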

Limitations

Capabilities are bounded by the two parent models; no additional training was done after merging
Very long code generation (>500 lines) may degrade in quality
Not fine-tuned for agent/tool-use tasks
May occasionally produce confident but incorrect code — always test generated code

Citation

If you use this model, please cite the original Qwen2.5 models:

@misc{qwen2.5,
  title  = {Qwen2.5: A Party of Foundation Models},
  author = {Qwen Team},
  year   = {2024},
  url    = {https://qwenlm.github.io/blog/qwen2.5/}
}

Created By

Merged and released by ragunath-ravi.

Built with mergekit by Arcee AI.


Safetensors
Model size: 8B params
Tensor type: BF16

Model tree for ragunath-ravi/Qwen2.5-7B-ChatCoder: merge of Qwen/Qwen2.5-7B-Instruct and Qwen/Qwen2.5-Coder-7B-Instruct