Qwen2.5-7B-ChatCoder

A merged model combining Qwen2.5-7B-Instruct and Qwen2.5-Coder-7B-Instruct

Model Summary

Qwen2.5-7B-ChatCoder is a linearly merged language model that combines the instruction-following strength of Qwen2.5-7B-Instruct with the code generation capabilities of Qwen2.5-Coder-7B-Instruct.

The merge uses an 85% instruct / 15% coder weight split, carefully tuned to preserve the full chat and reasoning behaviour of the instruct model while absorbing coding knowledge from the coder model, resulting in a model that handles both natural conversation and code generation in a single set of weights.

Property           Value
Parameters         7.6B
Architecture       Qwen2ForCausalLM
Context length     128K tokens
Merge method       Linear
Instruct weight    0.85
Coder weight       0.15
dtype              bfloat16
Vocabulary size    152,064

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ragunath-ravi/Qwen2.5-7B-ChatCoder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user",   "content": "Write a Python function to do binary search."},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        temperature=None,
        top_p=None,
        repetition_penalty=1.1,
        eos_token_id=[151645, 151643],
        pad_token_id=151645,
    )

response = tokenizer.decode(
    output[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)
print(response)
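For reference, `apply_chat_template` renders the messages into Qwen2.5's ChatML-style prompt format (`<|im_start|>` / `<|im_end|>` turns). A minimal sketch of that rendering, for illustration only; `render_chatml` is a hypothetical helper, and the template bundled with the tokenizer is authoritative:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Sketch of the ChatML-style prompt format used by Qwen2.5 models."""
    parts = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|> markers.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # An open assistant turn cues the model to generate its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to do binary search."},
]
print(render_chatml(messages))
```

The hardcoded `eos_token_id` values in the snippet above (151645, 151643) correspond to `<|im_end|>` and `<|endoftext|>` in the Qwen2.5 vocabulary, which is why generation stops cleanly at the end of the assistant turn.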

Run with 4-bit Quantization (~5 GB VRAM)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "ragunath-ravi/Qwen2.5-7B-ChatCoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

Hardware Requirements

Precision                  VRAM Required   Recommended GPU
bfloat16 (full)            ~16 GB          RTX 3090 / A100 / H100
8-bit (bitsandbytes)       ~8 GB           RTX 3080 / 4080
4-bit NF4 (bitsandbytes)   ~5 GB           RTX 3060 / 4060
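The figures above can be sanity-checked with a back-of-the-envelope estimate: the weights alone take roughly parameters × bits-per-parameter, with the remainder of the table's numbers covering KV cache, activations, and CUDA overhead. A rough sketch (`weight_memory_gb` is an illustrative helper, not a real API):

```python
PARAMS = 7.6e9  # parameter count from the model summary table

def weight_memory_gb(params, bits_per_param):
    """Memory for the raw weights only, in GiB (excludes KV cache/activations)."""
    return params * bits_per_param / 8 / 1024**3

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit NF4", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.1f} GB for weights")
```

The bfloat16 weights alone come to roughly 14 GB, which is why ~16 GB of VRAM is the practical floor once runtime overhead is included.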

What This Model Is Good At

  • Instruction following: multi-turn chat, answering questions, reasoning
  • Code generation: Python, JavaScript, C++, Java, SQL, Bash
  • Code explanation: walking through what a piece of code does
  • Debugging: finding and fixing bugs with explanation
  • Mixed tasks: "explain this code and rewrite it to be more efficient"
  • Math reasoning: step-by-step problem solving

Merge Details

This model was created using mergekit with a linear merge strategy.

models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: 0.85
  - model: Qwen/Qwen2.5-Coder-7B-Instruct
    parameters:
      weight: 0.15

merge_method: linear
dtype: bfloat16

Why linear merge? Linear merging computes a direct weighted average of all model weights with no pruning or masking. It is the most stable merge method for combining an instruct-tuned model with a base/specialist model, avoiding the weight corruption that DARE-TIES can introduce when density < 1.0.
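A linear merge is just a per-tensor weighted average of the parents' parameters. A minimal sketch of the idea, using plain Python lists in place of model tensors; `linear_merge` is an illustrative helper, not mergekit's API:

```python
def linear_merge(instruct, coder, w_instruct=0.85, w_coder=0.15):
    """Weighted average of two state dicts with identical shapes."""
    merged = {}
    for name in instruct:
        # Every parameter is blended elementwise; nothing is pruned or masked.
        merged[name] = [w_instruct * a + w_coder * b
                        for a, b in zip(instruct[name], coder[name])]
    return merged

# Toy "state dicts" standing in for the two parents' weights.
instruct = {"layer.weight": [1.0, 2.0]}
coder    = {"layer.weight": [3.0, -2.0]}
print(linear_merge(instruct, coder))
```

Because every element is a convex combination of the parents, the merged weights stay within the span of the two models, which is the stability property the paragraph above refers to.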

Why the 85/15 split? At higher coder weights (0.4 was tested), the coder model's behaviour dominates and breaks instruction following. At 0.15, the coding knowledge is absorbed while the instruct fine-tuning remains intact.


Limitations

  • Capabilities are bounded by the two parent models; the merge itself involved no additional training
  • Very long code generation (>500 lines) may degrade in quality
  • Not fine-tuned for agent/tool-use tasks
  • May occasionally produce confident but incorrect code; always test generated code

Citation

If you use this model, please cite the original Qwen2.5 models:

@misc{qwen2.5,
  title  = {Qwen2.5: A Party of Foundation Models},
  author = {Qwen Team},
  year   = {2024},
  url    = {https://qwenlm.github.io/blog/qwen2.5/}
}

Created By

Merged and released by ragunath-ravi.

Built with mergekit by Arcee AI.
