---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-Coder-7B-Instruct
- Qwen/Qwen2.5-Math-7B-Instruct
tags:
- model-soup
- model-merging
- qwen2.5
- souper-model
library_name: transformers
---

# Qwen2.5-7B-MathSoup

🍲 A **model soup** created by weighted averaging of component checkpoints, following [Meta's Souper-Model](https://arxiv.org/abs/2511.13254).

## Weights

- **math** (Qwen2.5-Math-7B-Instruct): 60%
- **general** (Qwen2.5-7B-Instruct): 40%
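
The merge itself is a key-by-key weighted average of the two checkpoints' parameter tensors. A minimal sketch of that step, using toy `nn.Linear` modules in place of the 7B checkpoints (the helper name `soup_state_dicts` is illustrative, not from the Souper-Model code):

```python
import torch
import torch.nn as nn

def soup_state_dicts(state_dicts, weights):
    """Key-by-key weighted average of same-architecture state dicts."""
    return {
        key: sum(w * sd[key] for sd, w in zip(state_dicts, weights))
        for key in state_dicts[0]
    }

# Toy stand-ins for the 7B checkpoints; the real soup requires
# models with identical architecture and parameter names.
torch.manual_seed(0)
math_model = nn.Linear(4, 4)     # stands in for Qwen2.5-Math-7B-Instruct
general_model = nn.Linear(4, 4)  # stands in for Qwen2.5-7B-Instruct

souped = nn.Linear(4, 4)
souped.load_state_dict(soup_state_dicts(
    [math_model.state_dict(), general_model.state_dict()],
    [0.6, 0.4],
))
```

Because the average is taken parameter-wise, the souped model has exactly the same architecture and size as each component.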

## Expected Performance (Linear Prediction)

| Benchmark | Predicted Score |
|-----------|-----------------|
| GSM8K     | 88.3%           |
| HumanEval | 59.5%           |

*Note: actual performance may differ due to weight-interference effects between the merged checkpoints.*
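
The predicted scores are simply the soup weights applied to the component models' benchmark scores (listed in the table below); a quick check of the arithmetic:

```python
# Linear prediction: the soup's score is modeled as the weighted
# average of the component models' benchmark scores.
WEIGHTS = {"math": 0.6, "general": 0.4}
SCORES = {
    "math":    {"GSM8K": 90.3, "HumanEval": 52.4},  # Qwen2.5-Math-7B-Instruct
    "general": {"GSM8K": 85.4, "HumanEval": 70.1},  # Qwen2.5-7B-Instruct
}

def predict(benchmark):
    return sum(w * SCORES[m][benchmark] for m, w in WEIGHTS.items())

print(round(predict("GSM8K"), 1))      # 88.3
print(round(predict("HumanEval"), 1))  # 59.5
```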

## Component Models

| Model | GSM8K | HumanEval |
|-------|-------|-----------|
| Qwen2.5-7B-Instruct | 85.4% | 70.1% |
| Qwen2.5-Coder-7B-Instruct | 60.4% | 88.4% |
| Qwen2.5-Math-7B-Instruct | 90.3% | 52.4% |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "researchaudio/Qwen2.5-7B-MathSoup",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("researchaudio/Qwen2.5-7B-MathSoup")

messages = [{"role": "user", "content": "Solve: What is 15% of 80?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping special tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Citation

```bibtex
@misc{soupermodel2025,
  title={Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance},
  author={Shalini Maiti and others},
  year={2025},
  url={https://arxiv.org/abs/2511.13254}
}
```