---
license: mit
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
- meta-llama/Llama-2-7b-hf
library_name: transformers
tags:
- mergekit
- merged-model
- mistral
- llama2
- language-model
---

# 🧬 Mistral-LLaMA-Fusion: A Hybrid of Open Weight Titans

## Overview

**Mistral-LLaMA-Fusion** is an **experimental merged language model** combining the strengths of **Mistral-7B-v0.1** and **LLaMA-2-7B** using the **Linear Merge** method via [MergeKit](https://github.com/cg123/mergekit). This hybrid model aims to balance Mistral's efficiency and architecture with LLaMA-2's robustness in reasoning and instruction following.

- **Created by**: Matteo Khan
- **Affiliation**: Apprentice at TW3 Partners (Generative AI Research)
- **License**: MIT

[Connect on LinkedIn](https://www.linkedin.com/in/matteo-khan-a10309263/)

[Model on Hugging Face](https://huggingface.co/MatteoKhan/Mistral-LLaMA-Fusion)

## 🧠 Model Details

- **Model Type**: Merged Language Model
- **Parent Models**:
  - [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
  - [LLaMA-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- **Merging Method**: Linear Merge (via MergeKit)

## 🎯 Intended Use

This model is suited for research in model merging and hybridization, and can be used for:

- ✅ Text Generation
- ✅ Instruction Following
- ✅ Creative Writing
- ✅ Prompt Engineering Experiments

## ⚠️ Limitations

As with all merged models, this fusion may inherit and combine weaknesses from both parents:

- Possible generation of false, biased, or inappropriate content
- Unpredictable behavior in edge cases
- No guaranteed performance gain across all benchmarks

## 🔬 Merging Configuration

```yaml
merge_method: linear
dtype: float16
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      t: 1.0
      weight: 0.6
  - model: meta-llama/Llama-2-7b-hf
    parameters:
      t: 1.0
      weight: 0.4

parameters:
  normalize: true
  int8_mask: false

layers:
  - pattern: "model.*"
```

Note: No additional fine-tuning was performed. This is a straight merge using MergeKit.

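To make the `linear` method and the `normalize: true` setting more concrete, here is a minimal sketch of a weighted parameter average in plain Python. It is an illustration under stated assumptions, not MergeKit's actual implementation: the `linear_merge` function, the toy "checkpoints", and the `layer.weight` parameter name are all hypothetical, and plain lists of floats stand in for real weight tensors. It also assumes every parameter has the same name and shape in both models.

```python
from typing import Dict, List

Tensor = List[float]  # stand-in for a real weight tensor (assumption for this sketch)


def linear_merge(
    state_dicts: List[Dict[str, Tensor]],
    weights: List[float],
    normalize: bool = True,
) -> Dict[str, Tensor]:
    """Element-wise weighted average of parameters that share a name and shape."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")

    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero when normalizing")
        # Rescale so the weights sum to 1, mirroring the idea behind `normalize: true`.
        weights = [w / total for w in weights]

    merged: Dict[str, Tensor] = {}
    for name, first in state_dicts[0].items():
        tensors = [sd[name] for sd in state_dicts]
        if any(len(t) != len(first) for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Each merged value is the weighted sum of the values at the same position.
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors))
            for i in range(len(first))
        ]
    return merged


# Toy "checkpoints" with a single hypothetical parameter each.
mistral_like = {"layer.weight": [1.0, 2.0, 3.0]}
llama_like = {"layer.weight": [3.0, 2.0, 1.0]}

merged = linear_merge([mistral_like, llama_like], weights=[0.6, 0.4])
print(merged)  # values are approximately [1.8, 2.0, 2.2]
```

With the 0.6 / 0.4 weights from the configuration above, each merged value sits closer to the first model's value. Because the weights are normalized, passing `[3, 2]` instead of `[0.6, 0.4]` would give the same result in this sketch.
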
## 🌱 Why Merging?

Merging allows rapid experimentation with existing checkpoints while reducing the computational cost and carbon footprint compared to training from scratch.

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint and its tokenizer from the Hugging Face Hub.
model_name = "MatteoKhan/Mistral-LLaMA-Fusion"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Tokenize the prompt and generate up to 200 tokens in total (prompt included).
prompt = "Explain the benefits of merging language models."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```