mlabonne/AlphaMonarch-7B

@@ -3,6 +3,8 @@ license: cc-by-nc-4.0
 tags:
 - merge
 - lazymergekit
 datasets:
 - mlabonne/truthy-dpo-v0.1
 - mlabonne/distilabel-intel-orca-dpo-pairs
@@ -17,7 +19,7 @@ language:
 # 👑 AlphaMonarch-7B
-**Update 14/02/24: AlphaMonarch-7B is the new best-performing 7B model on Nous' benchmark suite! 🎉**
 AlphaMonarch-7B is a DPO fine-tune of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/) using the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset.
@@ -30,9 +32,9 @@ Special thanks to [Jon Durbin](https://huggingface.co/jondurbin), [Intel](https:
 ## 🔍 Applications
-This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template.
-Compared to other 7B models, it displays good performance in instruction following and reasoning tasks. It can also be used for RP and storytelling.
 ## ⚡ Quantized models
@@ -52,14 +54,15 @@ The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/ll
 | [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) [📄](https://gist.github.com/mlabonne/88b21dd9698ffed75d6163ebdc2f6cc8) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
 | [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) [📄](https://gist.github.com/mlabonne/14687f1eb3425b166db511f31f8e66f6) | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
 | [mlabonne/NeuralBeagle14-7B](https://huggingface.co/mlabonne/NeuralBeagle14-7B) [📄](https://gist.github.com/mlabonne/ad0c665bbe581c8420136c3b52b3c15c) | 60.25 | 46.06 | 76.77 | 70.32 | 47.86 |
 | [eren23/dpo-binarized-NeuralTrix-7B](https://huggingface.co/eren23/dpo-binarized-NeuralTrix-7B) [📄](https://gist.github.com/CultriX-Github/dbdde67ead233df0c7c56f1b091f728c) | 62.5 | 44.57 | 76.34 | 79.81 | 49.27 |
 | [CultriX/NeuralTrix-7B-dpo](https://huggingface.co/CultriX/NeuralTrix-7B-dpo) [📄](https://gist.github.com/CultriX-Github/df0502599867d4043b45d9dafb5976e8) | 62.5 | 44.61 | 76.33 | 79.8 | 49.24 |
-### Open LLM Leaderboard
-AlphaMonarch-7B is one of the best-performing non-merge 7B models on the Open LLM Leaderboard:
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/njHxX_ERQaBssHqp17fMy.png)
 ### MT-Bench
@@ -68,11 +71,13 @@ AlphaMonarch-7B is one of the best-performing non-merge 7B models on the Open LL
                                     score
 model                       turn
 gpt-4                       1     8.95625
 AlphaMonarch-7B             1     8.23750
 claude-v1                   1     8.15000
 gpt-3.5-turbo               1     8.07500
 claude-instant-v1           1     7.80000
 ########## Second turn ##########
                                      score
 model                       turn
@@ -81,17 +86,26 @@ claude-instant-v1           2     8.012658
 gpt-3.5-turbo               2     7.812500
 claude-v1                   2     7.650000
 AlphaMonarch-7B             2     7.618750
 ########## Average ##########
                                 score
 model
 gpt-4                        8.990625
 gpt-3.5-turbo                7.943750
 AlphaMonarch-7B              7.928125
 claude-instant-v1            7.905660
 claude-v1                    7.900000
 ```
 ## 💻 Usage
 ```python
@@ -101,7 +115,7 @@ from transformers import AutoTokenizer
 import transformers
 import torch
-model = "mlabonne/MonarchMonarch-7B"
 messages = [{"role": "user", "content": "What is a large language model?"}]
 tokenizer = AutoTokenizer.from_pretrained(model)

 tags:
 - merge
 - lazymergekit
+- dpo
+- rlhf
 datasets:
 - mlabonne/truthy-dpo-v0.1
 - mlabonne/distilabel-intel-orca-dpo-pairs
 # 👑 AlphaMonarch-7B
+**tl;dr: AlphaMonarch-7B is a new DPO merge that retains all the reasoning abilities of the very best merges while significantly improving its conversational abilities: the best of both worlds in a 7B model. 🎉**
 AlphaMonarch-7B is a DPO fine-tune of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/) using the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset.
 ## 🔍 Applications
+This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
+It is one of the very best 7B models in terms of instruction-following and reasoning abilities, and it can be used for conversations, RP, and storytelling. Note that it tends to have a rather formal and sophisticated style, which can be changed by modifying the prompt.
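The Applications section recommends the Mistral Instruct chat template. As a rough illustration of what that template produces, here is a minimal sketch of the widely used `[INST] ... [/INST]` layout. The helper `format_mistral_instruct` is hypothetical, and exact whitespace and special-token handling differ between tokenizer versions, so in practice the tokenizer's own chat template should be preferred.

```python
# Hypothetical helper: approximates the Mistral Instruct prompt layout.
# Exact spacing and special-token handling vary between tokenizer versions.
BOS, EOS = "<s>", "</s>"


def format_mistral_instruct(messages: list[dict[str, str]]) -> str:
    """Render alternating user/assistant turns as [INST] ... [/INST] blocks."""
    prompt = BOS
    for i, message in enumerate(messages):
        role, content = message["role"], message["content"]
        # This layout has no system role and expects strictly alternating
        # turns, starting with the user.
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"message {i}: expected role {expected!r}, got {role!r}")
        if role == "user":
            prompt += f"[INST] {content} [/INST]"
        else:
            # Assistant replies are closed with the end-of-sequence token.
            prompt += f"{content}{EOS}"
    return prompt


messages = [{"role": "user", "content": "What is a large language model?"}]
print(format_mistral_instruct(messages))
# <s>[INST] What is a large language model? [/INST]
```

With `transformers`, passing the same `messages` list to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should give a comparable string, provided the tokenizer ships a Mistral-style template.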
 ## ⚡ Quantized models
 | [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) [📄](https://gist.github.com/mlabonne/88b21dd9698ffed75d6163ebdc2f6cc8) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
 | [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) [📄](https://gist.github.com/mlabonne/14687f1eb3425b166db511f31f8e66f6) | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
 | [mlabonne/NeuralBeagle14-7B](https://huggingface.co/mlabonne/NeuralBeagle14-7B) [📄](https://gist.github.com/mlabonne/ad0c665bbe581c8420136c3b52b3c15c) | 60.25 | 46.06 | 76.77 | 70.32 | 47.86 |
+| [mlabonne/NeuralOmniBeagle-7B](https://huggingface.co/mlabonne/NeuralOmniBeagle-7B) [📄](https://gist.github.com/mlabonne/0e49d591787185fa5ae92ca5d9d4a1fd) | 62.3 | 45.85 | 77.26 | 76.06 | 50.03 |
 | [eren23/dpo-binarized-NeuralTrix-7B](https://huggingface.co/eren23/dpo-binarized-NeuralTrix-7B) [📄](https://gist.github.com/CultriX-Github/dbdde67ead233df0c7c56f1b091f728c) | 62.5 | 44.57 | 76.34 | 79.81 | 49.27 |
 | [CultriX/NeuralTrix-7B-dpo](https://huggingface.co/CultriX/NeuralTrix-7B-dpo) [📄](https://gist.github.com/CultriX-Github/df0502599867d4043b45d9dafb5976e8) | 62.5 | 44.61 | 76.33 | 79.8 | 49.24 |
+### EQ-bench
+AlphaMonarch-7B is the second-best-performing 7B model on [EQ-bench](https://eqbench.com/) by Samuel J. Paech.
 ### MT-Bench
                                     score
 model                       turn
 gpt-4                       1     8.95625
+OmniBeagle-7B               1     8.32500
 AlphaMonarch-7B             1     8.23750
 claude-v1                   1     8.15000
 gpt-3.5-turbo               1     8.07500
 claude-instant-v1           1     7.80000
 ########## Second turn ##########
                                      score
 model                       turn
 gpt-3.5-turbo               2     7.812500
 claude-v1                   2     7.650000
 AlphaMonarch-7B             2     7.618750
+OmniBeagle-7B               2     7.587500
 ########## Average ##########
                                 score
 model
 gpt-4                        8.990625
+OmniBeagle-7B                7.956250
 gpt-3.5-turbo                7.943750
 AlphaMonarch-7B              7.928125
 claude-instant-v1            7.905660
 claude-v1                    7.900000
+NeuralBeagle14-7B            7.628125
 ```
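For AlphaMonarch-7B and OmniBeagle-7B, the MT-Bench "Average" score matches the plain mean of the first- and second-turn scores. The short check below copies those scores from the output above; the helper name `mt_bench_average` is hypothetical. Rows such as claude-instant-v1 do not match this simple mean exactly, presumably because its two turns have different numbers of rated answers and the average is taken over all individual ratings.

```python
# Turn scores copied from the MT-Bench output above.
turn_scores = {
    "AlphaMonarch-7B": (8.23750, 7.618750),
    "OmniBeagle-7B": (8.32500, 7.587500),
}


def mt_bench_average(turn1: float, turn2: float) -> float:
    """Mean of the two turn scores.

    Assumes both turns have the same number of rated answers; otherwise
    the reported average is a count-weighted mean instead.
    """
    return (turn1 + turn2) / 2


for model, (t1, t2) in turn_scores.items():
    print(f"{model:<20} {mt_bench_average(t1, t2):.6f}")
# AlphaMonarch-7B      7.928125
# OmniBeagle-7B        7.956250
```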
+### Open LLM Leaderboard
+AlphaMonarch-7B is one of the best-performing non-merge 7B models on the Open LLM Leaderboard:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/njHxX_ERQaBssHqp17fMy.png)
 ## 💻 Usage
 ```python
 import transformers
 import torch
+model = "mlabonne/AlphaMonarch-7B"
 messages = [{"role": "user", "content": "What is a large language model?"}]
 tokenizer = AutoTokenizer.from_pretrained(model)