Adding Evaluation Results

409c23e over 2 years ago

4.31 kB

	---
	license: llama2
	language:
	- en
	tags:
	- mistral
	- merge
	library_name: transformers
	pipeline_tag: text-generation
	mergekit:
	- Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
	- uukuguy/speechless-mistral-six-in-one-7b
	datasets:
	- stingning/ultrachat
	- garage-bAInd/Open-Platypus
	- Open-Orca/OpenOrca
	- TIGER-Lab/MathInstruct
	- OpenAssistant/oasst_top1_2023-08-25
	- teknium/openhermes
	- meta-math/MetaMathQA
	- Open-Orca/SlimOrca

	---

	<p align="center">
	<img src="https://codeberg.org/aninokuma/DeydooAssistant/raw/branch/main/logo.webp" height="256px" alt="SynthIQ">
	</p>

	# SynthIQ

	This is SynthIQ, rated 92.23/100 by GPT-4 across varied complex prompts. I used [mergekit](https://github.com/cg123/mergekit) to merge models.

	Metrics from OpenLLM leaderboard:

	\| Model \| Average \| ARC \| HellaSwag \| MMLU \| TruthfulQA \| Winogrande \| GSM8K \|
	\| ---------------------------------------- \| ------- \| ----- \| --------- \| ----- \| ---------- \| ---------- \| ------ \|
	\| Weyaxi/OpenHermes-2.5_neural-chat-v3-3-openchat-5-1210-Slerp \| 71.26 \| 67.92 \| 86.32 \| 65.47 \| 56.45 \| 79.72 \| 71.72 \|
	\| sethuiyer/SynthIO-7b \| 69.37 \| 65.87 \| 85.82 \| 64.75 \| 57 \| 78.69 \| 64.06 \|
	\| uukuguy/speechless-mistral-six-in-one-7b \| 60.76 \| 62.97 \| 84.6 \| 63.29 \| 57.77 \| 77.51 \| 18.42 \|


	# Yaml Config

	```yaml

	slices:
	- sources:
	- model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
	layer_range: [0, 32]
	- model: uukuguy/speechless-mistral-six-in-one-7b
	layer_range: [0, 32]

	merge_method: slerp
	base_model: mistralai/Mistral-7B-v0.1

	parameters:
	t:
	- filter: self_attn
	value: [0, 0.5, 0.3, 0.7, 1]
	- filter: mlp
	value: [1, 0.5, 0.7, 0.3, 0]
	- value: 0.5 # fallback for rest of tensors
	tokenizer_source: union

	dtype: bfloat16

	```

	<!-- prompt-template start -->
	## Prompt template: ChatML

	```
	<\|im_start\|>system
	{system_message}<\|im_end\|>
	<\|im_start\|>user
	{prompt}<\|im_end\|>
	<\|im_start\|>assistant

	```

	<!-- prompt-template end -->

	SynthIQ's strengths can be succinctly summarized as follows:

	1. Advanced Natural Language Processing: SynthIQ excels in understanding and generating natural language, making it highly effective for conversational AI applications.

	2. Strong Commonsense Reasoning: It demonstrates a solid grasp of everyday scenarios and contexts, essential for practical and real-world applications.

	3. Creative and Engaging Content Generation: SynthIQ has the capability to produce creative content, useful in fields like marketing, creative writing, and social media engagement.

	4. Adaptive User Interaction: It can effectively adapt to various user personas, providing personalized experiences and recommendations.

	5. Multitasking Across Languages and Subjects: SynthIQ is adept at handling tasks across different languages and subjects, showcasing its versatility in global and multifaceted settings.

	6. Analytical and Problem-Solving Skills: The model shows proficiency in analytical reasoning and problem-solving, applicable in data-driven decision-making and complex scenario analysis.

	7. Cultural and Contextual Awareness: SynthIQ's awareness of different cultural and social contexts makes it suitable for applications requiring cultural sensitivity.

	8. Empathetic and Human-Like Interactions: The model can engage in empathetic and human-like dialogues, ideal for applications in mental health support, customer service, and education.


	License is LLama2 license as uukuguy/speechless-mistral-six-in-one-7b is llama2 license.
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sethuiyer__SynthIQ-7b)

	\| Metric \| Value \|
	\|-----------------------\|---------------------------\|
	\| Avg. \| 69.37 \|
	\| ARC (25-shot) \| 65.87 \|
	\| HellaSwag (10-shot) \| 85.82 \|
	\| MMLU (5-shot) \| 64.75 \|
	\| TruthfulQA (0-shot) \| 57.0 \|
	\| Winogrande (5-shot) \| 78.69 \|
	\| GSM8K (5-shot) \| 64.06 \|