Undi95
/

MM-ReMM-L2-20B

Text Generation

text-generation-inference

Model card Files Files and versions

MM-ReMM-L2-20B / README.md

Undi95's picture

Adding Evaluation Results (#1)

d1f8282 about 2 years ago

|

history blame contribute delete

1.46 kB

	---
	license: cc-by-nc-4.0
	---

	Merge:
	```shell
	layer_slices:
	- model: Gryphe/MythoMax-L2-13b
	start: 0
	end: 16
	- model: Undi95/MM-ReMM-L2-20B-Part1
	start: 8
	end: 20
	- model: Gryphe/MythoMax-L2-13b
	start: 17
	end: 32
	- model: Undi95/MM-ReMM-L2-20B-Part1
	start: 21
	end: 40
	```

	<!-- description start -->
	## Models used

	- Gryphe/MythoMax-L2-13b
	- Undi95/ReMM-v2.1-L2-13B
	<!-- description end -->

	Part1 = ReMM v2.1 merged /w MythoMax low weight to keep consistency. I call this "dilution" and result show consistency and coherency without repeat/loop beside the small amount of duplicated datas.

	## Prompt template: Alpaca

	```
	Below is an instruction that describes a task. Write a response that completes the request.

	### Instruction:
	{prompt}

	### Response:
	```
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Undi95__MM-ReMM-L2-20B)

	\| Metric \| Value \|
	\|-----------------------\|---------------------------\|
	\| Avg. \| 51.14 \|
	\| ARC (25-shot) \| 60.84 \|
	\| HellaSwag (10-shot) \| 85.18 \|
	\| MMLU (5-shot) \| 56.45 \|
	\| TruthfulQA (0-shot) \| 53.33 \|
	\| Winogrande (5-shot) \| 75.77 \|
	\| GSM8K (5-shot) \| 7.73 \|
	\| DROP (3-shot) \| 18.66 \|