This MoE model was built with microsoft/phi-2 as the base model and three experts: microsoft/phi-2, g-ronimo/phi-2-OpenHermes-2.5, and mlx-community/phi-2-dpo-7k. QLoRA was then applied to the q, v, and gate linear layers in every layer, fine-tuning on WizardLM_evol_instruct_70k with MLX. The model was created with a script from https://github.com/mzbac/mlx-moe.
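To make the two ingredients above concrete, here is a minimal pure-Python sketch of a top-k MoE layer whose gate (router) linear layer carries a LoRA adapter. It is an illustration under stated assumptions, not the mlx-moe implementation: top-2 routing, the toy hidden size, LoRA rank and alpha, and all function names (`lora_linear`, `moe_forward`, etc.) are hypothetical, and the 4-bit base-weight quantization that distinguishes QLoRA from plain LoRA is omitted.

```python
import math
import random

# Hypothetical toy sizes; the card does not state the real hidden size,
# top-k, LoRA rank or alpha. NUM_EXPERTS = 3 matches the three listed experts.
HIDDEN = 4
NUM_EXPERTS = 3
TOP_K = 2
RANK = 2
ALPHA = 4.0


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def lora_linear(x, w, a, b, alpha=ALPHA, rank=RANK):
    """y = W x + (alpha / rank) * B (A x).

    W is the frozen base weight (out x in), A is (rank x in), B is (out x rank).
    Only A and B would be trained; QLoRA additionally stores W in 4-bit,
    which this sketch omits.
    """
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base, delta)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_w, gate_a, gate_b, top_k=TOP_K):
    """Send x to the top_k experts picked by a LoRA-adapted gate and mix their outputs."""
    logits = lora_linear(x, gate_w, gate_a, gate_b)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the chosen logits only (same as renormalising the full softmax).
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for weight, idx in zip(weights, chosen):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen, weights


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Each toy "expert" is a single linear map standing in for a full MLP block.
expert_weights = [rand_matrix(HIDDEN, HIDDEN, rng) for _ in range(NUM_EXPERTS)]
experts = [lambda x, w=w: matvec(w, x) for w in expert_weights]
gate_w = rand_matrix(NUM_EXPERTS, HIDDEN, rng)
gate_a = rand_matrix(RANK, HIDDEN, rng)
gate_b = [[0.0] * RANK for _ in range(NUM_EXPERTS)]  # B = 0: adapter starts as a no-op

x = [0.5, -1.0, 0.25, 2.0]
out, chosen, weights = moe_forward(x, experts, gate_w, gate_a, gate_b)
print(chosen, [round(w, 3) for w in weights])
```

Initialising B to zero is the usual LoRA choice: the adapted gate starts out identical to the base gate, and fine-tuning only gradually shifts which experts each token is routed to.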

Evaluation

HellaSwag

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
hellaswag	1	none	0	acc	0.5482	±	0.0050
		none	0	acc_norm	0.7300	±	0.0044
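The table reports both acc and acc_norm. A minimal sketch of how the two typically differ for a multiple-choice task, assuming the lm-evaluation-harness convention (not stated in this card): acc picks the choice with the highest total log-likelihood, while acc_norm first divides each log-likelihood by the character length of the choice, so longer endings are not penalised just for having more tokens. The functions and the sample item are hypothetical.

```python
def pick(loglikelihoods, choices, normalize=False):
    """Return the index of the highest-scoring choice.

    With normalize=True, each log-likelihood is divided by the character
    length of its choice (the assumed acc_norm convention).
    """
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(loglikelihoods, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(items, normalize=False):
    """Fraction of (loglikelihoods, choices, gold_index) items scored correctly."""
    hits = [pick(lls, choices, normalize) == gold for lls, choices, gold in items]
    return sum(hits) / len(hits)


# Hypothetical item: the longer (correct) ending has a lower total
# log-likelihood simply because it spans more tokens.
choices = [" a short end", " a considerably longer ending here"]
lls = [-10.0, -14.0]
items = [(lls, choices, 1)]
print(accuracy(items), accuracy(items, normalize=True))  # 0.0 1.0
```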

MMLU

Groups	Version	Filter	Metric	Value		Stderr
- humanities	N/A	none	acc	0.5817	±	0.0247
- other	N/A	none	acc	0.5795	±	0.0311
- social_sciences	N/A	none	acc	0.6347	±	0.0292
- stem	N/A	none	acc	0.4486	±	0.0376

BBH

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
bbh_cot_fewshot_snarks	2	get-answer	3	exact_match	0.5281	±	0.0375

GSM8K

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
gsm8k	2	get-answer	5	exact_match	0.5224	±	0.0138
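The Stderr column can be sanity-checked with a short calculation. Assuming it is the standard error of a mean of per-example 0/1 scores (sample standard deviation with an n − 1 denominator, divided by √n), it reduces to √(p(1 − p)/(n − 1)). The evaluation-set sizes below are assumptions not given in this card (HellaSwag validation split: 10,042; BBH snarks: 178; GSM8K test split: 1,319); with them, the formula reproduces the reported values to four decimals. The 95% interval uses a normal approximation.

```python
import math

# Assumed evaluation-set sizes (not stated in this card).
RESULTS = {
    "hellaswag acc": (0.5482, 10042),
    "hellaswag acc_norm": (0.7300, 10042),
    "bbh_cot_fewshot_snarks exact_match": (0.5281, 178),
    "gsm8k exact_match": (0.5224, 1319),
}


def binary_stderr(p, n):
    """Standard error of the mean of n 0/1 scores whose mean is p.

    Sample std (n - 1 denominator) / sqrt(n) simplifies to sqrt(p(1-p)/(n-1)).
    """
    return math.sqrt(p * (1 - p) / (n - 1))


def ci95(p, n):
    """Normal-approximation 95% confidence interval for the mean."""
    half = 1.96 * binary_stderr(p, n)
    return p - half, p + half


for name, (p, n) in RESULTS.items():
    lo, hi = ci95(p, n)
    print(f"{name}: stderr={binary_stderr(p, n):.4f}, 95% CI=({lo:.4f}, {hi:.4f})")
```

The small snarks subset (about ±0.07 at 95%) is far noisier than HellaSwag (about ±0.01), which is worth keeping in mind when comparing these scores with other models.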

Example

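from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mzbac/phi-2-2x3-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The MoE architecture is custom, so trust_remote_code=True is needed to load its modeling code.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Prompt in the phi-2 "Instruct: ...\nOutput:" format.
text = "Instruct: Explain how backpropagation works.\nOutput:"
inputs = tokenizer(text, return_tensors="pt")

# Generate up to 20 new tokens; raise max_new_tokens for longer answers.
outputs = model.generate(**inputs, max_new_tokens=20)
# outputs[0] holds the prompt tokens followed by the generated tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))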
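Because the decoded text contains the prompt as well as the completion, it can help to wrap prompt construction and answer extraction in small helpers. The sketch below uses two hypothetical helpers (`build_prompt`, `extract_answer`; they are not part of transformers or the model repo) and assumes the model may run on into a new "Instruct:" turn, which is cut off.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Instruct/Output template used in the example."""
    return f"Instruct: {instruction.strip()}\nOutput:"


def extract_answer(decoded: str) -> str:
    """Return the text after the first "Output:" marker.

    The decoded sequence includes the prompt, and the model may go on to
    write a new "Instruct:" turn, so anything from that marker onward is dropped.
    """
    before, sep, after = decoded.partition("Output:")
    answer = after if sep else before
    return answer.split("Instruct:", 1)[0].strip()


prompt = build_prompt("Explain how backpropagation works.")
decoded = prompt + " It applies the chain rule layer by layer.\nInstruct: next question"
print(extract_answer(decoded))  # It applies the chain rule layer by layer.
```

An alternative that avoids string matching is to decode only the new tokens, e.g. `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.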


Model format: Safetensors
Model size: 6B params
Tensor type: BF16
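The listed size and tensor type give a rough lower bound on the memory needed just to hold the weights. A minimal sketch, assuming the rounded figure of 6B parameters and 2 bytes per BF16 value (4 bytes for FP32, shown for comparison); activations and the KV cache during generation come on top of this.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 6e9  # "6B params" as listed; the exact count is rounded
print(f"BF16: {weight_memory_gib(PARAMS, 2):.1f} GiB")  # BF16: 11.2 GiB
print(f"FP32: {weight_memory_gib(PARAMS, 4):.1f} GiB")  # FP32: 22.4 GiB
```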