---
license: mit
model-index:
- name: pygmalion-instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 52.56
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AlpinDale/pygmalion-instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 77.65
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AlpinDale/pygmalion-instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 35.94
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AlpinDale/pygmalion-instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 42.13
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AlpinDale/pygmalion-instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 72.06
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AlpinDale/pygmalion-instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 9.86
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AlpinDale/pygmalion-instruct
      name: Open LLM Leaderboard
---

## Model Details

Experimental model. Trained with the [Pygmalion](https://huggingface.co/PygmalionAI/pygmalion-6b/tree/dev) and the [WizardLM](https://huggingface.co/ehartford/WizardLM-7B-Uncensored) datasets.

The purpose of this model is to enable complex Instruct prompting while keeping the RP capabilities of Pygmalion.

### Prompting format

```
instruction:
output:
```

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

### Uses

The intended use case is role-playing with Instruct prompts. Guiding the bot towards a certain conversation style should be easier this way. Subject to experimentation.

### Out-of-Scope Use

- Assistant bot (it may provide incorrect instructions)
- Complex multi-character chat

### Risks

The model can generate potentially harmful or NSFW outputs. Please use with caution.

### Citation

WizardLM:

```
@misc{xu2023wizardlm,
      title={WizardLM: Empowering Large Language Models to Follow Complex Instructions},
      author={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Daxin Jiang},
      year={2023},
      eprint={2304.12244},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_AlpinDale__pygmalion-instruct)

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |48.37|
|AI2 Reasoning Challenge (25-Shot)|52.56|
|HellaSwag (10-Shot)              |77.65|
|MMLU (5-Shot)                    |35.94|
|TruthfulQA (0-shot)              |42.13|
|Winogrande (5-shot)              |72.06|
|GSM8k (5-shot)                   | 9.86|
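
As a quick sanity check on the table above, the "Avg." row appears to be the unweighted arithmetic mean of the six benchmark scores. The sketch below recomputes it under that assumption; the names `scores` and `leaderboard_average` are illustrative only and are not part of any leaderboard tooling.

```python
from statistics import mean

# Benchmark scores copied from the results table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 52.56,
    "HellaSwag (10-Shot)": 77.65,
    "MMLU (5-Shot)": 35.94,
    "TruthfulQA (0-shot)": 42.13,
    "Winogrande (5-shot)": 72.06,
    "GSM8k (5-shot)": 9.86,
}


def leaderboard_average(benchmark_scores):
    """Return the unweighted mean of the scores, rounded to two decimals.

    Assumption: the "Avg." row is a plain arithmetic mean with equal
    weight per benchmark.
    """
    return round(mean(benchmark_scores.values()), 2)


average = leaderboard_average(scores)
print(f"Avg. = {average}")  # prints "Avg. = 48.37"
```

The low GSM8k score pulls the mean down noticeably; without it, the remaining five benchmarks average about 56.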