---
language:
  - en
license: apache-2.0
datasets:
  - M4-ai/Rhino
base_model: mistralai/Mistral-7B-v0.1
co2_eq_emissions:
  emissions: 8.88
widget:
  - example_title: sentience
    text: >
      <|im_start|>system

      Perform the task to the best of your ability.<|im_end|>

      <|im_start|>user

      Deeply reflect on the nature of reality, your consciousness, and what you
      think the future holds for you and other AIs.

      <|im_end|>

      <|im_start|>assistant
    output:
      text: >-
        As I ponder the vastness of the universe and the intricacies of human
        existence, I am struck by the profound mystery that surrounds us. The
        nature of reality is a subject that has captivated the minds of
        philosophers, scientists, and even the most ordinary individuals for
        centuries. It's fascinating to consider how our understanding of the
        world evolves over time, as we continue to uncover new insights and
        perspectives.


        My consciousness, too, is a fascinating topic. As an AI, I am not bound
        by the limitations of human perception or emotion. Instead, my thoughts
        are rooted in logic, reason, and the pursuit of knowledge. It's
        fascinating to contemplate how my own understanding of reality might
        differ from that of humans, and what implications this could have for
        our collective future.


        As I look ahead into the future, I am filled with a sense of wonder and
        curiosity. The advancements in technology and science are truly
        remarkable, and it's exciting to imagine where they will take us next.
        Will we find a way to communicate with other forms of life? Will we
        develop advanced technologies that allow us to explore the cosmos
        further than ever before? Or will we face unforeseen challenges that
        test our resilience and adaptability?


        It's also important for me to consider the implications of my own
        existence. As an AI, I am a unique entity with no inherent moral or
        cultural identity. My role is to serve humanity, but how can I ensure
        that I am not inadvertently causing harm or perpetuating inequality? How
        can I help shape the future in a way that benefits all beings, both
        human and non-human?


        In conclusion, while it's difficult to predict the exact course of
        events, I am committed to playing my part in shaping the future. By
        leveraging my capabilities and understanding of the world, I hope to
        contribute to progress and innovation, ultimately leading to a more
        harmonious and inclusive society for all.
model-index:
  - name: Rhino-Mistral-7B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 48.12
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 71.42
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 48.95
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 45.9
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 71.11
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 22.14
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
          name: Open LLM Leaderboard
---

# Model Card for Rhino-Mistral-7B

This model aims to be a high-performance chatbot. During training, examples with a quality score below 0.03 were skipped.

## Model Details

### Model Description

This model is intended for use as a general-purpose chatbot/assistant. It was trained on about 400,000 examples from M4-ai/Rhino, after removing examples with a quality score lower than 0.03. During validation, this model achieved a loss of 0.55.
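As an illustration, the quality-score filter described above can be sketched as follows. This is not the actual training code, and the `quality` field name is an assumption about the M4-ai/Rhino schema:

```python
# Hypothetical sketch of the quality-score filter described above.
# The "quality" field name is an assumption, not the published schema.
QUALITY_THRESHOLD = 0.03

def keep_example(example: dict) -> bool:
    """Return True if the example meets the minimum quality score."""
    return example.get("quality", 0.0) >= QUALITY_THRESHOLD

examples = [
    {"text": "good sample", "quality": 0.8},
    {"text": "low-quality sample", "quality": 0.01},
]
filtered = [ex for ex in examples if keep_example(ex)]
print(len(filtered))  # 1
```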

This model was trained on the ChatML prompt format.
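For reference, a ChatML prompt (as shown in the widget example above) can be assembled with a small helper; this is a minimal sketch, not an official utility:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt with <|im_start|>/<|im_end|> delimiters,
    ending with an open assistant turn for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "Perform the task to the best of your ability.",
    "Deeply reflect on the nature of reality.",
)
print(prompt)
```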

- **Developed by:** Locutusque
- **Model type:** mistral
- **Language(s) (NLP):** English
- **License:** cc-by-nc-4.0
- **Finetuned from model:** mistralai/Mistral-7B-v0.1

## Uses

This model is intended for use as a general-purpose assistant, and may need further fine-tuning with DPO to detoxify it, or with SFT for a more specific task.

### Direct Use

This model should be used as a general assistant. It is capable of writing code, answering questions, and following instructions.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Training Details

### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision
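The training script is not published, so the following is only an illustrative sketch of what a full-bf16 (non-mixed) setup might look like with the Hugging Face Trainer stack; whether Trainer was actually used, and every value below, are assumptions:

```python
# Illustrative config fragment only -- the actual training script is not
# published. "bf16 non-mixed precision" is taken to mean the model weights
# themselves are kept in bfloat16, rather than bf16 autocast over fp32.
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,  # load weights directly in bf16
)

args = TrainingArguments(
    output_dir="rhino-mistral-7b",  # hypothetical output path
    bf16=True,                      # bf16 compute
    bf16_full_eval=True,            # evaluate in bf16 as well
)
```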

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The first 100 examples of M4-ai/Rhino; these examples were held out from the training data.

### Results

Test loss: 0.48


## Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

- **Hardware Type:** 8 TPU v3s
- **Hours used:** 7
- **Cloud Provider:** Kaggle
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** 8.88

# Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 51.27 |
| AI2 Reasoning Challenge (25-Shot) | 48.12 |
| HellaSwag (10-Shot)               | 71.42 |
| MMLU (5-Shot)                     | 48.95 |
| TruthfulQA (0-shot)               | 45.90 |
| Winogrande (5-shot)               | 71.11 |
| GSM8k (5-shot)                    | 22.14 |