
language:
  - en
license: apache-2.0
model-index:
  - name: flux-7b-v0.2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 66.55
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chanwit/flux-7b-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 86.12
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chanwit/flux-7b-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 65.38
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chanwit/flux-7b-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 51.8
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chanwit/flux-7b-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 79.32
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chanwit/flux-7b-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 72.63
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chanwit/flux-7b-v0.2
          name: Open LLM Leaderboard

Open Flux AI

Open Flux AI - Empowering developers with AI-driven Continuous Delivery solutions.

Welcome to Open Flux AI, a community initiative stemming from the Kube-7B project, dedicated to advancing AI expertise in Flux, Flagger, and Continuous Delivery technologies. Our mission is to use the power of AI to simplify and enhance the way developers interact with Flux and technologies around it.

Our first focus is on fine-tuning AI models to specialize in key areas such as Flux, Flagger, GitOps, and SOPS. By building on the raw data from Kube-7B and applying targeted embedding techniques, we aim to create models that are highly proficient in these specific domains.

Our first major deliverable is flux-7b, a model based on Mistral 7B. flux-7b currently has basic knowledge of Flux, Flagger, GitOps, and SOPS, and in our comparisons it answered questions in these areas better than ChatGPT. See the screenshot.

Getting Started

To start using flux-7b, run the following command:

ollama run chanwit/flux-7b
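Once the model has been pulled, it can also be queried programmatically. The sketch below is a minimal example, assuming Ollama is running locally on its default port (11434) and serving the model under the name `chanwit/flux-7b`; it sends a non-streaming request to Ollama's `/api/generate` endpoint using only the Python standard library. The helper names `build_request` and `ask` are illustrative, not part of this project.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port, with the model
# already pulled via `ollama run chanwit/flux-7b`.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "chanwit/flux-7b"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text from the 'response' field."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as resp:
        body = json.load(resp)
    return body["response"]
```

With the server running, `ask("How do I check that my cluster meets the Flux prerequisites?")` returns the model's answer as a single string. Setting `"stream": False` trades incremental output for a simpler, single JSON response.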

The GGUF files for this model are available on Hugging Face.

We plan to deliver our models in other formats, such as llamafiles and Docker containers. Please stay tuned.

Models

flux-7b: Our first model, built on Mistral 7B, is designed to provide assistance in Flux, Flagger, GitOps, and SOPS.

Datasets

Initially, the Open Flux AI project shared its foundational dataset with the Kube-7B project, refining it to focus on specific areas. We continue to expand our dataset, especially in areas such as Flux commands and custom resources (CRs), to further enhance the model's capabilities.

Contributions

We welcome and greatly appreciate contributions, especially in the form of question-and-answer pairs. We are particularly looking for new datasets centered on Flux commands and CR generation.
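The project does not specify a submission format for question-and-answer pairs. As a purely hypothetical sketch, contributions could be collected as JSON Lines, one object per pair; the `QAPair` class and its `question`/`answer` field names are assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


# Hypothetical record layout: field names are assumptions, not a project spec.
@dataclass
class QAPair:
    question: str
    answer: str


def to_jsonl(pairs: Iterable[QAPair]) -> str:
    """Serialize Q&A pairs as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(asdict(p), ensure_ascii=False) + "\n" for p in pairs)


def from_jsonl(text: str) -> List[QAPair]:
    """Parse JSON Lines back into QAPair objects, skipping blank lines."""
    return [QAPair(**json.loads(line)) for line in text.splitlines() if line.strip()]


example = [
    QAPair(
        question="How do I check that my cluster meets the prerequisites for Flux?",
        answer="Run `flux check --pre` before bootstrapping Flux.",
    ),
]
```

JSON Lines keeps each pair independent, so contributions from different people can be concatenated or deduplicated line by line.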

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	70.30
AI2 Reasoning Challenge (25-Shot)	66.55
HellaSwag (10-Shot)	86.12
MMLU (5-Shot)	65.38
TruthfulQA (0-shot)	51.80
Winogrande (5-shot)	79.32
GSM8k (5-shot)	72.63
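The Avg. row is consistent with the unweighted mean of the six benchmark scores: (66.55 + 86.12 + 65.38 + 51.80 + 79.32 + 72.63) / 6 = 421.80 / 6 = 70.30. A small sketch that reproduces the figure from the table, with `leaderboard_average` as an illustrative helper name:

```python
from typing import Mapping

# Scores copied from the Open LLM Leaderboard table above.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 66.55,
    "HellaSwag (10-Shot)": 86.12,
    "MMLU (5-Shot)": 65.38,
    "TruthfulQA (0-shot)": 51.80,
    "Winogrande (5-shot)": 79.32,
    "GSM8k (5-shot)": 72.63,
}


def leaderboard_average(scores: Mapping[str, float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


print(leaderboard_average(SCORES))  # prints 70.3
```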