Adding Evaluation Results

952e1e7 verified about 1 year ago

3.78 kB

license: mit
library_name: transformers
tags:
  - mergekit
  - merge
base_model:
  - AIDC-AI/Marco-o1
  - happzy2633/qwen2.5-7b-ins-v3
model-index:
  - name: intelligence-cod-rag-7b-v3
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 68.98
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ClaudioItaly/intelligence-cod-rag-7b-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 34.78
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ClaudioItaly/intelligence-cod-rag-7b-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 9.82
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ClaudioItaly/intelligence-cod-rag-7b-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 3.02
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ClaudioItaly/intelligence-cod-rag-7b-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 10.68
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ClaudioItaly/intelligence-cod-rag-7b-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 35.51
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ClaudioItaly/intelligence-cod-rag-7b-v3
          name: Open LLM Leaderboard

merge

my elaboration and fusion of the models has led to a surprising result that I want to share with you all. I recommend you try this merge of mine.

Demonstrates strong reasoning skills when asked questions or texts. It is useful for reasoning to formulate questions with this example "Question: How did the Moon arise in your opinion?

GGUF ClaudioItaly/intelligence-cod-rag-7b-v3-Q6_K-GGUF GUUF ClaudioItaly/intelligence-cod-rag-7b-v3-Q8_0-GGUF

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ClaudioItaly__intelligence-cod-rag-7b-v3)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |27.13|
|IFEval (0-Shot)    |68.98|
|BBH (3-Shot)       |34.78|
|MATH Lvl 5 (4-Shot)| 9.82|
|GPQA (0-shot)      | 3.02|
|MuSR (0-shot)      |10.68|
|MMLU-PRO (5-shot)  |35.51|