---
license: apache-2.0
language:
  - en
  - ko
model-index:
  - name: ARC
    results:
      - task:
          type: natural-language-generation
        dataset:
          type: huggingface
          name: ARC-Challenge (25-shot)
        metrics:
          - type: acc_norm
            value: 53.24
  - name: HellaSwag
    results:
      - task:
          type: natural-language-generation
        dataset:
          type: huggingface
          name: HellaSwag (10-shot)
        metrics:
          - type: acc_norm
            value: 78.82
  - name: MMLU
    results:
      - task:
          type: natural-language-generation
        dataset:
          type: huggingface
          name: MMLU (5-shot)
        metrics:
          - type: acc
            value: 42.38
  - name: TruthfulQA
    results:
      - task:
          type: natural-language-generation
        dataset:
          type: huggingface
          name: TruthfulQA (0-shot)
        metrics:
          - type: MC2
            value: 44.65
---

# Fine-tuned Llama-2 Model with Korean Corpus

This model is a Llama-2 model further trained on Korean datasets.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 43.71 |
| ARC (25-shot)       | 53.24 |
| HellaSwag (10-shot) | 78.78 |
| MMLU (5-shot)       | 42.31 |
| TruthfulQA (0-shot) | 44.56 |
| Winogrande (5-shot) | 73.95 |
| GSM8K (5-shot)      | 5.99  |
| DROP (3-shot)       | 7.11  |
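The reported average is simply the arithmetic mean of the seven per-task scores. A quick sanity check (not part of the evaluation harness itself):

```python
# Reproduce the leaderboard "Avg." as the mean of the seven task scores.
scores = {
    "ARC (25-shot)": 53.24,
    "HellaSwag (10-shot)": 78.78,
    "MMLU (5-shot)": 42.31,
    "TruthfulQA (0-shot)": 44.56,
    "Winogrande (5-shot)": 73.95,
    "GSM8K (5-shot)": 5.99,
    "DROP (3-shot)": 7.11,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 43.71
```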