Update README.md
README.md (CHANGED)
@@ -9,15 +9,65 @@ model-index:
 - name: LLaMA-3-8B-Instruct-TR-DPO
   results:
   - task:
-      type:
+      type: multiple-choice
     dataset:
-      type:
+      type: multiple-choice
       name: MMLU_TR_V0.2
     metrics:
     - name: 5-shot
       type: 5-shot
       value: 0.4983
       verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: Truthful_QA_V0.2
+    metrics:
+    - name: 0-shot
+      type: 0-shot
+      value: 0.5232
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: ARC_TR_V0.2
+    metrics:
+    - name: 25-shot
+      type: 25-shot
+      value: 0.4437
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: HellaSwag_TR_V0.2
+    metrics:
+    - name: 10-shot
+      type: 10-shot
+      value: 0.4558
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: GSM8K_TR_V0.2
+    metrics:
+    - name: 5-shot
+      type: 5-shot
+      value: 0.5421
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: Winogrande_TR_V0.2
+    metrics:
+    - name: 5-shot
+      type: 5-shot
+      value: 0.5506
+      verified: false
 ---
 
 <img src="https://huggingface.co/Metin/LLaMA-3-8B-Instruct-TR-DPO/resolve/main/llama.png"
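The empty `type:` keys filled in by this hunk are what the Hub's model-index schema needs in order to surface eval results on the model page. As a sanity check that the edited front matter still parses, something like the following works (a minimal sketch, assuming `huggingface_hub` is installed and the updated card is live on the Hub; this snippet is not part of the card itself):

```python
from huggingface_hub import ModelCard

# Load the published card; the YAML front matter, including the
# model-index block edited above, is parsed into card.data.
card = ModelCard.load("Metin/LLaMA-3-8B-Instruct-TR-DPO")

# eval_results is only populated when the model-index block is valid,
# so an empty result here would point to a schema problem in the YAML.
for result in card.data.eval_results or []:
    print(result.dataset_name, result.metric_type, result.metric_value)
```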
@@ -101,12 +151,14 @@ print(outputs[0]["generated_text"][len(prompt):])
 
 ## OpenLLMTurkishLeaderboard_v0.2 benchmark results
 
-MMLU_TR_V0.2
-…
-…
-…
-…
-…
+- **MMLU_TR_V0.2**: 49.83%
+- **Truthful_QA_TR_V0.2**: 52.32%
+- **ARC_TR_V0.2**: 44.37%
+- **HellaSwag_TR_V0.2**: 45.58%
+- **GSM8K_TR_V0.2**: 54.21%
+- **Winogrande_TR_V0.2**: 55.06%
+
+These scores may differ from what you will get when you run the same benchmarks, as I did not use any inference engine (vLLM, TensorRT-LLM, etc.)
 
 ## Output Example (DPO Model vs Base Model)
 
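The hunk header above carries the last line of the card's usage example, `print(outputs[0]["generated_text"][len(prompt):])`. For orientation, a `transformers` text-generation pipeline call ending in that line typically looks like this (a sketch only: the prompt, dtype, and sampling parameters are illustrative assumptions, not the card's exact code):

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Metin/LLaMA-3-8B-Instruct-TR-DPO",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Format a chat turn with the model's chat template, generate, then
# strip the prompt prefix, which is what the hunk's context line does.
messages = [{"role": "user", "content": "Merhaba, nasılsın?"}]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"][len(prompt):])
```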
|
|
|
|
| 9 |
- name: LLaMA-3-8B-Instruct-TR-DPO
|
| 10 |
results:
|
| 11 |
- task:
|
| 12 |
+
type: multiple-choice
|
| 13 |
dataset:
|
| 14 |
+
type: multiple-choice
|
| 15 |
name: MMLU_TR_V0.2
|
| 16 |
metrics:
|
| 17 |
- name: 5-shot
|
| 18 |
type: 5-shot
|
| 19 |
value: 0.4983
|
| 20 |
verified: false
|
| 21 |
+
- task:
|
| 22 |
+
type: multiple-choice
|
| 23 |
+
dataset:
|
| 24 |
+
type: multiple-choice
|
| 25 |
+
name: Truthful_QA_V0.2
|
| 26 |
+
metrics:
|
| 27 |
+
- name: 0-shot
|
| 28 |
+
type: 0-shot
|
| 29 |
+
value: 0.5232
|
| 30 |
+
verified: false
|
| 31 |
+
- task:
|
| 32 |
+
type: multiple-choice
|
| 33 |
+
dataset:
|
| 34 |
+
type: multiple-choice
|
| 35 |
+
name: ARC_TR_V0.2
|
| 36 |
+
metrics:
|
| 37 |
+
- name: 25-shot
|
| 38 |
+
type: 25-shot
|
| 39 |
+
value: 0.4437
|
| 40 |
+
verified: false
|
| 41 |
+
- task:
|
| 42 |
+
type: multiple-choice
|
| 43 |
+
dataset:
|
| 44 |
+
type: multiple-choice
|
| 45 |
+
name: HellaSwag_TR_V0.2
|
| 46 |
+
metrics:
|
| 47 |
+
- name: 10-shot
|
| 48 |
+
type: 10-shot
|
| 49 |
+
value: 0.4558
|
| 50 |
+
verified: false
|
| 51 |
+
- task:
|
| 52 |
+
type: multiple-choice
|
| 53 |
+
dataset:
|
| 54 |
+
type: multiple-choice
|
| 55 |
+
name: GSM8K_TR_V0.2
|
| 56 |
+
metrics:
|
| 57 |
+
- name: 5-shot
|
| 58 |
+
type: 5-shot
|
| 59 |
+
value: 0.5421
|
| 60 |
+
verified: false
|
| 61 |
+
- task:
|
| 62 |
+
type: multiple-choice
|
| 63 |
+
dataset:
|
| 64 |
+
type: multiple-choice
|
| 65 |
+
name: Winogrande_TR_V0.2
|
| 66 |
+
metrics:
|
| 67 |
+
- name: 5-shot
|
| 68 |
+
type: 5-shot
|
| 69 |
+
value: 0.5506
|
| 70 |
+
verified: false
|
| 71 |
---
|
| 72 |
|
| 73 |
<img src="https://huggingface.co/Metin/LLaMA-3-8B-Instruct-TR-DPO/resolve/main/llama.png"
|
|
|
|
| 151 |
|
| 152 |
## OpenLLMTurkishLeaderboard_v0.2 benchmark results
|
| 153 |
|
| 154 |
+
- **MMLU_TR_V0.2**: 49.83%
|
| 155 |
+
- **Truthful_QA_TR_V0.2**: 52.32%
|
| 156 |
+
- **ARC_TR_V0.2**: 44.37%
|
| 157 |
+
- **HellaSwag_TR_V0.2**: 45.58%
|
| 158 |
+
- **GSM8K_TR_V0.2**: 54.21%
|
| 159 |
+
- **Winogrande_TR_V0.2**: 55.06%
|
| 160 |
+
|
| 161 |
+
These scores may differ from what you will get when you run the same benchmarks, as I did not use any inference engine (vLLM, TensorRT-LLM, etc.)
|
| 162 |
|
| 163 |
## Output Example (DPO Model vs Base Model)
|
| 164 |
|