Update README.md
Update benchmark information

README.md CHANGED
@@ -81,42 +81,29 @@ The model was trained on approximately 100k high-quality Korean instruction exam
The table below contains a description of the Korean LLM evaluation benchmark dataset used for the model evaluation. More information on the benchmarks is available at [Blog](https://davidkim205.github.io/).

-| Benchmark
-|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
-| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench)
-| [ko-
-| [ko-ged](https://huggingface.co/datasets/davidkim205/ko-ged)
-| [ko-
-| [
-| [ko-
-| [ko-
-| [ko-
-| [ko-ged-high](https://huggingface.co/datasets/davidkim205/ko-ged-high) | Korean high school GED multiple-choice question dataset | ged\:H |
-| [ko-ged2-elementary](https://huggingface.co/datasets/davidkim205/ko-ged2-middle) | Korean elementary school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:E |
-| [ko-ged2-middle](https://huggingface.co/datasets/davidkim205/ko-ged2-elementary) | Korean middle school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:M |
-| [ko-ged2-high](https://huggingface.co/datasets/davidkim205/ko-ged2-high) | Korean high school GED multiple-choice dataset, updated for the 2025 GED Exam | ged2\:H |
-| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
-| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |

### Benchmark Results

-| |
-|---------|----------------------------------------:|-----------------------------
-| Avg. | **8.
-| bench | 8.26 |
-|
-| ged
-|
-|
-|
-|
-|
-| ged:H | **9.60** | 9.52 | 9.52 | 9.32 |
-| ged2:E | 9.77 | 9.89 | **9.94** | 9.48 |
-| ged2:M | **9.75** | 9.58 | 9.46 | 9.33 |
-| ged2:H | **9.48** | 9.23 | 9.40 | 9.08 |
-| gpqa | **4.55** | 3.69 | 3.38 | 3.54 |
-| math500 | **8.56** | 8.38 | 6.26 | 5.00 |
-

The table below contains a description of the Korean LLM evaluation benchmark dataset used for the model evaluation. More information on the benchmarks is available at [Blog](https://davidkim205.github.io/).

+| Benchmark | Description | Abbreviation |
+|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
+| [ko-bench](https://huggingface.co/datasets/davidkim205/ko-bench) | Korean-translated dataset of [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl) questions | bench |
+| [ko-ged](https://huggingface.co/datasets/davidkim205/ko-ged) | Korean GED (elementary, middle, high school) open-ended question dataset<br/>Subjects: Korean, English, Mathematics, Science, Social Studies | ged |
+| [ko-ged-mc-elementary](https://huggingface.co/datasets/davidkim205/ko-ged-mc-elementary) | Korean elementary school GED multiple-choice question dataset | ged\:E |
+| [ko-ged-mc-middle](https://huggingface.co/datasets/davidkim205/ko-ged-mc-middle) | Korean middle school GED multiple-choice question dataset | ged\:M |
+| [ko-ged-mc-high](https://huggingface.co/datasets/davidkim205/ko-ged-mc-high) | Korean high school GED multiple-choice question dataset | ged\:H |
+| [ko-gpqa](https://huggingface.co/datasets/davidkim205/ko-gpqa) | Korean version of GPQA containing challenging physics questions designed to test deep understanding and logical reasoning | gpqa |
+| [ko-math-500](https://huggingface.co/datasets/davidkim205/ko-math-500) | Korean-translated subset of 500 high school-level math problems from the MATH dataset, including detailed solutions with LaTeX notation | math500 |
+| [ko-ifeval](https://huggingface.co/datasets/davidkim205/ko-ifeval) | Instruction-following evaluation dataset translated from [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval), adapted for Korean language and culture | ifeval |

### Benchmark Results

+| | **davidkim205<br>Hunminai<br>-1.0-27b** | google<br>gemma-3<br>-27b-it | unsloth<br>gemma-3<br>-27b-it | google<br>gemma-2<br>-27b-it |
+|---------|----------------------------------------:|-----------------------------:|------------------------------:|-----------------------------:|
+| Avg. | **8.53** | 8.31 | 8.03 | 7.49 |
+| bench | 8.26 | 8.06 | **8.27** | 7.59 |
+| ged | **9.19** | 9.02 | 9.03 | 8.38 |
+| ged:E | 9.86 | 9.86 | **9.93** | 9.51 |
+| ged:M | 9.67 | 9.63 | **9.76** | 9.10 |
+| ged:H | **9.60** | 9.52 | 9.52 | 9.32 |
+| gpqa | **4.55** | 3.69 | 3.38 | 3.54 |
+| math500 | **8.56** | 8.38 | 6.26 | 5.00 |
+| ifeval | | | **8.10** | |

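The README does not say how the `Avg.` row in the updated results table is computed. As a rough check, the sketch below assumes it is the unweighted mean of the per-benchmark rows (`bench` through `math500`) and recomputes it from the values in the added lines. The dictionary keys are shorthand labels joined from the column headers, not confirmed repository IDs.

```python
# Scores copied from the added results table, in row order:
# bench, ged, ged:E, ged:M, ged:H, gpqa, math500.
# Assumption: "Avg." is the plain arithmetic mean of these rows;
# the README itself does not state the formula.
SCORES = {
    # Keys are illustrative labels built from the table's column headers.
    "Hunminai-1.0-27b":       [8.26, 9.19, 9.86, 9.67, 9.60, 4.55, 8.56],
    "google gemma-3-27b-it":  [8.06, 9.02, 9.86, 9.63, 9.52, 3.69, 8.38],
    "unsloth gemma-3-27b-it": [8.27, 9.03, 9.93, 9.76, 9.52, 3.38, 6.26],
    "google gemma-2-27b-it":  [7.59, 8.38, 9.51, 9.10, 9.32, 3.54, 5.00],
}


def mean_score(values):
    """Unweighted mean, rounded to two decimals like the table entries."""
    return round(sum(values) / len(values), 2)


averages = {model: mean_score(vals) for model, vals in SCORES.items()}

# The ifeval row lists a single score (8.10), in the unsloth column.
unsloth_with_ifeval = mean_score(SCORES["unsloth gemma-3-27b-it"] + [8.10])

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
print(f"unsloth gemma-3-27b-it (with ifeval): {unsloth_with_ifeval:.2f}")
```

Under this assumption, three columns reproduce the listed `Avg.` values (8.53, 8.31, 7.49). The unsloth column comes out at 8.02 rather than the listed 8.03, and including its 8.10 `ifeval` score gives 8.03. That suggests, but does not confirm, that `ifeval` is counted for that column only.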