amphora committed · verified
Commit 357d83b · 1 Parent(s): 1b2b3f1

Update README.md

Files changed (1): README.md (+10 -15)
README.md CHANGED
@@ -10,23 +10,18 @@ pinned: false
 # Welcome to HAERAE
 
 
-We are a non-profit research lab focused on the interpretability and evaluation of Korean language models. Our mission is to advance the field with insightful benchmarks and tools. Below is an overview of our projects.
-
-## High-Quality Korean Corpora
-- [Korean WebText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-WEBTEXT): A collection of 2B tokens of Korean text collected from the web.
-- [Korean SyntheticText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-SyntheticText-1.5B): A collection of 1.5B tokens of Korean text synthetically generated.
-
-## Evaluation Benchmarks
-- **HAE_RAE_BENCH Series**:
-  - [HAE_RAE_BENCH_1.0](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.0): An evaluation suite for Korean knowledge. See [paper](https://arxiv.org/abs/2309.02706) for further information.
-  - [HAE_RAE_BENCH_1.1](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.1): An ongoing project to refine the HAE_RAE_BENCH 1.0, enhancing its depth and coverage.
-- **KMMLU**:
-  - [KMMLU](https://huggingface.co/datasets/HAERAE-HUB/KMMLU): A Korean reimplementation of MMLU, focusing on comprehensive language understanding across a wide range of subjects. See [paper](https://arxiv.org/abs/2402.11548) for further information.
-  - [KMMLU-HARD](https://huggingface.co/datasets/HAERAE-HUB/KMMLU-HARD): A subset of KMMLU, with CoT samples.
-
-## Bias and Fairness
-- [QARV](https://huggingface.co/datasets/HAERAE-HUB/QARV-preview): An ongoing project aiming to benchmark regional bias in Large Language Models (LLMs).
-
-If you have any inquiries or are interested in joining our team, please contact me at `spthsrbwls123@yonsei.ac.kr`.
+We are a non-profit research lab focused on understanding and building better Korean language models. See below for an overview of our projects.
+
+**Evaluation**
+We have built _the_ most widely used Korean benchmarks, including HAE-RAE Bench (cultural knowledge, [dataset](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.0), [paper](https://arxiv.org/abs/2309.02706)),
+KMMLU (general knowledge, [dataset](https://huggingface.co/datasets/HAERAE-HUB/KMMLU), [paper](https://arxiv.org/abs/2402.11548)), HRM8K (math, [dataset](https://huggingface.co/datasets/HAERAE-HUB/HRM8K), [paper](https://www.arxiv.org/abs/2501.02448)), and KMMLU-Redux/Pro (general knowledge, [dataset](https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Pro), [paper](https://arxiv.org/abs/2507.08924)).
+
+**Reasoning Language Models**
+In cooperation with [KISTI-KONI](https://huggingface.co/KISTI-KONI), we released the [KO-REAson](https://huggingface.co/KOREAson) series of sub-10B reasoning language models trained for Korean.
+
+# News
+
+2025.08.31: We released six [KO-REAson-0831 models](https://huggingface.co/collections/KoReason/koreason-0831-68b1363e1b3726b041a0a638) 🔥🔥🔥
+2025.07.11: We collaborated with LG AI Research to build [KMMLU-Pro](https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Pro), a major update to our KMMLU franchise.
+2025.01.05: We released [HRM8K](https://huggingface.co/datasets/HAERAE-HUB/HRM8K), the first public Korean math benchmark (📐 e = Σₙ₌₀^∞ 1/n! 🤓).