Update README.md
README.md
- 🧪 **License**: Apache 2.0 (commercial use permitted)
```md
The First Fully Open-Source LLM from a Non-English Region

KORMo was created with a public-interest mission: to make world-class language models accessible to everyone.
Our goal is to empower anyone to build and advance their own large language models at a global standard.
Key Features:
1. A 10B-parameter Korean–English reasoning model trained entirely from scratch.
2. 100% open resources — including all training data, code, intermediate checkpoints, and tutorials — allowing anyone to reproduce and extend a near-SOTA model on their own.
3. 3 trillion tokens of training data released publicly, featuring never-before-shared, high-quality full-cycle Korean datasets (for pretraining, post-training, general, reasoning, and reinforcement learning).
4. A collaborative effort by eight master’s students at the KAIST Graduate School of Culture Technology (MLP Lab), documented in a 45-page research paper.
If you’ve ever used a Korean language model that performs well on benchmarks but feels strange in real use, or if fine-tuning only made it worse, you’re not alone.
```

```bash
git clone https://github.com/MLP-Lab/KORMo-tutorial.git
cd KORMo-tutorial
bash setup/create_uv_venv.sh
source .venv_kormo/bin/activate
```
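Later in the tutorial, a chat prompt is built with `tokenizer.apply_chat_template`. As a rough offline sketch of the message format that call consumes (the `<|role|>` markers and the `render_chat` helper below are illustrative stand-ins, not KORMo's actual chat template):

```python
# Sketch of the role/content message list that Hugging Face chat templating
# consumes. render_chat is a toy stand-in for tokenizer.apply_chat_template:
# it joins role-tagged turns and appends a cue for the assistant's reply.
def render_chat(messages):
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    parts.append("<|assistant|>\n")  # generation prompt: model answers next
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "What is KORMo?"},
]

prompt = render_chat(messages)
print(prompt)
```

With the real model, the same `messages` list would instead be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which applies KORMo's own template.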
---
## Contact
- KyungTae Lim, Professor at KAIST. `ktlim@kaist.ac.kr`