mjkmain committed · Commit 7cda0d0 · verified · Parent(s): 5c21734

Update README.md

Files changed (1): README.md (+4 −5)
README.md CHANGED
@@ -26,9 +26,9 @@ The model, training code, and training data are all **fully open**, allowing any
 - 🧪 **License**: Apache 2.0 (commercial use permitted)
 
 ```md
-KORMo: The First Fully Open-Source LLM from a Non-English Region
+The First Fully Open-Source LLM from a Non-English Region
 
-KORMo was created with a public-interest mission to make world-class language models accessible to everyone.
+KORMo was created with a public-interest mission: to make world-class language models accessible to everyone.
 Our goal is to empower anyone to build and advance their own large language models at a global standard.
 
 Key Features:
@@ -36,7 +36,7 @@ Key Features:
 1. A 10B-parameter Korean–English reasoning model trained entirely from scratch.
 2. 100% open resources — including all training data, code, intermediate checkpoints, and tutorials — allowing anyone to reproduce and extend a near-SOTA model on their own.
 3. 3 trillion tokens of training data released publicly, featuring never-before-shared, high-quality full-cycle Korean datasets (for pretraining, post-training, general, reasoning, and reinforcement learning).
-4. A collaborative effort by eight undergraduate and master’s students at the KAIST Graduate School of Culture Technology (MLP Lab), documented in a 45-page research paper.
+4. A collaborative effort by eight master’s students at the KAIST Graduate School of Culture Technology (MLP Lab), documented in a 45-page research paper.
 
 If you’ve ever used a Korean language model that performs well on benchmarks but feels strange in real use, or if fine-tuning only made it worse, you’re not alone.
 
@@ -109,6 +109,7 @@ By releasing every intermediate model and post-training dataset, we give users t
 git clone https://github.com/MLP-Lab/KORMo-tutorial.git
 cd KORMo-tutorial
 bash setup/create_uv_venv.sh
+source .venv_kormo/bin/activate
 ```
 
 ---
@@ -168,5 +169,3 @@ chat_prompt = tokenizer.apply_chat_template(
 
 ## Contact
 - KyungTae Lim, Professor at KAIST. `ktlim@kaist.ac.kr`
-
-## Contributor
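
The tutorial setup commands touched by this commit, including the newly added activation step, can be run as one shell sequence; the `.venv_kormo` path comes from the line the commit adds, and all other commands appear verbatim in the README:

```shell
# Clone the KORMo tutorial repository and set up its uv-managed virtual environment
git clone https://github.com/MLP-Lab/KORMo-tutorial.git
cd KORMo-tutorial
bash setup/create_uv_venv.sh      # create the uv virtual environment
source .venv_kormo/bin/activate   # activate it (the step this commit adds)
```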