# EXAONE 4

## Overview

The EXAONE 4.0 model family integrates the high usability of EXAONE 3.5 and the advanced reasoning ability of EXAONE Deep into a single language model, exposed as a non-reasoning mode and a reasoning mode, respectively. To prepare for the agentic AI era, EXAONE 4.0 incorporates essential features such as agentic tool use, and its multilingual support extends beyond English and Korean to Spanish.

The EXAONE 4.0 family consists of two models: a 32B mid-size model optimized for high performance, and a 1.2B small model designed for on-device use.

EXAONE 4.0 adopts an architecture that differs from previous EXAONE models in two ways:

- Hybrid Attention: the 32B model combines *local attention (sliding window attention)* and *global attention (full attention)* in a 3:1 ratio. To better capture the global context, RoPE is not applied to the global attention layers.
- QK-Reorder-Norm: trading extra computation for better downstream-task performance, the conventional Pre-LN scheme is modified. LayerNorm is repositioned so that it is applied to the outputs of the attention and MLP blocks, and RMS normalization is additionally applied right after the Q and K projections.

For more details, please refer to the technical report, the paper on HuggingFace, the blog post, and the official GitHub page.

All released model checkpoints are available in the HuggingFace collection.
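The reordered normalization can be illustrated with a minimal NumPy sketch. This is an illustration of the idea only, not the actual implementation: it uses a single attention head, applies RMS normalization everywhere for brevity, and omits masking and RoPE.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS normalization over the last (hidden) dimension
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def qk_reorder_block(x, wq, wk, wv):
    # Simplified single-head attention block showing where the norms sit.
    # Q and K are normalized right after their projections...
    q = rms_norm(x @ wq)
    k = rms_norm(x @ wk)
    v = x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    attn_out = weights @ v
    # ...and, unlike conventional Pre-LN, the norm is applied to the
    # block *output* before the residual connection.
    return x + rms_norm(attn_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # (seq_len, hidden)
out = qk_reorder_block(
    x,
    rng.standard_normal((8, 8)) * 0.1,
    rng.standard_normal((8, 8)) * 0.1,
    rng.standard_normal((8, 8)) * 0.1,
)
print(out.shape)
```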
## Model Details
| Model Configuration | 32B | 1.2B |
|---|---|---|
| d_model | 5,120 | 2,048 |
| Number of layers | 64 | 30 |
| Normalization | QK-Reorder-LN | QK-Reorder-LN |
| Non-linearity | SwiGLU | SwiGLU |
| Feedforward dimension | 27,392 | 4,096 |
| Attention type | Hybrid (3:1 Local-Global) | Global |
| Head type | GQA | GQA |
| Number of heads | 40 | 32 |
| Number of KV heads | 8 | 8 |
| Head size | 128 | 64 |
| Max sequence length | 131,072 | 65,536 |
| RoPE theta | 1,000,000 | 1,000,000 |
| Tokenizer | BBPE | BBPE |
| Vocab size | 102,400 | 102,400 |
| Tied word embedding | False | True |
| Knowledge cut-off | Nov. 2024 | Nov. 2024 |
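The 3:1 local-to-global attention of the 32B model is an interleaved layer pattern: three sliding-window layers followed by one full-attention layer, repeated across the depth. The sketch below expands such a pattern for illustration; the repeating-string form (e.g. `"LLLG"`) and the layer-type names are assumptions for this example, and the authoritative layout lives in the model's configuration.

```python
# Illustration only: expand a repeating "LLLG" pattern
# (3 local sliding-window layers, then 1 global layer)
# across the 64 layers of the 32B model.
pattern = "LLLG"
num_layers = 64

layer_types = [
    "sliding_attention" if pattern[i % len(pattern)] == "L" else "full_attention"
    for i in range(num_layers)
]

# Global (full) attention layers, where RoPE is not applied:
global_layers = [i for i, t in enumerate(layer_types) if t == "full_attention"]
print(len(global_layers))  # 16 of the 64 layers are global
```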
## Usage tips

### Non-reasoning mode

For general conversation, you can use EXAONE 4.0 as in the example below.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-4.0-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="bfloat16",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose the input you want
prompt = "Explain how wonderful you are"
prompt = "Explica lo increíble que eres"
prompt = "너가 얼마나 대단한지 설명해 봐"

messages = [
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
### Reasoning mode

The EXAONE 4.0 models have reasoning capabilities for handling complex problems. You can activate reasoning mode by passing the `enable_thinking=True` argument to the tokenizer, which opens a reasoning block that starts with a `<think>` tag without closing it.
```python
messages = [
    {"role": "user", "content": "Which one is bigger, 3.12 vs 3.9?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
print(tokenizer.decode(output[0]))
```
In reasoning mode, the generated output is highly sensitive to the sampling parameters, so please refer to the official Usage Guideline for better generation quality.
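Since the chat template only opens the reasoning block, the model itself emits the closing `</think>` tag before the final answer. A small helper like the following can separate the reasoning trace from the answer in the decoded text (a sketch, assuming the model closed the block):

```python
def split_reasoning(text: str):
    # Split a decoded completion into (reasoning, answer).
    # Assumes the completion looks like "<think> ... </think> answer".
    if "</think>" in text:
        reasoning, answer = text.split("</think>", 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    # No closed reasoning block found (e.g. generation was cut off)
    return "", text.strip()

reasoning, answer = split_reasoning(
    "<think>3.9 = 3.90 > 3.12</think>3.9 is bigger than 3.12."
)
print(answer)  # 3.9 is bigger than 3.12.
```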
### Agentic tool use

Thanks to its tool-use capability, EXAONE 4.0 can be used as an agent. To do so, you need to provide the tool specification to the model, as in the example below.
```python
import random

def roll_dice(max_num: int):
    return random.randint(1, max_num)

tools = [
    {
        "type": "function",
        "function": {
            "name": "roll_dice",
            "description": "Roll a dice with the number 1 to N. User can select the number N.",
            "parameters": {
                "type": "object",
                "required": ["max_num"],
                "properties": {
                    "max_num": {
                        "type": "integer",
                        "description": "Max number of the dice"
                    }
                }
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Roll D6 dice twice!"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    tools=tools,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output[0]))
```
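To actually act on the model's output, the generated tool calls have to be parsed and dispatched to the Python functions. The sketch below assumes the completion wraps each call as JSON inside `<tool_call>...</tool_call>` tags; the exact format is defined by the model's chat template, so check the decoded output and adjust the pattern accordingly.

```python
import json
import random
import re

def roll_dice(max_num: int):
    return random.randint(1, max_num)

def execute_tool_calls(completion: str, available_tools: dict):
    # Extract JSON tool calls of the (assumed) form
    # <tool_call>{"name": ..., "arguments": {...}}</tool_call>
    # and dispatch each one to the matching Python function.
    results = []
    for match in re.findall(r"<tool_call>(.*?)</tool_call>", completion, re.DOTALL):
        call = json.loads(match)
        fn = available_tools[call["name"]]
        results.append(fn(**call["arguments"]))
    return results

# Hypothetical completion for "Roll D6 dice twice!"
completion = (
    '<tool_call>{"name": "roll_dice", "arguments": {"max_num": 6}}</tool_call>'
    '<tool_call>{"name": "roll_dice", "arguments": {"max_num": 6}}</tool_call>'
)
rolls = execute_tool_calls(completion, {"roll_dice": roll_dice})
print(rolls)  # two random values between 1 and 6
```

The results would then be appended to `messages` as `"tool"` role turns and fed back to the model so it can compose the final answer.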
## Exaone4Config

[[autodoc]] Exaone4Config

## Exaone4Model

[[autodoc]] Exaone4Model
    - forward

## Exaone4ForCausalLM

[[autodoc]] Exaone4ForCausalLM
    - forward

## Exaone4ForSequenceClassification

[[autodoc]] Exaone4ForSequenceClassification
    - forward

## Exaone4ForTokenClassification

[[autodoc]] Exaone4ForTokenClassification
    - forward

## Exaone4ForQuestionAnswering

[[autodoc]] Exaone4ForQuestionAnswering
    - forward