<!--Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# EXAONE 4

## Overview

**[EXAONE 4.0](https://github.com/LG-AI-EXAONE/EXAONE-4.0)** is a language model family that integrates the high usability of [EXAONE 3.5](https://github.com/LG-AI-EXAONE/EXAONE-3.5) and the enhanced reasoning ability of [EXAONE Deep](https://github.com/LG-AI-EXAONE/EXAONE-Deep) as a Non-reasoning mode and a Reasoning mode, respectively. To pave the way for the agentic AI era, EXAONE 4.0 incorporates essential capabilities such as agentic tool use, and extends its multilingual support beyond English and Korean to Spanish.

The EXAONE 4.0 model family consists of two models: a 32B model optimized for high performance, and a 1.2B model designed for on-device usage.

The EXAONE 4.0 architecture adopts design choices that differ from previous EXAONE models:
1. **Hybrid Attention**: The 32B model adopts a hybrid attention scheme that interleaves *local attention (sliding window attention)* and *global attention (full attention)* in a 3:1 ratio. RoPE is not used for global attention so the model can better capture the full context (the per-layer layout can be inspected from the model config, as sketched after this list).
2. **QK-Reorder-Norm**: Departing from the conventional Pre-LN scheme at the cost of extra computation, LayerNorm is repositioned so that it is applied to the outputs of the attention and MLP blocks, and RMS normalization is added right after the Q and K projections. This yields better performance on downstream tasks.
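As a minimal sketch (not part of the official docs), you can inspect the hybrid layout from the checkpoint's config; the attribute names below follow current transformers conventions and are assumptions, hence the `getattr` fallbacks:

```python
from transformers import AutoConfig

# Inspect the per-layer attention layout of the 32B checkpoint.
config = AutoConfig.from_pretrained("LGAI-EXAONE/EXAONE-4.0-32B")
print(getattr(config, "sliding_window", None))  # local attention window size, if exposed
print(getattr(config, "layer_types", None))     # expected: three sliding-window layers per full-attention layer
```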
For more details, please refer to the [technical report](https://huggingface.co/papers/2507.11407), the [blog post](https://www.lgresearch.ai/blog/view?seq=576), and the official [GitHub](https://github.com/LG-AI-EXAONE/EXAONE-4.0) page.

All model checkpoints are available in the [HuggingFace collection](https://huggingface.co/collections/LGAI-EXAONE/exaone-40-686b2e0069800c835ed48375).

## Model Details
| Model Configuration | 32B | 1.2B |
|:-------------------|:-----:|:------:|
| d_model | 5,120 | 2,048 |
| Number of layers | 64 | 30 |
| Normalization | QK-Reorder-LN | QK-Reorder-LN |
| Non-linearity | SwiGLU | SwiGLU |
| Feedforward dimension | 27,392 | 4,096 |
| Attention type | Hybrid (3:1 Local-Global) | Global |
| Head type | GQA | GQA |
| Number of heads | 40 | 32 |
| Number of KV heads | 8 | 8 |
| Head size | 128 | 64 |
| Max sequence length | 131,072 | 65,536 |
| RoPE theta | 1,000,000 | 1,000,000 |
| Tokenizer | BBPE | BBPE |
| Vocab size | 102,400 | 102,400 |
| Tied word embedding | False | True |
| Knowledge cut-off | Nov. 2024 | Nov. 2024 |
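To double-check these values against the shipped checkpoints, you can read them from the configs. This is a quick sketch using standard transformers config fields (`d_model` in the table corresponds to `hidden_size`, and the feedforward dimension to `intermediate_size`):

```python
from transformers import AutoConfig

# Compare the table above with the fields stored in each checkpoint's config.
for repo in ("LGAI-EXAONE/EXAONE-4.0-32B", "LGAI-EXAONE/EXAONE-4.0-1.2B"):
    cfg = AutoConfig.from_pretrained(repo)
    print(repo, cfg.hidden_size, cfg.num_hidden_layers,
          cfg.num_attention_heads, cfg.num_key_value_heads, cfg.vocab_size)
```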
## Usage tips

### Non-reasoning mode

For general conversation, you can use EXAONE 4.0 as in the example below.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-4.0-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="bfloat16",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# choose your prompt
prompt = "Explain how wonderful you are"
prompt = "Explica lo increíble que eres"
prompt = "너가 얼마나 대단한지 설명해 봐"

messages = [
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
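Optionally, you can print tokens as they are generated instead of decoding at the end by passing a `TextStreamer`. This is a standard transformers utility and an addition to the official example:

```python
from transformers import TextStreamer

# Stream the completion to stdout token by token, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer,
)
```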
### Reasoning mode

The EXAONE 4.0 models have reasoning capabilities for handling complex problems. You can activate reasoning mode by passing the `enable_thinking=True` argument to the tokenizer, which opens a reasoning block that starts with a `<think>` tag without closing it; the model then writes out its reasoning and closes the block with a `</think>` tag before giving the final answer.
```python
messages = [
    {"role": "user", "content": "Which one is bigger, 3.12 vs 3.9?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,  # opens the reasoning block with a `<think>` tag
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
print(tokenizer.decode(output[0]))
```
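If you only want the final answer, a small post-processing step helps. This sketch is an addition to the official example and assumes the reasoning block is closed with a `</think>` tag:

```python
# Split the reasoning trace from the final answer on the closing `</think>` tag.
generated = tokenizer.decode(output[0][input_ids.shape[-1]:])
reasoning, sep, answer = generated.partition("</think>")
print(answer.strip() if sep else generated)
```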
> [!IMPORTANT]
> In reasoning mode, the generated output is quite sensitive to the sampling parameters, so please refer to the official [Usage Guideline](https://github.com/LG-AI-EXAONE/EXAONE-4.0#usage-guideline) for better generation quality.

### Agentic tool use

Thanks to their tool-calling capabilities, the EXAONE 4.0 models can be used as agents. To do this, you need to provide a tool specification to the model, as in the example below.
```python
import random

# A simple tool the model can call.
def roll_dice(max_num: int):
    return random.randint(1, max_num)

# Tool specification in JSON Schema style, passed to the chat template.
tools = [
    {
        "type": "function",
        "function": {
            "name": "roll_dice",
            "description": "Roll a die with numbers from 1 to N. The user can select the number N.",
            "parameters": {
                "type": "object",
                "required": ["max_num"],
                "properties": {
                    "max_num": {
                        "type": "integer",
                        "description": "Maximum number on the die"
                    }
                }
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Roll D6 dice twice!"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    tools=tools,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output[0]))
```
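To close the loop, the generated tool calls have to be executed and their results fed back to the model. The sketch below is an addition to the official example and rests on two assumptions: that the chat template emits tool calls as JSON wrapped in `<tool_call>...</tool_call>` tags, and that results are returned through `"tool"` role messages; check the official GitHub page for the exact format.

```python
import json
import re

# Parse each emitted tool call (format assumption), execute it, and append the
# result as a tool message so a follow-up generation can use it.
generated = tokenizer.decode(output[0][input_ids.shape[-1]:])
for raw_call in re.findall(r"<tool_call>(.*?)</tool_call>", generated, re.DOTALL):
    call = json.loads(raw_call)
    result = roll_dice(**call["arguments"])
    messages.append({"role": "tool", "content": json.dumps({"result": result})})
```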
## Exaone4Config

[[autodoc]] Exaone4Config

## Exaone4Model

[[autodoc]] Exaone4Model
    - forward

## Exaone4ForCausalLM

[[autodoc]] Exaone4ForCausalLM
    - forward

## Exaone4ForSequenceClassification

[[autodoc]] Exaone4ForSequenceClassification
    - forward

## Exaone4ForTokenClassification

[[autodoc]] Exaone4ForTokenClassification
    - forward

## Exaone4ForQuestionAnswering

[[autodoc]] Exaone4ForQuestionAnswering
    - forward