EXAONE 4

Overview

EXAONE 4.0 ๋ชจ๋ธ๊ตฐ์€ EXAONE 3.5 ๋ชจ๋ธ๊ตฐ์˜ ๋†’์€ ์‹ค์šฉ์„ฑ๊ณผ EXAONE Deep ๋ชจ๋ธ๊ตฐ์˜ ํ–ฅ์ƒ๋œ ์‚ฌ๊ณ  ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ๊ฐ๊ฐ Non-reasoning mode์™€ Reasoning mode๋กœ ํ†ตํ•ฉํ•œ ์ž์—ฐ์–ด ๋ชจ๋ธ(language model)์ž…๋‹ˆ๋‹ค. ์—์ด์ „ํ‹ฑ(agentic) AI ์‹œ๋Œ€์— ๋ฐœ๋งž์ถฐ EXAONE 4.0์€ ์—์ด์ „ํ‹ฑ ๋„๊ตฌ ์‚ฌ์šฉ ๋Šฅ๋ ฅ๊ณผ ๊ฐ™์€ ํ•ต์‹ฌ ๊ธฐ๋Šฅ์„ ํ†ตํ•ฉํ–ˆ๊ณ , ๊ธฐ์กด์˜ ๋‹ค๊ตญ์–ด ๋Šฅ๋ ฅ์„ ์˜์–ด, ํ•œ๊ตญ์–ด์™€ ๋”๋ถˆ์–ด ์ŠคํŽ˜์ธ์–ด๊นŒ์ง€ ํ™•์žฅํ–ˆ์Šต๋‹ˆ๋‹ค.

EXAONE 4.0 ๋ชจ๋ธ๊ตฐ์€ ๋‘ ๊ฐœ์˜ ๋ชจ๋ธ: ๋†’์€ ์„ฑ๋Šฅ์„ ์œ„ํ•ด ์ตœ์ ํ™”๋œ 32B ์ค‘ํ˜• ๋ชจ๋ธ, ๊ทธ๋ฆฌ๊ณ  ์˜จ-๋””๋ฐ”์ด์Šค ํ™œ์šฉ์„ ์œ„ํ•ด ๋””์ž์ธ๋œ 1.2B ์†Œํ˜• ๋ชจ๋ธ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

EXAONE 4.0์˜ ๋ชจ๋ธ ๊ตฌ์กฐ๋Š” ์ด์ „ EXAONE ๋ชจ๋ธ๋“ค๊ณผ ๋‹ค๋ฅธ ์•„ํ‚คํ…์ฒ˜ ๋””์ž์ธ์„ ์ฑ„ํƒํ–ˆ์Šต๋‹ˆ๋‹ค.

  1. Hybrid Attention: The 32B model adopts a hybrid attention scheme that combines *Local attention (sliding window attention)* and *Global attention (full attention)* in a 3:1 ratio. In addition, RoPE is not used in global attention, so that the model can better understand the full context.
  2. QK-Reorder-Norm: For better performance on downstream tasks, the traditional Pre-LN scheme was changed at the cost of increased computation. LayerNorm was repositioned so that it is applied to the outputs of attention and MLP, and RMS normalization was added right after the Q and K projections.
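
As a rough illustration of the 3:1 ratio described in the Hybrid Attention item, the sketch below lays out which layers would be local and which global for a given depth. The function name `hybrid_layer_pattern`, the labels, and the ordering within each group (three local layers followed by one global layer) are assumptions made for illustration; the text above only states the ratio.

```python
def hybrid_layer_pattern(num_layers: int, local_per_global: int = 3) -> list[str]:
    """Label each layer "local" or "global" for a given local:global ratio.

    Assumption: every group of (local_per_global + 1) layers ends with the
    global layer. The source only gives the 3:1 ratio, not the ordering.
    """
    group = local_per_global + 1
    return [
        "global" if (i + 1) % group == 0 else "local"
        for i in range(num_layers)
    ]


# 64 layers is the depth listed for the 32B model.
pattern = hybrid_layer_pattern(64)
print(pattern[:8])
print(pattern.count("local"), pattern.count("global"))
```

Under this assumed ordering, a 64-layer stack has 48 local layers and 16 global layers.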

๋” ์ž์„ธํ•œ ์ •๋ณด๋Š” ๊ธฐ์ˆ  ๋ณด๊ณ ์„œ, HuggingFace ๋…ผ๋ฌธ, ๋ธ”๋กœ๊ทธ, ๊ณต์‹ GitHub ํŽ˜์ด์ง€๋ฅผ ์ฐธ๊ณ ํ•ด์ฃผ์‹œ๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค.

๊ณต๊ฐœ๋œ ๋ชจ๋“  ๋ชจ๋ธ ์ฒดํฌํฌ์ธํŠธ๋Š” HuggingFace ์ฝœ๋ ‰์…˜์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ ์„ธ๋ถ€ ์ •๋ณด

| Model Configuration | 32B | 1.2B |
|:---|:---:|:---:|
| d_model | 5,120 | 2,048 |
| Number of layers | 64 | 30 |
| Normalization | QK-Reorder-LN | QK-Reorder-LN |
| Non-linearity | SwiGLU | SwiGLU |
| Feedforward dimension | 27,392 | 4,096 |
| Attention type | Hybrid (3:1 Local-Global) | Global |
| Head type | GQA | GQA |
| Number of heads | 40 | 32 |
| Number of KV heads | 8 | 8 |
| Head size | 128 | 64 |
| Max sequence length | 131,072 | 65,536 |
| RoPE theta | 1,000,000 | 1,000,000 |
| Tokenizer | BBPE | BBPE |
| Vocab size | 102,400 | 102,400 |
| Tied word embedding | False | True |
| Knowledge cut-off | Nov. 2024 | Nov. 2024 |
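
The attention figures in the table can be cross-checked with a little arithmetic. The sketch below is only an illustration: the `AttnShape` dataclass and helper names are hypothetical, the 2 bytes per value assumes bfloat16 storage, and the cache estimate assumes every layer keeps keys and values for every token (for the 32B model, local layers only need their sliding window, so the figure is an upper bound).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnShape:
    """Attention-related values copied from the configuration table."""
    d_model: int
    num_layers: int
    num_heads: int
    num_kv_heads: int
    head_size: int


EXAONE_32B = AttnShape(d_model=5120, num_layers=64, num_heads=40, num_kv_heads=8, head_size=128)
EXAONE_1_2B = AttnShape(d_model=2048, num_layers=30, num_heads=32, num_kv_heads=8, head_size=64)


def queries_per_kv_head(shape: AttnShape) -> int:
    """GQA group size: how many query heads share one key/value head."""
    return shape.num_heads // shape.num_kv_heads


def kv_bytes_per_token(shape: AttnShape, bytes_per_value: int = 2) -> int:
    """Upper-bound KV-cache size per token (keys + values, all layers).

    Assumption: 2 bytes per value (bfloat16) and no sliding-window savings.
    """
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_size * bytes_per_value


for name, shape in [("32B", EXAONE_32B), ("1.2B", EXAONE_1_2B)]:
    # heads * head size should reproduce d_model for both models
    assert shape.num_heads * shape.head_size == shape.d_model
    print(name, queries_per_kv_head(shape), kv_bytes_per_token(shape) // 1024, "KiB/token")
```

Under these assumptions the 32B model shares each KV head among 5 query heads and needs at most 256 KiB of cache per token, while the 1.2B model shares each KV head among 4 query heads and needs 60 KiB per token.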

Usage tips

Non-reasoning mode

์ผ๋ฐ˜์ ์ธ ๋Œ€ํ™”์˜ ๊ฒฝ์šฐ ์•„๋ž˜ ์˜ˆ์ œ์™€ ๊ฐ™์ด EXAONE 4.0์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-4.0-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="bfloat16",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose the prompt you want (the last assignment wins; comment out the others)
prompt = "Explain how wonderful you are"
prompt = "Explica lo increíble que eres"
prompt = "너가 얼마나 대단한지 설명해 봐"

messages = [
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))

Reasoning mode

The EXAONE 4.0 models have reasoning capabilities for handling complex problems. To activate reasoning mode, pass the enable_thinking=True argument to the tokenizer. This opens a reasoning block with a <think> tag and leaves it unclosed, so the model begins generating its reasoning inside the block.

messages = [
    {"role": "user", "content": "Which one is bigger, 3.12 vs 3.9?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
print(tokenizer.decode(output[0]))

๋ชจ๋ธ์„ reasoning mode๋กœ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ, ์ƒ์„ฑ๋˜๋Š” ๋‹ต๋ณ€์ด sampling parameters์— ๊ต‰์žฅํžˆ ๋ฏผ๊ฐํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋” ๋‚˜์€ ์ƒ์„ฑ ํ’ˆ์งˆ์„ ์œ„ํ•ด ๊ณต์‹ Usage Guideline๋ฅผ ์ฐธ์กฐํ•ด ์ฃผ์‹œ๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค.

Agentic tool use

EXAONE 4.0 ๋ชจ๋ธ์€ ๋„๊ตฌ ์‚ฌ์šฉ ๋Šฅ๋ ฅ์„ ๊ฐ–์ถ˜ ๋•๋ถ„์— Agent๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด์„œ๋Š” ์•„๋ž˜ ์˜ˆ์ œ์™€ ๊ฐ™์ด ๋„๊ตฌ ๋ช…์„ธ๋ฅผ ๋ชจ๋ธ์—๊ฒŒ ์ œ๊ณตํ•ด ์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

import random

def roll_dice(max_num: int):
    return random.randint(1, max_num)

tools = [
    {
        "type": "function",
        "function": {
            "name": "roll_dice",
            "description": "Roll a dice with the number 1 to N. User can select the number N.",
            "parameters": {
                "type": "object",
                "required": ["max_num"],
                "properties": {
                    "max_num": {
                        "type": "integer",
                        "description": "Max number of the dice"
                    }
                }
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Roll D6 dice twice!"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    tools=tools,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output[0]))
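
Once the model has emitted tool calls, the caller still has to parse them and run the matching Python function. The sketch below shows one way this could look. It rests on an assumption that is not stated in this document: that each call is emitted as a JSON object with `name` and `arguments` keys wrapped in `<tool_call>...</tool_call>` tags. The names `TOOL_REGISTRY`, `TOOL_CALL_RE`, and `run_tool_calls` are hypothetical; check the actual decoded output before relying on this format.

```python
import json
import random
import re


def roll_dice(max_num: int) -> int:
    """Same tool as in the example above."""
    return random.randint(1, max_num)


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"roll_dice": roll_dice}

# ASSUMPTION: tool calls are wrapped in <tool_call> tags and contain JSON.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def run_tool_calls(decoded: str) -> list[dict]:
    """Find every tool call in the decoded text and execute it."""
    results = []
    for payload in TOOL_CALL_RE.findall(decoded):
        call = json.loads(payload.strip())
        func = TOOL_REGISTRY[call["name"]]  # KeyError for unknown tools
        results.append({"name": call["name"], "result": func(**call["arguments"])})
    return results


# Hand-written sample in the assumed format, not real model output.
sample_output = (
    '<tool_call>{"name": "roll_dice", "arguments": {"max_num": 6}}</tool_call>\n'
    '<tool_call>{"name": "roll_dice", "arguments": {"max_num": 6}}</tool_call>'
)
print(run_tool_calls(sample_output))
```

The results would then typically be passed back to the model in a follow-up turn; the exact message shape expected by the EXAONE 4.0 chat template is not covered here.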

Exaone4Config

[[autodoc]] Exaone4Config

Exaone4Model

[[autodoc]] Exaone4Model - forward

Exaone4ForCausalLM

[[autodoc]] Exaone4ForCausalLM - forward

Exaone4ForSequenceClassification

[[autodoc]] Exaone4ForSequenceClassification - forward

Exaone4ForTokenClassification

[[autodoc]] Exaone4ForTokenClassification - forward

Exaone4ForQuestionAnswering

[[autodoc]] Exaone4ForQuestionAnswering - forward