Calvin806/EXAONE-3.5-32B-Instruct-GGUF

GGUF quantizations for EXAONE-3.5-32B-Instruct.

Contents

This repository typically contains:

  • EXAONE-3.5-32B-Instruct.F16.gguf
  • EXAONE-3.5-32B-Instruct.Q4_K_M.gguf
  • EXAONE-3.5-32B-Instruct.Q5_K_M.gguf
  • EXAONE-3.5-32B-Instruct.Q8_0.gguf (optional)

🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)

During EXAONE GGUF conversion/quantization, a KV key naming mismatch was found between some models (e.g. 2.4B / 7.8B):

  • Some GGUFs contain only exaone.attention.layer_norm_epsilon
  • Other GGUFs contain only exaone.attention.layer_norm_rms_epsilon

In this state, vanilla llama.cpp's llama-quantize can fail because it cannot find the key it expects, so a patch was applied that adds a fallback to the GGUF key lookup in llama.cpp's model loader.

What was patched

Modified src/llama-model-loader.cpp so that gguf_find_key() lookups perform the following fallback:

  • If the key exaone.attention.layer_norm_epsilon is not found → retry with exaone.attention.layer_norm_rms_epsilon
  • If the key exaone.attention.layer_norm_rms_epsilon is not found → retry with exaone.attention.layer_norm_epsilon

With this patch, the EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, and 32B models can all be converted to GGUF and quantized with the same pipeline.

Patch note (minimal diff summary)

  • Added a fallback wrapper/hook for gguf_find_key() inside llama-model-loader.cpp
  • Ensured all lookups in that translation unit route through the fallback

This repo includes:

  • exaone-gguf-fallback.patch

Tested llama.cpp commit

  • 021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2

Build (CUDA)

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2   # tested commit
git apply ../exaone-gguf-fallback.patch

rm -rf build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build -j

Convert / Quantize

# Convert HF snapshot -> GGUF(F16)
python3 llama.cpp/convert_hf_to_gguf.py <LOCAL_SNAPSHOT_DIR> --outtype f16 --outfile model.F16.gguf

# Quantize (example: Q4_K_M)
llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M