---
license: other
license_name: exaone
license_link: https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B/blob/main/LICENSE
library_name: llama.cpp
tags:
  - gguf
  - exaone
  - quantized
  - llama-cpp
  - korean
base_model: LGAI-EXAONE/EXAONE-Deep-2.4B-Instruct
model_type: exaone
---

# Calvin806/EXAONE-Deep-2.4B-GGUF

GGUF quantizations for **EXAONE-Deep-2.4B**.

## Contents

This folder typically contains:

- `EXAONE-Deep-2.4B.F16.gguf`
- `EXAONE-Deep-2.4B.Q4_K_M.gguf`
- `EXAONE-Deep-2.4B.Q5_K_M.gguf`
- `EXAONE-Deep-2.4B.Q8_0.gguf` (optional)

---

## 🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)

While converting/quantizing EXAONE to GGUF, a **KV key naming mismatch** was found between some models (e.g. **2.4B / 7.8B**):

- some GGUFs contain only `exaone.attention.layer_norm_epsilon`
- others contain only `exaone.attention.layer_norm_rms_epsilon`

In this state, vanilla llama.cpp's `llama-quantize` can fail because it cannot find the key it expects, so a **patch that adds a fallback to the GGUF key lookup in llama.cpp's model loader** was applied.

### What was patched

In `src/llama-model-loader.cpp`, the `gguf_find_key()` lookup was changed to fall back as follows:

- if the key is `exaone.attention.layer_norm_epsilon` and it is not found → retry with `exaone.attention.layer_norm_rms_epsilon`
- if the key is `exaone.attention.layer_norm_rms_epsilon` and it is not found → retry with `exaone.attention.layer_norm_epsilon`

With this patch, the **EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, 32B** families can be converted to GGUF and quantized with the same pipeline. (An illustrative sketch of such a fallback is included in the appendix at the end of this card.)

### Patch note (minimal diff summary)

- Added a fallback wrapper/hook for `gguf_find_key()` inside `llama-model-loader.cpp`
- Ensured all lookups in that translation unit route through the fallback

This repo includes:

- `exaone-gguf-fallback.patch`

### Tested llama.cpp commit

- `021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2`

---

## Build (CUDA)

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git apply ../exaone-gguf-fallback.patch
rm -rf build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build -j
```

## Convert / Quantize

```bash
# Convert an HF snapshot -> GGUF (F16); point at the local HF model directory
python3 llama.cpp/convert_hf_to_gguf.py /path/to/EXAONE-Deep-2.4B --outtype f16 --outfile model.F16.gguf

# Quantize (example: Q4_K_M)
llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M
```
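
---

## Appendix: fallback sketch (illustrative)

The snippet below is a minimal sketch of the kind of key-lookup fallback described above; it is **not** the contents of `exaone-gguf-fallback.patch`. The helper name `gguf_find_key_with_exaone_fallback` is hypothetical, and the sketch assumes ggml's `gguf_find_key()` returns a negative id when a key is absent (the exact signature may differ by llama.cpp commit).

```cpp
// Illustrative sketch only: not the actual patch applied in this repo.
// Assumes gguf_find_key(ctx, key) returns a negative id when the key is
// missing, as in ggml's gguf API (signature may vary across commits).
#include <cstdint>
#include <string>

struct gguf_context;
extern "C" int64_t gguf_find_key(const struct gguf_context * ctx, const char * key);

static const char * KEY_EPS     = "exaone.attention.layer_norm_epsilon";
static const char * KEY_RMS_EPS = "exaone.attention.layer_norm_rms_epsilon";

// Look up `key`; if it is one of the two EXAONE epsilon spellings and is
// missing, retry with the other spelling before reporting "not found".
static int64_t gguf_find_key_with_exaone_fallback(const gguf_context * ctx, const std::string & key) {
    int64_t id = gguf_find_key(ctx, key.c_str());
    if (id >= 0) {
        return id;
    }
    if (key == KEY_EPS) {
        return gguf_find_key(ctx, KEY_RMS_EPS);
    }
    if (key == KEY_RMS_EPS) {
        return gguf_find_key(ctx, KEY_EPS);
    }
    return id; // not an EXAONE epsilon key: keep the "not found" result
}
```

Routing every lookup in `llama-model-loader.cpp` through a helper like this is what lets the same pipeline handle GGUFs that carry either spelling of the epsilon key.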