---
license: other
license_name: exaone
license_link: https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B/blob/main/LICENSE
library_name: llama.cpp
tags:
- gguf
- exaone
- quantized
- llama-cpp
- korean
base_model: LGAI-EXAONE/EXAONE-Deep-2.4B-Instruct
model_type: exaone
---
# Calvin806/EXAONE-Deep-2.4B-GGUF

GGUF quantizations for **EXAONE-Deep-2.4B**.

## Contents
This folder typically contains:
- `EXAONE-Deep-2.4B.F16.gguf`
- `EXAONE-Deep-2.4B.Q4_K_M.gguf`
- `EXAONE-Deep-2.4B.Q5_K_M.gguf`
- `EXAONE-Deep-2.4B.Q8_0.gguf` (optional)

---
## 🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)

During EXAONE GGUF conversion/quantization, a KV key naming mismatch was found across some models (e.g. **2.4B / 7.8B**):

- some GGUFs contain only `exaone.attention.layer_norm_epsilon`
- others contain only `exaone.attention.layer_norm_rms_epsilon`

In this state, vanilla llama.cpp's `llama-quantize` can fail because it cannot find the key it expects, so a patch was applied that **adds a fallback to the GGUF key lookup in llama.cpp's model loader**.
### What was patched

The `gguf_find_key()` lookup in `src/llama-model-loader.cpp` was modified to perform the following fallbacks:

- if the key is `exaone.attention.layer_norm_epsilon` and it is not found → retry with `exaone.attention.layer_norm_rms_epsilon`
- if the key is `exaone.attention.layer_norm_rms_epsilon` and it is not found → retry with `exaone.attention.layer_norm_epsilon`
With this patch, the **EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, 32B** families can be converted to GGUF and quantized with the same pipeline.
### Patch note (minimal diff summary)
- Added a fallback wrapper/hook for `gguf_find_key()` inside `llama-model-loader.cpp`
- Ensured all lookups in that translation unit route through the fallback

This repo includes:
- `exaone-gguf-fallback.patch`

### Tested llama.cpp commit
- `021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2`

---
## Build (CUDA)
```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git apply ../exaone-gguf-fallback.patch

rm -rf build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build -j
```

## Convert / Quantize
```bash
# Convert HF snapshot -> GGUF (F16)
python3 llama.cpp/convert_hf_to_gguf.py <LOCAL_SNAPSHOT_DIR> --outtype f16 --outfile model.F16.gguf

# Quantize (example: Q4_K_M)
llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M
```