| --- |
| license: other |
| license_name: exaone |
| license_link: https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B/blob/main/LICENSE |
| library_name: llama.cpp |
| tags: |
| - gguf |
| - exaone |
| - quantized |
| - llama-cpp |
| - korean |
| base_model: LGAI-EXAONE/EXAONE-Deep-7.8B-Instruct |
| model_type: exaone |
| --- |
| # Calvin806/EXAONE-Deep-7.8B-GGUF |
|
|
| GGUF quantizations for **EXAONE-Deep-7.8B**. |
|
|
| ## Contents |
This repository contains:
| - `EXAONE-Deep-7.8B.F16.gguf` |
| - `EXAONE-Deep-7.8B.Q4_K_M.gguf` |
| - `EXAONE-Deep-7.8B.Q5_K_M.gguf` |
| - `EXAONE-Deep-7.8B.Q8_0.gguf` (optional) |
|
|
| --- |
|
|
## 🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)
|
|
During EXAONE GGUF conversion/quantization, a **KV key naming mismatch** was found between some models (e.g. **2.4B / 7.8B**):
|
|
- Some GGUFs contain only `exaone.attention.layer_norm_epsilon`
- Others contain only `exaone.attention.layer_norm_rms_epsilon`
|
|
In this state, vanilla llama.cpp's `llama-quantize` can fail because it cannot find the expected key, so we applied **a patch that adds a fallback to the GGUF key lookup in llama.cpp's model loader**.
|
|
| ### What was patched |
|
|
The `gguf_find_key()` lookup in `src/llama-model-loader.cpp` was modified to perform the following fallback:
|
|
- If the key is `exaone.attention.layer_norm_epsilon` and it is not found → retry with `exaone.attention.layer_norm_rms_epsilon`
- If the key is `exaone.attention.layer_norm_rms_epsilon` and it is not found → retry with `exaone.attention.layer_norm_epsilon`
|
|
With this patch, the **EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, and 32B** families can be converted to GGUF and quantized with a single pipeline.
|
|
| ### Patch note (minimal diff summary) |
| - Added a fallback wrapper/hook for `gguf_find_key()` inside `llama-model-loader.cpp` |
| - Ensured all lookups in that translation unit route through the fallback |
|
|
| This repo includes: |
| - `exaone-gguf-fallback.patch` |
|
|
| ### Tested llama.cpp commit |
| - `021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2` |
|
|
| --- |
|
|
| ## Build (CUDA) |
| ```bash |
| git clone https://github.com/ggml-org/llama.cpp |
| cd llama.cpp |
| git apply ../exaone-gguf-fallback.patch |
| |
| rm -rf build |
| cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON |
| cmake --build build -j |
| ``` |
|
|
| ## Convert / Quantize |
| ```bash |
| # Convert HF snapshot -> GGUF(F16) |
| python3 llama.cpp/convert_hf_to_gguf.py <LOCAL_SNAPSHOT_DIR> --outtype f16 --outfile model.F16.gguf |
| |
| # Quantize (example: Q4_K_M) |
| llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M |
| ``` |
|
|