---
license: other
license_name: exaone
license_link: https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B/blob/main/LICENSE
library_name: llama.cpp
tags:
- gguf
- exaone
- quantized
- llama-cpp
- korean
base_model: LGAI-EXAONE/EXAONE-Deep-32B-Instruct
model_type: exaone
---
# Calvin806/EXAONE-Deep-32B-GGUF
GGUF quantizations for **EXAONE-Deep-32B**.
## Contents
This folder typically contains:
- `EXAONE-Deep-32B.F16.gguf`
- `EXAONE-Deep-32B.Q4_K_M.gguf`
- `EXAONE-Deep-32B.Q5_K_M.gguf`
- `EXAONE-Deep-32B.Q8_0.gguf` (optional)
---
## 🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)
During EXAONE GGUF conversion/quantization, a **KV key naming mismatch** was found between some models (e.g. **2.4B / 7.8B**):
- some GGUFs contain only `exaone.attention.layer_norm_epsilon`
- some GGUFs contain only `exaone.attention.layer_norm_rms_epsilon`

In this state, vanilla llama.cpp's `llama-quantize` can fail because it cannot find the key it expects, so **a patch was applied that adds a fallback to the gguf key lookup in llama.cpp's model loader**.
### What was patched
Modified the `gguf_find_key()` lookup in `src/llama-model-loader.cpp` to perform the following fallback:
- if the key is `exaone.attention.layer_norm_epsilon` and it is not found → retry with `exaone.attention.layer_norm_rms_epsilon`
- if the key is `exaone.attention.layer_norm_rms_epsilon` and it is not found → retry with `exaone.attention.layer_norm_epsilon`

With this patch, the **EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, 32B** families can all be converted to GGUF and quantized with the same pipeline.
### Patch note (minimal diff summary)
- Added a fallback wrapper/hook for `gguf_find_key()` inside `llama-model-loader.cpp`
- Ensured all lookups in that translation unit route through the fallback
This repo includes:
- `exaone-gguf-fallback.patch`
### Tested llama.cpp commit
- `021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2`
---
## Build (CUDA)
```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git apply ../exaone-gguf-fallback.patch
rm -rf build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build -j
```
## Convert / Quantize
```bash
# Convert HF snapshot -> GGUF(F16)
python3 llama.cpp/convert_hf_to_gguf.py <LOCAL_SNAPSHOT_DIR> --outtype f16 --outfile model.F16.gguf
# Quantize (example: Q4_K_M)
llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M
```