---
license: other
license_name: exaone
license_link: https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B/blob/main/LICENSE
library_name: llama.cpp
tags:
- gguf
- exaone
- quantized
- llama-cpp
- korean
base_model: LGAI-EXAONE/EXAONE-Deep-2.4B-Instruct
model_type: exaone
---
# Calvin806/EXAONE-Deep-2.4B-GGUF
GGUF quantizations for **EXAONE-Deep-2.4B**.
## Contents
This folder typically contains:
- `EXAONE-Deep-2.4B.F16.gguf`
- `EXAONE-Deep-2.4B.Q4_K_M.gguf`
- `EXAONE-Deep-2.4B.Q5_K_M.gguf`
- `EXAONE-Deep-2.4B.Q8_0.gguf` (optional)
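A downloaded file can be smoke-tested with llama.cpp's `llama-cli` (a minimal sketch; the binary path assumes the build described below, and the filename is illustrative):

```bash
# Load the quantized model and generate a short completion.
./llama.cpp/build/bin/llama-cli \
  -m EXAONE-Deep-2.4B.Q4_K_M.gguf \
  -p "Hello" \
  -n 64
```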
---
## 🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)
During EXAONE GGUF conversion/quantization, a **KV key naming mismatch** was found between some models (e.g. **2.4B / 7.8B**):
- some GGUF files contain only `exaone.attention.layer_norm_epsilon`
- others contain only `exaone.attention.layer_norm_rms_epsilon`

In this state, vanilla llama.cpp's `llama-quantize` can fail because it cannot find the key it expects, so **a patch was applied to llama.cpp's model loader that adds a fallback to the GGUF key lookup**.
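To check which spelling a particular file carries, the GGUF KV metadata can be inspected with the `gguf-dump` tool from the `gguf` Python package (an illustrative check, not part of the patch; the filename is a placeholder):

```bash
# List the GGUF key/value metadata and filter for the epsilon key.
pip install gguf
gguf-dump EXAONE-Deep-2.4B.F16.gguf | grep -i epsilon
```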
### What was patched
`src/llama-model-loader.cpp` was modified so that the `gguf_find_key()` lookup performs the following fallback:
- if `exaone.attention.layer_norm_epsilon` is requested but not found → retry with `exaone.attention.layer_norm_rms_epsilon`
- if `exaone.attention.layer_norm_rms_epsilon` is requested but not found → retry with `exaone.attention.layer_norm_epsilon`

With this patch, the **EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, and 32B** families can be converted to GGUF and quantized with a single pipeline.
### Patch note (minimal diff summary)
- Added a fallback wrapper/hook for `gguf_find_key()` inside `llama-model-loader.cpp`
- Ensured all lookups in that translation unit route through the fallback
This repo includes:
- `exaone-gguf-fallback.patch`
### Tested llama.cpp commit
- `021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2`
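After cloning llama.cpp (see **Build** below), the patch can be dry-run against that commit with `git apply --check` (a sketch; it assumes `exaone-gguf-fallback.patch` sits one directory above the clone):

```bash
# Dry run: exits 0 if the patch applies cleanly at the tested commit.
cd llama.cpp
git checkout 021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2
git apply --check ../exaone-gguf-fallback.patch
```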
---
## Build (CUDA)
```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git apply ../exaone-gguf-fallback.patch  # assumes the patch file sits one directory above the clone
rm -rf build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build -j
```
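A quick sanity check that the build produced the patched binaries (assuming the default `build/bin` output directory; `--version` prints the commit the binaries were built from):

```bash
# Confirm the binaries exist and report the expected commit.
./build/bin/llama-cli --version
ls build/bin/llama-quantize
```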
## Convert / Quantize
```bash
# Convert HF snapshot -> GGUF(F16)
python3 llama.cpp/convert_hf_to_gguf.py <LOCAL_SNAPSHOT_DIR> --outtype f16 --outfile model.F16.gguf
# Quantize (example: Q4_K_M)
llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M
```
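To produce the full set listed under **Contents**, the quantize step can be repeated per type (a sketch; input/output names here are illustrative):

```bash
# Generate each quant type listed above from the F16 base file.
for q in Q4_K_M Q5_K_M Q8_0; do
  llama.cpp/build/bin/llama-quantize model.F16.gguf "model.${q}.gguf" "${q}"
done
```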