Calvin806/EXAONE-3.5-32B-Instruct-GGUF
GGUF quantizations for EXAONE-3.5-32B-Instruct.
Contents
This folder typically contains:
- EXAONE-3.5-32B-Instruct.F16.gguf
- EXAONE-3.5-32B-Instruct.Q4_K_M.gguf
- EXAONE-3.5-32B-Instruct.Q5_K_M.gguf
- EXAONE-3.5-32B-Instruct.Q8_0.gguf (optional)
🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)
While converting and quantizing EXAONE models to GGUF, we found that the KV key naming is inconsistent across some models (e.g., 2.4B / 7.8B).
- Some GGUFs contain only exaone.attention.layer_norm_epsilon
- Some GGUFs contain only exaone.attention.layer_norm_rms_epsilon
In this state, llama-quantize from vanilla llama.cpp can fail because it cannot find the key it expects, so we applied a patch that adds a fallback to the GGUF key lookup in the llama.cpp model loader.
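To see which of the two keys a given GGUF file carries, a small check like the following may help. This is a minimal sketch, not part of the patch: the helper names are hypothetical, and the file-reading helper assumes the `gguf` Python package (`pip install gguf`) and its `GGUFReader` class are available.

```python
EPS_KEY = "exaone.attention.layer_norm_epsilon"
RMS_EPS_KEY = "exaone.attention.layer_norm_rms_epsilon"


def epsilon_keys_present(keys):
    """Return which of the two EXAONE epsilon keys appear in `keys`."""
    keys = set(keys)
    return [k for k in (EPS_KEY, RMS_EPS_KEY) if k in keys]


def epsilon_keys_in_file(path):
    """Hypothetical helper: read the KV key names from a GGUF file."""
    # Imported lazily so the pure function above works without the package.
    from gguf import GGUFReader  # assumption: `pip install gguf`

    reader = GGUFReader(path)
    return epsilon_keys_present(reader.fields.keys())
```

For example, `epsilon_keys_in_file("model.F16.gguf")` would return a one-element list for a file affected by the mismatch.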
What was patched
The gguf_find_key() lookup in src/llama-model-loader.cpp was modified to perform the following fallback:
- If the key is exaone.attention.layer_norm_epsilon and is not found → retry with exaone.attention.layer_norm_rms_epsilon
- If the key is exaone.attention.layer_norm_rms_epsilon and is not found → retry with exaone.attention.layer_norm_epsilon
With this patch, the EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, and 32B models can all be converted to GGUF and quantized with the same pipeline.
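The fallback rule can be illustrated in a few lines of Python. This is only a sketch of the logic: the actual patch is C++ inside llama.cpp, and the function names and the "index or -1" return convention used here are assumptions made for the illustration.

```python
EPS_KEY = "exaone.attention.layer_norm_epsilon"
RMS_EPS_KEY = "exaone.attention.layer_norm_rms_epsilon"

# Each key falls back to the other one when it is missing.
FALLBACKS = {EPS_KEY: RMS_EPS_KEY, RMS_EPS_KEY: EPS_KEY}


def find_key(keys, key):
    """Plain lookup: index of `key` in the list `keys`, or -1 if absent."""
    try:
        return keys.index(key)
    except ValueError:
        return -1


def find_key_with_fallback(keys, key):
    """Look up `key`; if absent, retry once with its alternate name."""
    idx = find_key(keys, key)
    if idx >= 0:
        return idx
    alt = FALLBACKS.get(key)
    return find_key(keys, alt) if alt is not None else -1
```

Keys without an entry in the fallback table behave exactly as in the plain lookup, so the change only affects the two epsilon names.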
Patch note (minimal diff summary)
- Added a fallback wrapper/hook for gguf_find_key() inside llama-model-loader.cpp
- Ensured all lookups in that translation unit route through the fallback
This repo includes:
exaone-gguf-fallback.patch
Tested llama.cpp commit
021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2
Build (CUDA)
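git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# Optional: pin to the tested commit so the patch applies cleanly
git checkout 021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2
git apply ../exaone-gguf-fallback.patch
rm -rf build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build -j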
Convert / Quantize
# Convert HF snapshot -> GGUF(F16)
python3 llama.cpp/convert_hf_to_gguf.py <LOCAL_SNAPSHOT_DIR> --outtype f16 --outfile model.F16.gguf
# Quantize (example: Q4_K_M)
llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M
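To produce several quantizations from the same F16 file, the llama-quantize call above can be repeated per type. The sketch below only builds the command lines. The helper name and the output naming scheme are assumptions, and the type list mirrors the files listed under Contents.

```python
from pathlib import Path

# Binary path as used in the quantize step above.
QUANTIZE_BIN = "llama.cpp/build/bin/llama-quantize"


def quantize_commands(f16_path, quant_types=("Q4_K_M", "Q5_K_M", "Q8_0")):
    """Build one llama-quantize argv list per quantization type."""
    src = Path(f16_path)
    name = src.name
    # Strip the ".F16.gguf" suffix if present; otherwise drop the last extension.
    base = name[: -len(".F16.gguf")] if name.endswith(".F16.gguf") else src.stem
    return [
        [QUANTIZE_BIN, str(src), str(src.with_name(f"{base}.{q}.gguf")), q]
        for q in quant_types
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the llama.cpp checkout.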
Model tree for FloatDo/EXAONE-3.5-32B-Instruct-GGUF
Base model
LGAI-EXAONE/EXAONE-3.5-32B-Instruct