Calvin806 committed
Commit 8c37b57 · verified · 1 parent: 02bca4b

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ EXAONE-Deep-32B.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ EXAONE-Deep-32B.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ EXAONE-Deep-32B.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
EXAONE-Deep-32B.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43182334408e49621f2ececf3a60795b9d598c18ac45c53222bccd7508cc938e
+ size 19343748224
EXAONE-Deep-32B.Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c9fdb6a931c39e6ac8ea458e13d30632231b86e5720f1c4d2c43728fa6c5f7be
+ size 22696569984
EXAONE-Deep-32B.Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17142cb458297dfae87db197a13db51b22a8b7a4415ab82af68664ef9dacbd1a
+ size 34009558144
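Each file above is stored as a Git LFS pointer: three `key value` lines (`version`, `oid`, `size`). A minimal parser sketch (hypothetical helper, handy for checking a download against the recorded `oid` and `size`):

```python
# Parse a Git LFS pointer file (version / oid / size lines) into a dict.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # Sanity-check the three required keys of the LFS pointer format.
    assert fields["version"].startswith("https://git-lfs.github.com/spec/")
    algo, _, digest = fields["oid"].partition(":")
    assert algo == "sha256" and len(digest) == 64
    fields["size"] = int(fields["size"])
    return fields

# The Q4_K_M pointer from this commit:
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:43182334408e49621f2ececf3a60795b9d598c18ac45c53222bccd7508cc938e
size 19343748224
"""
info = parse_lfs_pointer(pointer)
print(info["oid"], info["size"])
```

After downloading, comparing `sha256sum` of the local file against the parsed `oid` confirms the transfer was complete.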
README.md CHANGED
@@ -1,5 +1,63 @@
- ---
- license: other
- license_name: exaone
- license_link: https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B/blob/main/LICENSE
- ---
+ # Calvin806/EXAONE-Deep-32B-GGUF
+
+ GGUF quantizations of **EXAONE-Deep-32B**.
+
+ ## Contents
+ This folder typically contains:
+ - `EXAONE-Deep-32B.F16.gguf`
+ - `EXAONE-Deep-32B.Q4_K_M.gguf`
+ - `EXAONE-Deep-32B.Q5_K_M.gguf`
+ - `EXAONE-Deep-32B.Q8_0.gguf` (optional)
+
+ ---
+
+ ## 🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)
+
+ During EXAONE GGUF conversion/quantization, a **KV key naming mismatch** was found across some models (e.g. **2.4B / 7.8B**):
+
+ - some GGUFs contain only `exaone.attention.layer_norm_epsilon`
+ - others contain only `exaone.attention.layer_norm_rms_epsilon`
+
+ In this state, vanilla llama.cpp's `llama-quantize` can fail because it cannot find the expected key, so **a patch was applied that adds a fallback to the GGUF key lookup in llama.cpp's model loader**.
+
+ ### What was patched
+
+ `src/llama-model-loader.cpp` was modified so that the `gguf_find_key()` lookup performs the following fallback:
+
+ - if the key is `exaone.attention.layer_norm_epsilon` and it is not found → retry with `exaone.attention.layer_norm_rms_epsilon`
+ - if the key is `exaone.attention.layer_norm_rms_epsilon` and it is not found → retry with `exaone.attention.layer_norm_epsilon`
+
+ With this patch, the **EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, and 32B** families can all be converted to GGUF and quantized with a single pipeline.
+
+ ### Patch note (minimal diff summary)
+ - Added a fallback wrapper/hook for `gguf_find_key()` inside `llama-model-loader.cpp`
+ - Ensured all lookups in that translation unit route through the fallback
+
+ This repo includes:
+ - `exaone-gguf-fallback.patch`
+
+ ### Tested llama.cpp commit
+ - `021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2`
+
+ ---
+
+ ## Build (CUDA)
+ ```bash
+ git clone https://github.com/ggml-org/llama.cpp
+ cd llama.cpp
+ git apply ../exaone-gguf-fallback.patch
+
+ rm -rf build
+ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
+ cmake --build build -j
+ ```
+
+ ## Convert / Quantize
+ ```bash
+ # Convert an HF snapshot -> GGUF (F16)
+ python3 llama.cpp/convert_hf_to_gguf.py <LOCAL_SNAPSHOT_DIR> --outtype f16 --outfile model.F16.gguf
+
+ # Quantize (example: Q4_K_M)
+ llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M
+ ```
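As a rough sanity check on the quantized outputs, the uploaded file sizes imply the expected bits per weight, assuming roughly 32 billion parameters (an approximation inferred from the model name, not the exact count):

```python
# Approximate bits-per-weight from the uploaded GGUF sizes in this commit.
sizes = {
    "Q4_K_M": 19_343_748_224,
    "Q5_K_M": 22_696_569_984,
    "Q8_0":   34_009_558_144,
}
params = 32e9  # assumed parameter count; the true value differs slightly

for quant, nbytes in sizes.items():
    bpw = nbytes * 8 / params
    print(f"{quant}: ~{bpw:.2f} bits/weight")
```

The results land near the nominal bit-widths of each scheme (Q4_K_M somewhat above 4 bits, Q8_0 above 8), since K-quants mix block types and every GGUF carries scales plus metadata on top of the raw weights.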
exaone-gguf-fallback.patch ADDED
@@ -0,0 +1,31 @@
+ diff --git a/src/llama-model-loader.cpp b/src/llama-model-loader.cpp
+ index bd9e6da88..5a120d69a 100644
+ --- a/src/llama-model-loader.cpp
+ +++ b/src/llama-model-loader.cpp
+ @@ -1,4 +1,26 @@
+  #include "llama-model-loader.h"
+ +#include <cstring>
+ +
+ +// EXAONE_GGUF_FIND_KEY_MACRO_HOOK
+ +// Hook gguf_find_key() inside this translation unit to handle the EPS/RMS key mismatch for EXAONE.
+ +static auto gguf_find_key_orig = &gguf_find_key;
+ +
+ +static int gguf_find_key_exaone_fallback(const gguf_context * ctx, const char * key) {
+ +    int kid = gguf_find_key_orig(ctx, key);
+ +    if (kid >= 0) return kid;
+ +
+ +    if (strcmp(key, "exaone.attention.layer_norm_epsilon") == 0) {
+ +        return gguf_find_key_orig(ctx, "exaone.attention.layer_norm_rms_epsilon");
+ +    }
+ +    if (strcmp(key, "exaone.attention.layer_norm_rms_epsilon") == 0) {
+ +        return gguf_find_key_orig(ctx, "exaone.attention.layer_norm_epsilon");
+ +    }
+ +    return -1;
+ +}
+ +
+ +// Redirect all gguf_find_key calls in this file
+ +#define gguf_find_key(ctx, key) gguf_find_key_exaone_fallback((ctx), (key))
+ +
+
+ #include "ggml.h"
+
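The behavior of the patched lookup can be simulated in Python against a plain dict standing in for the GGUF KV store (a sketch of the fallback logic only, not the C++ code above; key names are taken from the patch):

```python
# Simulate the patched gguf_find_key(): if the requested epsilon key is
# absent, retry with its EXAONE alias before giving up.
EPS_ALIASES = {
    "exaone.attention.layer_norm_epsilon":
        "exaone.attention.layer_norm_rms_epsilon",
    "exaone.attention.layer_norm_rms_epsilon":
        "exaone.attention.layer_norm_epsilon",
}

def find_key_with_fallback(kv: dict, key: str):
    if key in kv:
        return key      # direct hit, as in the unpatched loader
    alias = EPS_ALIASES.get(key)
    if alias is not None and alias in kv:
        return alias    # EXAONE epsilon-key fallback
    return None         # stands in for the C++ return value of -1

# One GGUF wrote only the non-RMS key...
kv_a = {"exaone.attention.layer_norm_epsilon": 1e-5}
# ...another wrote only the RMS key.
kv_b = {"exaone.attention.layer_norm_rms_epsilon": 1e-5}

for kv in (kv_a, kv_b):
    # Both stores resolve the same request, which is why one
    # pipeline can quantize every EXAONE variant.
    print(find_key_with_fallback(kv, "exaone.attention.layer_norm_rms_epsilon"))
```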