Calvin806 committed
Commit 8c37b57 · verified · 1 parent: 02bca4b

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ EXAONE-Deep-32B.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ EXAONE-Deep-32B.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ EXAONE-Deep-32B.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
EXAONE-Deep-32B.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43182334408e49621f2ececf3a60795b9d598c18ac45c53222bccd7508cc938e
+ size 19343748224
EXAONE-Deep-32B.Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c9fdb6a931c39e6ac8ea458e13d30632231b86e5720f1c4d2c43728fa6c5f7be
+ size 22696569984
EXAONE-Deep-32B.Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17142cb458297dfae87db197a13db51b22a8b7a4415ab82af68664ef9dacbd1a
+ size 34009558144
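Each file above is stored as a Git LFS pointer: three `key value` lines (`version`, `oid`, `size`). A minimal parser sketch (hypothetical helper, handy for checking a download against the recorded `oid` and `size`):

```python
# Parse a Git LFS pointer file (version / oid / size lines) into a dict.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # Sanity-check the three required keys of the LFS pointer format.
    assert fields["version"].startswith("https://git-lfs.github.com/spec/")
    algo, _, digest = fields["oid"].partition(":")
    assert algo == "sha256" and len(digest) == 64
    fields["size"] = int(fields["size"])
    return fields

# The Q4_K_M pointer from this commit:
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:43182334408e49621f2ececf3a60795b9d598c18ac45c53222bccd7508cc938e
size 19343748224
"""
info = parse_lfs_pointer(pointer)
print(info["oid"], info["size"])
```

After downloading, comparing `sha256sum` of the local file against the parsed `oid` confirms the transfer was complete.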
README.md CHANGED
@@ -1,5 +1,63 @@
- ---
- license: other
- license_name: exaone
- license_link: https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B/blob/main/LICENSE
- ---
+ # Calvin806/EXAONE-Deep-32B-GGUF
+
+ GGUF quantizations of **EXAONE-Deep-32B**.
+
+ ## Contents
+ This folder typically contains:
+ - `EXAONE-Deep-32B.F16.gguf`
+ - `EXAONE-Deep-32B.Q4_K_M.gguf`
+ - `EXAONE-Deep-32B.Q5_K_M.gguf`
+ - `EXAONE-Deep-32B.Q8_0.gguf` (optional)
+
+ ---
+
+ ## 🔧 llama.cpp patch (EXAONE GGUF quantize compatibility)
+
+ During EXAONE GGUF conversion/quantization, a **KV key naming mismatch** was found across some models (e.g. **2.4B / 7.8B**):
+
+ - some GGUFs contain only `exaone.attention.layer_norm_epsilon`
+ - others contain only `exaone.attention.layer_norm_rms_epsilon`
+
+ In this state, vanilla llama.cpp's `llama-quantize` can fail because it cannot find the expected key, so **a patch was applied that adds a fallback to the GGUF key lookup in llama.cpp's model loader**.
+
+ ### What was patched
+
+ `src/llama-model-loader.cpp` was modified so that the `gguf_find_key()` lookup performs the following fallback:
+
+ - if the key is `exaone.attention.layer_norm_epsilon` and it is not found → retry with `exaone.attention.layer_norm_rms_epsilon`
+ - if the key is `exaone.attention.layer_norm_rms_epsilon` and it is not found → retry with `exaone.attention.layer_norm_epsilon`
+
+ With this patch, the **EXAONE 3.5 / EXAONE-Deep 2.4B, 7.8B, and 32B** families can all be converted to GGUF and quantized with a single pipeline.
+
+ ### Patch note (minimal diff summary)
+ - Added a fallback wrapper/hook for `gguf_find_key()` inside `llama-model-loader.cpp`
+ - Ensured all lookups in that translation unit route through the fallback
+
+ This repo includes:
+ - `exaone-gguf-fallback.patch`
+
+ ### Tested llama.cpp commit
+ - `021cc28bef4dd7d0bf9c91dbbd0803caa6cb15f2`
+
+ ---
+
+ ## Build (CUDA)
+ ```bash
+ git clone https://github.com/ggml-org/llama.cpp
+ cd llama.cpp
+ git apply ../exaone-gguf-fallback.patch
+
+ rm -rf build
+ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
+ cmake --build build -j
+ ```
+
+ ## Convert / Quantize
+ ```bash
+ # Convert an HF snapshot -> GGUF (F16)
+ python3 llama.cpp/convert_hf_to_gguf.py <LOCAL_SNAPSHOT_DIR> --outtype f16 --outfile model.F16.gguf
+
+ # Quantize (example: Q4_K_M)
+ llama.cpp/build/bin/llama-quantize model.F16.gguf model.Q4_K_M.gguf Q4_K_M
+ ```
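As a rough sanity check on the quantized outputs, the uploaded file sizes imply the expected bits per weight, assuming roughly 32 billion parameters (an approximation inferred from the model name, not the exact count):

```python
# Approximate bits-per-weight from the uploaded GGUF sizes in this commit.
sizes = {
    "Q4_K_M": 19_343_748_224,
    "Q5_K_M": 22_696_569_984,
    "Q8_0":   34_009_558_144,
}
params = 32e9  # assumed parameter count; the true value differs slightly

for quant, nbytes in sizes.items():
    bpw = nbytes * 8 / params
    print(f"{quant}: ~{bpw:.2f} bits/weight")
```

The results land near the nominal bit-widths of each scheme (Q4_K_M somewhat above 4 bits, Q8_0 above 8), since K-quants mix block types and every GGUF carries scales plus metadata on top of the raw weights.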
exaone-gguf-fallback.patch ADDED
@@ -0,0 +1,31 @@
+ diff --git a/src/llama-model-loader.cpp b/src/llama-model-loader.cpp
+ index bd9e6da88..5a120d69a 100644
+ --- a/src/llama-model-loader.cpp
+ +++ b/src/llama-model-loader.cpp
+ @@ -1,4 +1,26 @@
+  #include "llama-model-loader.h"
+ +#include <cstring>
+ +
+ +// EXAONE_GGUF_FIND_KEY_MACRO_HOOK
+ +// Hook gguf_find_key() inside this translation unit to handle the EPS/RMS key mismatch for EXAONE.
+ +static auto gguf_find_key_orig = &gguf_find_key;
+ +
+ +static int gguf_find_key_exaone_fallback(const gguf_context * ctx, const char * key) {
+ +    int kid = gguf_find_key_orig(ctx, key);
+ +    if (kid >= 0) return kid;
+ +
+ +    if (strcmp(key, "exaone.attention.layer_norm_epsilon") == 0) {
+ +        return gguf_find_key_orig(ctx, "exaone.attention.layer_norm_rms_epsilon");
+ +    }
+ +    if (strcmp(key, "exaone.attention.layer_norm_rms_epsilon") == 0) {
+ +        return gguf_find_key_orig(ctx, "exaone.attention.layer_norm_epsilon");
+ +    }
+ +    return -1;
+ +}
+ +
+ +// Redirect all gguf_find_key calls in this file
+ +#define gguf_find_key(ctx, key) gguf_find_key_exaone_fallback((ctx), (key))
+ +
+
+ #include "ggml.h"
+
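The behavior of the patched lookup can be simulated in Python against a plain dict standing in for the GGUF KV store (a sketch of the fallback logic only, not the C++ code above; key names are taken from the patch):

```python
# Simulate the patched gguf_find_key(): if the requested epsilon key is
# absent, retry with its EXAONE alias before giving up.
EPS_ALIASES = {
    "exaone.attention.layer_norm_epsilon":
        "exaone.attention.layer_norm_rms_epsilon",
    "exaone.attention.layer_norm_rms_epsilon":
        "exaone.attention.layer_norm_epsilon",
}

def find_key_with_fallback(kv: dict, key: str):
    if key in kv:
        return key      # direct hit, as in the unpatched loader
    alias = EPS_ALIASES.get(key)
    if alias is not None and alias in kv:
        return alias    # EXAONE epsilon-key fallback
    return None         # stands in for the C++ return value of -1

# One GGUF wrote only the non-RMS key...
kv_a = {"exaone.attention.layer_norm_epsilon": 1e-5}
# ...another wrote only the RMS key.
kv_b = {"exaone.attention.layer_norm_rms_epsilon": 1e-5}

for kv in (kv_a, kv_b):
    # Both stores resolve the same request, which is why one
    # pipeline can quantize every EXAONE variant.
    print(find_key_with_fallback(kv, "exaone.attention.layer_norm_rms_epsilon"))
```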