chux0519 committed · Commit d4e1bcd · verified · 1 Parent(s): 295b8db

Upload optimized GGUF for embeddings.cpp

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ snowflake-arctic-embed-m-v2.0.q4_k_mlp_q8_attn.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ base_model: Snowflake/snowflake-arctic-embed-m-v2.0
+ library_name: embeddings.cpp
+ tags:
+ - gguf
+ - embeddings
+ - embeddings.cpp
+ - snowflake
+ - gte
+ - cpu
+ pipeline_tag: sentence-similarity
+ ---
+
+ # Snowflake/snowflake-arctic-embed-m-v2.0 GGUF for embeddings.cpp
+
+ This repository contains an optimized GGUF artifact for running
+ `Snowflake/snowflake-arctic-embed-m-v2.0` with
+ [`embeddings.cpp`](https://github.com/daandtu/embeddings.cpp).
+
+ The GGUF is intended for embedding inference. It is not a llama.cpp
+ text-generation model.
+
+ ## File
+
+ | File | Quantization | Size | SHA256 |
+ |---|---|---:|---|
+ | `snowflake-arctic-embed-m-v2.0.q4_k_mlp_q8_attn.gguf` | mixed `q4_K` MLP + `q8_0` attention | `186.26 MB` | `4fa3b1f7f11d929137cafdd12aac01e6f8d6ee9f6f41853521e43feb7a7f4414` |
+
+ The mixed quantization policy is:
+
+ - `mlp.up_gate_proj.weight`: `q4_K`
+ - `mlp.down_proj.weight`: `q4_K`
+ - `attention.qkv_proj.weight`: `q8_0`
+ - `attention.o_proj.weight`: `q8_0`
+
+ ## Recommended embeddings.cpp Build
+
+ ```bash
+ cmake -S . -B build \
+ -DCMAKE_BUILD_TYPE=Release \
+ -DEMBEDDINGS_CPP_ENABLE_PYBIND=ON \
+ -DGGML_CPU_REPACK=ON \
+ -DGGML_BLAS=OFF \
+ -DGGML_OPENMP=OFF \
+ -DGGML_NATIVE=OFF \
+ -DGGML_CUDA=OFF \
+ -DGGML_VULKAN=OFF \
+ -DGGML_METAL=OFF
+
+ cmake --build build -j "$(nproc)"
+ ```
+
+ ## Recommended CPU Runtime
+
+ ```bash
+ EMBEDDINGS_CPP_CPU_REPACK=1 \
+ EMBEDDINGS_CPP_FLASH_ATTN=1 \
+ python your_script.py
+ ```
+
+ By default, `embeddings.cpp` uses the detected CPU concurrency for model
+ inference. Set `EMBEDDINGS_CPP_THREADS=N` only when pinning a deployment to a
+ measured value for a specific CPU quota or host.
+
+ Do not enable the experimental `GGML_REPACK_Q8_AVX2=1` path for this artifact;
+ it was slower on the tuning host.
+
+ ## Reproducing The GGUF
+
+ From an `embeddings.cpp` checkout:
+
+ ```bash
+ uv pip install -r scripts/requirements.txt
+
+ uv run scripts/convert.py \
+ Snowflake/snowflake-arctic-embed-m-v2.0 \
+ models/snowflake-arctic-embed-m-v2.0.fp16.gguf \
+ f16
+
+ EMBEDDINGS_CPP_SKIP_QUANT_PATTERNS='attention.qkv_proj.weight,attention.o_proj.weight' \
+ ./build/quantize \
+ models/snowflake-arctic-embed-m-v2.0.fp16.gguf \
+ models/snowflake-arctic-embed-m-v2.0.q4_k_mlp_attnf16.gguf \
+ q4_k
+
+ EMBEDDINGS_CPP_SKIP_QUANT_PATTERNS='mlp.up_gate_proj.weight,mlp.down_proj.weight' \
+ ./build/quantize \
+ models/snowflake-arctic-embed-m-v2.0.q4_k_mlp_attnf16.gguf \
+ models/snowflake-arctic-embed-m-v2.0.q4_k_mlp_q8_attn.gguf \
+ q8_0
+ ```
+
+ ## Notes
+
+ This model is derived from `Snowflake/snowflake-arctic-embed-m-v2.0`. Use the
+ upstream model card and license terms when deciding whether this artifact is
+ appropriate for your use case.
snowflake-arctic-embed-m-v2.0.q4_k_mlp_q8_attn.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4fa3b1f7f11d929137cafdd12aac01e6f8d6ee9f6f41853521e43feb7a7f4414
+ size 195308576
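
The LFS pointer above records the artifact's SHA256 digest and its size in bytes, so a downloaded copy can be checked against both. The following is a minimal sketch of such a check, not part of this commit or of `embeddings.cpp`. The helper names (`parse_lfs_pointer`, `sha256_of_file`, `verify_artifact`, `size_in_mib`) are hypothetical and introduced here for illustration only. The constants are copied from the pointer file.

```python
import hashlib
from pathlib import Path

# Values copied from the Git LFS pointer in this commit.
EXPECTED_SHA256 = "4fa3b1f7f11d929137cafdd12aac01e6f8d6ee9f6f41853521e43feb7a7f4414"
EXPECTED_SIZE = 195308576  # bytes


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            fields[key] = value
    return fields


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large GGUF is never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256=EXPECTED_SHA256, expected_size=EXPECTED_SIZE):
    """Return True only if both the byte size and the SHA256 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False  # cheap check first; skips hashing on an obvious mismatch
    return sha256_of_file(path) == expected_sha256


def size_in_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20
```

After downloading, `verify_artifact("snowflake-arctic-embed-m-v2.0.q4_k_mlp_q8_attn.gguf")` should return `True` for an intact copy. The `186.26 MB` figure in the README table corresponds to `size_in_mib(195308576)`, so it is the pointer's byte size expressed in units of 2**20 bytes (MiB).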