WilliamSong committed on
Commit
38d066a
·
verified ·
1 Parent(s): b86a8b6

Upload folder using huggingface_hub

.DS_Store ADDED
Binary file (6.15 kB)
 
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ qwen3-embedding-0.6b-fix.gguf filter=lfs diff=lfs merge=lfs -text
+ qwen3-embedding-0.6b.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,127 @@
# Qwen3-Embedding-0.6B (GGUF) Models

This directory contains GGUF builds of the Qwen3 0.6B embedding model, produced from the upstream base repository `Qwen/Qwen3-0.6B-Base` (original Hugging Face layout in `../Qwen3-Embedding-0.6B/`).

## Contents

| File                               | Purpose                                                                  |
| ---------------------------------- | ------------------------------------------------------------------------ |
| `qwen3-embedding-0.6b.Q4_K_M.gguf` | Quantized (Q4_K_M) GGUF for efficient inference.                         |
| `qwen3-embedding-0.6b-fix.gguf`    | The same model with the explicit `sep_token` / EOS metadata fix applied. |

## Special Token Configuration

Extracted from `tokenizer_config.json`:

```jsonc
"sep_token": "<|endoftext|>",
"sep_token_id": 151643
```

The model uses `<|endoftext|>` as both the padding token (`pad_token`) and the separator token (`sep_token`). For embedding generation, each input text MUST terminate with the separator token (or the converter must auto-append it) to avoid a runtime warning:

```text
[WARNING] At least one last token in strings embedded is not SEP. 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header
```
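
For files that still lack the metadata fix, the separator can also be appended on the client side before embedding. A minimal sketch (the `with_sep` helper is an illustration for this README, not part of any library):

```python
SEP = "<|endoftext|>"  # sep/pad token from tokenizer_config.json (id 151643)

def with_sep(text: str) -> str:
    """Append the separator token unless the text already ends with it."""
    return text if text.endswith(SEP) else text + SEP

print(with_sep("Hello world"))  # Hello world<|endoftext|>
```

This roughly mirrors, at the text level, what the `add_eos_token` flag asks llama.cpp to do at the token level.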

### Why the Warning Appears

If the GGUF metadata key `tokenizer.ggml.add_eos_token` is absent or `false`, llama.cpp will not auto-append the final SEP/EOS token for embedding inputs. Any input string that does not already end with `<|endoftext|>` triggers the warning and may yield sub-optimal embeddings (slightly different token boundary semantics).

### Fix Implemented

The file `qwen3-embedding-0.6b-fix.gguf` was regenerated ensuring:

- `tokenizer.ggml.add_eos_token = true`
- `sep_token` (`<|endoftext|>`) retained with id `151643`

This makes llama.cpp automatically append the SEP/EOS token when missing, silencing the warning and standardizing embeddings.

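
For the curious, the flag itself is tiny on disk. The sketch below (illustrative only, not a patching tool) encodes a single GGUF key/value pair the way the format lays it out: a length-prefixed key, a `uint32` type tag (`7` for `BOOL`), then a one-byte payload:

```python
import struct

GGUF_TYPE_BOOL = 7  # value-type id for booleans in the GGUF format

def encode_bool_kv(key: str, value: bool) -> bytes:
    """Serialize one boolean GGUF key/value pair (little-endian)."""
    k = key.encode("utf-8")
    return (
        struct.pack("<Q", len(k))            # key length (uint64)
        + k                                  # key bytes
        + struct.pack("<I", GGUF_TYPE_BOOL)  # value type tag (uint32)
        + struct.pack("<B", 1 if value else 0)  # one-byte payload
    )

blob = encode_bool_kv("tokenizer.ggml.add_eos_token", True)
print(len(blob))  # 8 + 28 + 4 + 1 = 41
```
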
## Rebuilding From Upstream (Recommended Process)

1. Obtain the upstream model:
   - Clone or download `Qwen/Qwen3-0.6B-Base` (embedding variant directory).
2. Convert to GGUF using the current `llama.cpp` conversion script:
   - Use the repo's `convert_hf_to_gguf.py` (it already sets EOS for Qwen tokenizers), then quantize with `llama-quantize`. Example:

```bash
# Convert the HF checkpoint to an F16 GGUF...
python3 llama.cpp/convert_hf_to_gguf.py Qwen3-Embedding-0.6B \
  --outfile qwen3-embedding-0.6b-f16.gguf \
  --outtype f16

# ...then quantize to Q4_K_M
./llama.cpp/build/bin/llama-quantize \
  qwen3-embedding-0.6b-f16.gguf \
  qwen3-embedding-0.6b-fix.gguf Q4_K_M
```

> If you previously produced a GGUF that shows the warning, just re-run the conversion with an up-to-date `llama.cpp` checkout. The script internally writes `tokenizer.ggml.add_eos_token = true` for this tokenizer family.

### Post-Conversion Validation

Run a quick embedding call and confirm that no warning appears:

```bash
./llama.cpp/build/bin/llama-embedding \
  -m models/qwen3-embedding-0.6b-fix.gguf \
  -p "Hello world"
```

If you still see the warning:

- Confirm the binary was rebuilt after updating sources (`make` or `cmake --build`).
- Inspect the metadata with a small Python snippet (`r.fields` is a mapping from key name to `ReaderField`):

```python
from gguf import GGUFReader

r = GGUFReader("models/qwen3-embedding-0.6b-fix.gguf")
field = r.fields.get("tokenizer.ggml.add_eos_token")
if field is None:
    print("tokenizer.ggml.add_eos_token is missing")
else:
    # the value bytes live in parts[] at the index stored in data[0]
    print("ADD_EOS_TOKEN=", bool(field.parts[field.data[0]][0]))
```

Expected output: `ADD_EOS_TOKEN= True`

## Manual Patch (Fallback Method)

If re-conversion is inconvenient, the flag can be flipped in place: `GGUFReader` memory-maps the file, so opening it writable (`"r+"`) lets you edit a field's payload bytes directly on disk. Note this only works when the key is already present but `false`; an absent key cannot be added this way.

```python
import shutil

from gguf import GGUFReader

KEY = "tokenizer.ggml.add_eos_token"

# Patch a copy so the original quantized file stays untouched
shutil.copyfile("qwen3-embedding-0.6b.Q4_K_M.gguf",
                "qwen3-embedding-0.6b-fix.gguf")

reader = GGUFReader("qwen3-embedding-0.6b-fix.gguf", "r+")
field = reader.fields.get(KEY)
if field is None:
    raise SystemExit(f"{KEY} is absent -- re-run the conversion instead")

# The scalar payload sits in parts[] at the index recorded in data[0];
# a GGUF bool is a single byte, so write 1 for true
field.parts[field.data[0]][0] = 1
reader.data.flush()
```

After patching, re-run the validation step.

## Usage Notes for Embeddings

- Always feed raw text; no special wrapping is needed. With the fixed file, the SEP token is appended automatically.
- For batch embeddings, ensure each string ends cleanly (avoid trailing spaces if you rely on identical hashes downstream).
- The dimensionality matches upstream Qwen3-Embedding-0.6B (refer to the upstream docs for the exact embedding size).

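
Downstream comparison of the resulting vectors is typically done with cosine similarity, which works for any embedding size. A minimal, dependency-free sketch:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
```

In practice you would pass the float vectors returned by the embedding binary (or bindings) for two texts and rank by similarity.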
## License & Attribution

The original model weights and tokenizer come from the Qwen project (`Qwen/Qwen3-0.6B-Base`). Review their license and usage terms before redistribution. This README documents conversion adjustments only (the metadata EOS flag addition).

## Changelog

- Initial addition: added fixed GGUF with `tokenizer.ggml.add_eos_token = true` to suppress the SEP warning.

---

For further improvements (FP16 build, alternative quantization tiers, or batching examples), open an issue or PR in this repo.
qwen3-embedding-0.6b-fix.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9115e21a00b13479bdd40565848e0927d305c666647f511bb43d76e50bef4f02
+ size 1197629696
qwen3-embedding-0.6b.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:989c1dc01d8404d3eda2bbfb0a6ae2890869f6677ee74067f3e60ae9eb1c95b4
+ size 396474624