WilliamSong committed on
Commit
38d066a
·
verified ·
1 Parent(s): b86a8b6

Upload folder using huggingface_hub

.DS_Store ADDED
Binary file (6.15 kB)
 
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ qwen3-embedding-0.6b-fix.gguf filter=lfs diff=lfs merge=lfs -text
+ qwen3-embedding-0.6b.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,127 @@
# Qwen3-Embedding-0.6B (GGUF) Models

This directory contains GGUF builds of the Qwen3 0.6B embedding model, produced from the upstream base repository `Qwen/Qwen3-0.6B-Base` (original Hugging Face layout in `../Qwen3-Embedding-0.6B/`).

## Contents

| File                               | Purpose                                                                  |
| ---------------------------------- | ------------------------------------------------------------------------ |
| `qwen3-embedding-0.6b.Q4_K_M.gguf` | Quantized (Q4_K_M) GGUF for efficient inference.                         |
| `qwen3-embedding-0.6b-fix.gguf`    | The same model with the explicit `sep_token` / EOS metadata fix applied. |

## Special Token Configuration

Extracted from `tokenizer_config.json`:

```jsonc
"sep_token": "<|endoftext|>",
"sep_token_id": 151643
```

The model uses `<|endoftext|>` as both the padding token (`pad_token`) and the separator token (`sep_token`). For embedding generation, each input text MUST terminate with the separator token (or the converter must auto-append it) to avoid a runtime warning:

```text
[WARNING] At least one last token in strings embedded is not SEP. 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header
```
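
For files that still lack the metadata fix, the separator can also be appended on the client side before embedding. A minimal sketch (the `with_sep` helper is an illustration for this README, not part of any library):

```python
SEP = "<|endoftext|>"  # sep/pad token from tokenizer_config.json (id 151643)

def with_sep(text: str) -> str:
    """Append the separator token unless the text already ends with it."""
    return text if text.endswith(SEP) else text + SEP

print(with_sep("Hello world"))  # Hello world<|endoftext|>
```

This roughly mirrors, at the text level, what the `add_eos_token` flag asks llama.cpp to do at the token level.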

### Why the Warning Appears

If the GGUF metadata key `tokenizer.ggml.add_eos_token` is absent or `false`, llama.cpp will not auto-append the final SEP/EOS token for embedding inputs. Any input string that does not already end with `<|endoftext|>` triggers the warning and may yield sub-optimal embeddings (slightly different token boundary semantics).

### Fix Implemented

The file `qwen3-embedding-0.6b-fix.gguf` was regenerated ensuring:

- `tokenizer.ggml.add_eos_token = true`
- `sep_token` (`<|endoftext|>`) retained with id `151643`

This makes llama.cpp automatically append the SEP/EOS token when missing, silencing the warning and standardizing embeddings.

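
For the curious, the flag itself is tiny on disk. The sketch below (illustrative only, not a patching tool) encodes a single GGUF key/value pair the way the format lays it out: a length-prefixed key, a `uint32` type tag (`7` for `BOOL`), then a one-byte payload:

```python
import struct

GGUF_TYPE_BOOL = 7  # value-type id for booleans in the GGUF format

def encode_bool_kv(key: str, value: bool) -> bytes:
    """Serialize one boolean GGUF key/value pair (little-endian)."""
    k = key.encode("utf-8")
    return (
        struct.pack("<Q", len(k))            # key length (uint64)
        + k                                  # key bytes
        + struct.pack("<I", GGUF_TYPE_BOOL)  # value type tag (uint32)
        + struct.pack("<B", 1 if value else 0)  # one-byte payload
    )

blob = encode_bool_kv("tokenizer.ggml.add_eos_token", True)
print(len(blob))  # 8 + 28 + 4 + 1 = 41
```
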
## Rebuilding From Upstream (Recommended Process)

1. Obtain the upstream model:
   - Clone or download `Qwen/Qwen3-0.6B-Base` (embedding variant directory).
2. Convert to GGUF using the current `llama.cpp` conversion script:
   - Use the repo's `convert_hf_to_gguf.py` (it already sets EOS for Qwen tokenizers), then quantize with `llama-quantize`. Example:

```bash
# Convert the HF checkpoint to an F16 GGUF...
python3 llama.cpp/convert_hf_to_gguf.py Qwen3-Embedding-0.6B \
  --outfile qwen3-embedding-0.6b-f16.gguf \
  --outtype f16

# ...then quantize to Q4_K_M
./llama.cpp/build/bin/llama-quantize \
  qwen3-embedding-0.6b-f16.gguf \
  qwen3-embedding-0.6b-fix.gguf Q4_K_M
```

> If you previously produced a GGUF that shows the warning, just re-run the conversion with an up-to-date `llama.cpp` checkout. The script internally writes `tokenizer.ggml.add_eos_token = true` for this tokenizer family.

### Post-Conversion Validation

Run a quick embedding call and confirm that no warning appears:

```bash
./llama.cpp/build/bin/llama-embedding \
  -m models/qwen3-embedding-0.6b-fix.gguf \
  -p "Hello world"
```

If you still see the warning:

- Confirm the binary was rebuilt after updating sources (`make` or `cmake --build`).
- Inspect the metadata with a small Python snippet (`r.fields` is a mapping from key name to `ReaderField`):

```python
from gguf import GGUFReader

r = GGUFReader("models/qwen3-embedding-0.6b-fix.gguf")
field = r.fields.get("tokenizer.ggml.add_eos_token")
if field is None:
    print("tokenizer.ggml.add_eos_token is missing")
else:
    # the value bytes live in parts[] at the index stored in data[0]
    print("ADD_EOS_TOKEN=", bool(field.parts[field.data[0]][0]))
```

Expected output: `ADD_EOS_TOKEN= True`

## Manual Patch (Fallback Method)

If re-conversion is inconvenient, the flag can be flipped in place: `GGUFReader` memory-maps the file, so opening it writable (`"r+"`) lets you edit a field's payload bytes directly on disk. Note this only works when the key is already present but `false`; an absent key cannot be added this way.

```python
import shutil

from gguf import GGUFReader

KEY = "tokenizer.ggml.add_eos_token"

# Patch a copy so the original quantized file stays untouched
shutil.copyfile("qwen3-embedding-0.6b.Q4_K_M.gguf",
                "qwen3-embedding-0.6b-fix.gguf")

reader = GGUFReader("qwen3-embedding-0.6b-fix.gguf", "r+")
field = reader.fields.get(KEY)
if field is None:
    raise SystemExit(f"{KEY} is absent -- re-run the conversion instead")

# The scalar payload sits in parts[] at the index recorded in data[0];
# a GGUF bool is a single byte, so write 1 for true
field.parts[field.data[0]][0] = 1
reader.data.flush()
```

After patching, re-run the validation step.

## Usage Notes for Embeddings

- Always feed raw text; no special wrapping is needed. With the fixed file, the SEP token is appended automatically.
- For batch embeddings, ensure each string ends cleanly (avoid trailing spaces if you rely on identical hashes downstream).
- The dimensionality matches upstream Qwen3-Embedding-0.6B (refer to the upstream docs for the exact embedding size).

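
Downstream comparison of the resulting vectors is typically done with cosine similarity, which works for any embedding size. A minimal, dependency-free sketch:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
```

In practice you would pass the float vectors returned by the embedding binary (or bindings) for two texts and rank by similarity.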
## License & Attribution

The original model weights and tokenizer come from the Qwen project (`Qwen/Qwen3-0.6B-Base`). Review their license and usage terms before redistribution. This README documents conversion adjustments only (the metadata EOS flag addition).

## Changelog

- Initial addition: added fixed GGUF with `tokenizer.ggml.add_eos_token = true` to suppress the SEP warning.

---

For further improvements (FP16 build, alternative quantization tiers, or batching examples), open an issue or PR in this repo.
qwen3-embedding-0.6b-fix.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9115e21a00b13479bdd40565848e0927d305c666647f511bb43d76e50bef4f02
+ size 1197629696
qwen3-embedding-0.6b.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:989c1dc01d8404d3eda2bbfb0a6ae2890869f6677ee74067f3e60ae9eb1c95b4
+ size 396474624