Upload folder using huggingface_hub
- .DS_Store +0 -0
- .gitattributes +2 -0
- README.md +127 -0
- qwen3-embedding-0.6b-fix.gguf +3 -0
- qwen3-embedding-0.6b.Q4_K_M.gguf +3 -0
.DS_Store
ADDED
Binary file (6.15 kB)
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+qwen3-embedding-0.6b-fix.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-embedding-0.6b.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
|
@@ -0,0 +1,127 @@
# Qwen3-Embedding-0.6B (GGUF) Models

This directory contains GGUF builds of the Qwen3 0.6B embedding model, produced from the upstream base repository `Qwen/Qwen3-0.6B-Base` (original Hugging Face layout in `../Qwen3-Embedding-0.6B/`).

## Contents

| File                               | Purpose                                                          |
| ---------------------------------- | ---------------------------------------------------------------- |
| `qwen3-embedding-0.6b.Q4_K_M.gguf` | Quantized (Q4_K_M) GGUF for efficient inference.                 |
| `qwen3-embedding-0.6b-fix.gguf`    | Same model with explicit `sep_token` / EOS metadata fix applied. |

## Special Token Configuration

Extracted from `tokenizer_config.json`:

```jsonc
"sep_token": "<|endoftext|>",
"sep_token_id": 151643
```

The model uses `<|endoftext|>` as both the padding token (`pad_token`) and the separator token (`sep_token`). For embedding generation, each input text must end with the separator token (or the converter must auto-append it); otherwise llama.cpp emits a runtime warning:

```text
[WARNING] At least one last token in strings embedded is not SEP. 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header
```
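If you are stuck with a GGUF that does not auto-append the token, one workaround is to terminate each input yourself before embedding. A minimal sketch (the `ensure_sep` helper is hypothetical, not part of this repo; the token string comes from `tokenizer_config.json`):

```python
SEP = "<|endoftext|>"  # sep_token from tokenizer_config.json

def ensure_sep(text: str) -> str:
    """Append the separator token if the input does not already end with it."""
    return text if text.endswith(SEP) else text + SEP

print(ensure_sep("Hello world"))  # → Hello world<|endoftext|>
```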

### Why the Warning Appears

If the GGUF metadata key `tokenizer.ggml.add_eos_token` is absent or `false`, llama.cpp will not auto-append the final SEP/EOS token to embedding inputs. Any input string that does not already end with `<|endoftext|>` then triggers the warning and may yield subtly different embeddings, because the token boundary semantics no longer match how the model was trained.

### Fix Implemented

The file `qwen3-embedding-0.6b-fix.gguf` was regenerated ensuring:

- `tokenizer.ggml.add_eos_token = true`
- `sep_token` (`<|endoftext|>`) retained with id `151643`

With this flag set, llama.cpp automatically appends the SEP/EOS token when it is missing, which silences the warning and standardizes the embeddings.
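Conceptually, the flag drives behavior like the following Python sketch (an illustration, not llama.cpp's actual code; the EOS id `151643` is taken from the tokenizer config above):

```python
def tokenize_for_embedding(tokens: list[int], add_eos: bool, eos_id: int = 151643) -> list[int]:
    """Mimic llama.cpp's add_eos_token handling: append the EOS/SEP id
    when the flag is set and the sequence does not already end with it."""
    if add_eos and (not tokens or tokens[-1] != eos_id):
        return tokens + [eos_id]
    return tokens

print(tokenize_for_embedding([10, 20], add_eos=True))  # → [10, 20, 151643]
```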

## Rebuilding From Upstream (Recommended Process)

1. Obtain the upstream model:
   - Clone or download `Qwen/Qwen3-0.6B-Base` (embedding variant directory).
2. Convert to GGUF using the current `llama.cpp` conversion script:
   - Use the repo's `convert_hf_to_gguf.py` (it already sets the EOS flag for Qwen tokenizers). The script takes the model directory as a positional argument and cannot emit Q4_K_M directly, so convert to F16 first and quantize with `llama-quantize`:

```bash
python3 llama.cpp/convert_hf_to_gguf.py Qwen3-Embedding-0.6B \
  --outfile qwen3-embedding-0.6b-f16.gguf \
  --outtype f16

# Quantize the F16 build down to Q4_K_M
./llama.cpp/build/bin/llama-quantize \
  qwen3-embedding-0.6b-f16.gguf \
  qwen3-embedding-0.6b-fix.gguf Q4_K_M
```

> If you previously produced a GGUF that shows the warning, re-run the conversion with an up-to-date `llama.cpp` checkout. The script writes `tokenizer.ggml.add_eos_token = true` for this tokenizer family.

### Post-Conversion Validation

Run a quick embedding call and confirm that no warning appears:

```bash
./llama.cpp/build/bin/embedding \
  -m models/qwen3-embedding-0.6b-fix.gguf \
  -p "Hello world"
```

(Recent `llama.cpp` builds name this binary `llama-embedding`.)

If you still see the warning:

- Confirm the binary was rebuilt after updating sources (`make` or `cmake --build`).
- Inspect the metadata with a small Python snippet:

```python
from gguf import GGUFReader

r = GGUFReader("models/qwen3-embedding-0.6b-fix.gguf")
# reader.fields maps key name -> ReaderField; the value bytes live in parts[data[0]]
field = r.fields.get("tokenizer.ggml.add_eos_token")
if field is None:
    print("ADD_EOS_TOKEN key missing")
else:
    print("ADD_EOS_TOKEN=", bool(field.parts[field.data[0]][0]))
```

Expected output: `ADD_EOS_TOKEN= True`

## Manual Patch (Fallback Method)

If re-conversion is inconvenient, you can clone the metadata and force the flag. The snippet below is a template: the exact field-copying calls depend on your `gguf-py` version (recent versions expose `ReaderField.contents()` and `GGUFWriter.add_key_value`), so adapt as needed.

```python
from gguf import GGUFReader, GGUFWriter, constants as C

src = GGUFReader("qwen3-embedding-0.6b.Q4_K_M.gguf")
arch = src.fields["general.architecture"].contents()
dst = GGUFWriter("qwen3-embedding-0.6b-fix.gguf", arch)

# Copy all existing fields, overriding ADD_EOS and skipping the
# reader's virtual GGUF.* bookkeeping fields
for name, field in src.fields.items():
    if name == C.Keys.Tokenizer.ADD_EOS or name.startswith("GGUF."):
        continue
    dst.add_key_value(name, field.contents(), field.types[0])

dst.add_add_eos_token(True)  # force the flag

# Copy tensors (reader tensors expose .data as an attribute, not a method)
for tensor in src.tensors:
    dst.add_tensor(tensor.name, tensor.data, raw_dtype=tensor.tensor_type)

dst.write_header_to_file()
dst.write_kv_data_to_file()
dst.write_tensors_to_file()
dst.close()
```

After patching, re-run the validation step.

## Usage Notes for Embeddings

- Feed raw text; no special wrapping is needed. With the fixed file, the SEP token is appended automatically.
- For batch embeddings, make sure each string ends cleanly (avoid trailing whitespace if you rely on identical hashes downstream).
- The dimensionality matches upstream Qwen3-Embedding-0.6B (refer to the upstream docs for the exact embedding size).
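As a usage sketch, embeddings from the fixed file can be compared with plain cosine similarity. The commented calls assume the third-party `llama-cpp-python` package and are illustrative, not part of this repo:

```python
import math

# Hypothetical usage with llama-cpp-python (pip install llama-cpp-python):
#   from llama_cpp import Llama
#   llm = Llama(model_path="models/qwen3-embedding-0.6b-fix.gguf", embedding=True)
#   vec_a = llm.embed("first sentence")   # list[float]
#   vec_b = llm.embed("second sentence")

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # identical vectors → 1.0
```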

## License & Attribution

The original model weights and tokenizer come from the Qwen project (`Qwen/Qwen3-0.6B-Base`). Review their license and usage terms before redistribution. This README documents conversion adjustments only (the metadata EOS flag addition).

## Changelog

- Initial addition: added fixed GGUF with `tokenizer.ggml.add_eos_token = true` to suppress the SEP warning.

---

For further improvements (FP16 build, alternative quantization tiers, or batching examples), open an issue or PR in this repo.
qwen3-embedding-0.6b-fix.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9115e21a00b13479bdd40565848e0927d305c666647f511bb43d76e50bef4f02
+size 1197629696
qwen3-embedding-0.6b.Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:989c1dc01d8404d3eda2bbfb0a6ae2890869f6677ee74067f3e60ae9eb1c95b4
+size 396474624