| --- |
| license: apache-2.0 |
| base_model: Qwen/Qwen3-0.6B-Base |
| library_name: llama.cpp |
| pipeline_tag: feature-extraction |
| model_type: embedding |
| quantization: |
| - Q4_K_M |
| tags: |
| - embeddings |
| - qwen |
| - gguf |
| - tokenizer |
| language: |
| - en |
| inference: false |
| --- |
| # Qwen3-Embedding-0.6B (GGUF) Models |
|
|
| This directory contains GGUF builds of the Qwen3 0.6B embedding model, produced from the upstream base repository `Qwen/Qwen3-0.6B-Base` (original Hugging Face layout in `../Qwen3-Embedding-0.6B/`). |
|
|
| ## Contents |
|
|
| | File | Purpose | |
| | ---------------------------------- | ---------------------------------------------------------------- | |
| | `qwen3-embedding-0.6b.Q4_K_M.gguf` | Quantized (Q4_K_M) GGUF for efficient inference. | |
| | `qwen3-embedding-0.6b-fix.gguf` | Same model with explicit `sep_token` / EOS metadata fix applied. | |
|
|
| ## Special Token Configuration |
|
|
| Extracted from `tokenizer_config.json`: |
|
|
| ```jsonc |
| "sep_token": "<|endoftext|>", |
| "sep_token_id": 151643 |
| ``` |
|
|
| The model uses `<|endoftext|>` as both the padding token (`pad_token`) and the separator (`sep_token`). For embedding generation, each input text MUST terminate with the separator token (or the converter must auto-append it); otherwise llama.cpp emits a runtime warning: |
|
|
| ```text |
| [WARNING] At least one last token in strings embedded is not SEP. 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header |
| ``` |
|
|
| ### Why the Warning Appears |
|
|
| If the GGUF metadata key `tokenizer.ggml.add_eos_token` is absent or `false`, llama.cpp will not auto-append the final SEP/EOS token for embedding inputs. Any input string that does not already end with `<|endoftext|>` triggers the warning and may yield suboptimal embeddings (slightly different token boundary semantics). |
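The flag's effect can be illustrated with a small sketch of the append-if-missing logic (illustrative only, not llama.cpp's actual implementation):

```python
SEP_TOKEN_ID = 151643  # <|endoftext|>, per tokenizer_config.json

def append_eos_if_missing(token_ids, add_eos_token):
    """Mimic the metadata flag: append SEP/EOS when enabled and absent."""
    if add_eos_token and (not token_ids or token_ids[-1] != SEP_TOKEN_ID):
        return token_ids + [SEP_TOKEN_ID]
    return token_ids
```

When the flag is `false`, tokenized inputs pass through unchanged, which is exactly the situation that produces the warning above.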
|
|
| ### Fix Implemented |
|
|
| The file `qwen3-embedding-0.6b-fix.gguf` was regenerated ensuring: |
|
|
| - `tokenizer.ggml.add_eos_token = true` |
| - `sep_token` (`<|endoftext|>`) retained with id `151643` |
|
|
| This makes llama.cpp automatically append the SEP/EOS token when missing, silencing the warning and standardizing embeddings. |
|
|
| ## Rebuilding From Upstream (Recommended Process) |
|
|
| 1. Obtain the upstream model: |
|    - Clone or download `Qwen/Qwen3-0.6B-Base` (embedding variant directory). |
| 2. Convert to GGUF with the current `llama.cpp` script `convert_hf_to_gguf.py` (it already sets EOS metadata for Qwen tokenizers), then quantize with `llama-quantize`: |


```bash
# convert_hf_to_gguf.py takes the model directory as a positional argument;
# k-quants such as Q4_K_M are produced by llama-quantize, not the converter
python3 llama.cpp/convert_hf_to_gguf.py Qwen3-Embedding-0.6B \
  --outfile qwen3-embedding-0.6b-f16.gguf \
  --outtype f16

./llama.cpp/build/bin/llama-quantize \
  qwen3-embedding-0.6b-f16.gguf \
  qwen3-embedding-0.6b-fix.gguf \
  Q4_K_M
```
|
|
| > If you previously produced a GGUF that shows the warning, just re-run conversion with an up-to-date `llama.cpp` checkout. The script internally writes `tokenizer.ggml.add_eos_token = true` for this tokenizer family. |
|
|
| ### Post-Conversion Validation |
|
|
| Run a quick embedding call and confirm no warning appears (recent `llama.cpp` builds name the binary `llama-embedding`; older checkouts used `embedding`): |


```bash
./llama.cpp/build/bin/llama-embedding \
  -m models/qwen3-embedding-0.6b-fix.gguf \
  -p "Hello world"
```
|
|
| If you still see the warning: |
|
|
| - Confirm the binary was rebuilt after updating sources (`make` or `cmake --build`). |
| - Inspect metadata using a small Python snippet: |
|
|
```python
from gguf import GGUFReader

r = GGUFReader("models/qwen3-embedding-0.6b-fix.gguf")
# GGUFReader.fields is a dict mapping key name -> ReaderField
field = r.fields.get("tokenizer.ggml.add_eos_token")
if field is None:
    print("ADD_EOS_TOKEN= <missing>")
else:
    # The last part of a simple field holds the value itself
    print("ADD_EOS_TOKEN=", bool(field.parts[-1][0]))
```
|
|
| Expected output: `ADD_EOS_TOKEN= True` |
|
|
| ## Manual Patch (Fallback Method) |
|
|
| If re-conversion is inconvenient, you can copy the quantized file and flip the flag in place. Note this only works when the key already exists in the header; a missing key cannot be added without rewriting the file: |


```python
import shutil
from gguf import GGUFReader

# Work on a copy so the original quantized file stays untouched
shutil.copyfile("qwen3-embedding-0.6b.Q4_K_M.gguf",
                "qwen3-embedding-0.6b-fix.gguf")

# Open read-write; GGUFReader memory-maps the file, so field writes hit disk
reader = GGUFReader("qwen3-embedding-0.6b-fix.gguf", "r+")

field = reader.fields.get("tokenizer.ggml.add_eos_token")
if field is None:
    # The key cannot be added in place; re-run the conversion instead
    raise SystemExit("tokenizer.ggml.add_eos_token absent - re-convert with llama.cpp")

field.parts[-1][0] = 1  # flip the boolean flag to true
```

The `gguf-py` package bundled with llama.cpp also ships a `gguf_set_metadata.py` helper script that performs the same in-place edit from the command line.
|
|
| After patching, re-run the validation step. |
|
|
| ## Usage Notes for Embeddings |
|
|
| - Feed raw text; no prompt wrapping is needed. With the fixed file, the SEP token is appended automatically. |
| - For batch embeddings, ensure each string ends cleanly (avoid trailing spaces if you rely on identical hashes downstream). |
| - The dimensionality matches upstream Qwen3-Embedding-0.6B (refer to upstream docs for exact embedding size). |
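Once vectors are extracted, a common downstream step is comparing embeddings by cosine similarity; a self-contained sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical texts should score at or very near 1.0; a noticeable drop between supposedly identical inputs is a hint that trailing whitespace or inconsistent SEP handling crept in.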
|
|
| ## License & Attribution |
|
|
| The original model weights and tokenizer come from the Qwen project (`Qwen/Qwen3-0.6B-Base`). Review their license and usage terms before redistribution. This README documents conversion adjustments only (metadata EOS flag addition). |
|
|
| ## Changelog |
|
|
| - Initial addition: added fixed GGUF with `tokenizer.ggml.add_eos_token = true` to suppress SEP warning. |
|
|
| --- |
|
|
| For further improvements (FP16 build, alternative quantization tiers, or batching examples), open an issue or PR in this repo. |