---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
library_name: llama.cpp
pipeline_tag: feature-extraction
model_type: embedding
quantization:
- Q4_K_M
tags:
- embeddings
- qwen
- gguf
- tokenizer
language:
- en
inference: false
---
# Qwen3-Embedding-0.6B (GGUF) Models
This directory contains GGUF builds of the Qwen3 0.6B embedding model, produced from the upstream base repository `Qwen/Qwen3-0.6B-Base` (original Hugging Face layout in `../Qwen3-Embedding-0.6B/`).
## Contents
| File | Purpose |
| ---------------------------------- | ---------------------------------------------------------------- |
| `qwen3-embedding-0.6b.Q4_K_M.gguf` | Quantized (Q4_K_M) GGUF for efficient inference. |
| `qwen3-embedding-0.6b-fix.gguf` | Same model with explicit `sep_token` / EOS metadata fix applied. |
## Special Token Configuration
Extracted from `tokenizer_config.json`:
```jsonc
"sep_token": "<|endoftext|>",
"sep_token_id": 151643
```
The model uses `<|endoftext|>` as both the padding (`pad_token`) and separator (`sep_token`). For embedding generation, each input text must terminate with the separator token (or the converter must auto-append it); otherwise llama.cpp emits a runtime warning:
```text
[WARNING] At least one last token in strings embedded is not SEP. 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header
```
### Why the Warning Appears
If the GGUF metadata key `tokenizer.ggml.add_eos_token` is absent or `false`, llama.cpp will not auto-append the final SEP/EOS token for embedding inputs. Any input string that does not already end with `<|endoftext|>` triggers the warning and may yield sub-optimal embeddings (slightly different token boundary semantics).
### Fix Implemented
The file `qwen3-embedding-0.6b-fix.gguf` was regenerated ensuring:
- `tokenizer.ggml.add_eos_token = true`
- `sep_token` (`<|endoftext|>`) retained with id `151643`
This makes llama.cpp automatically append the SEP/EOS token when missing, silencing the warning and standardizing embeddings.
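Conceptually, the flag changes tokenization as sketched below. The token id comes from `tokenizer_config.json` above; the helper name is hypothetical and only mimics llama.cpp's behavior:

```python
EOS_ID = 151643  # <|endoftext|> / sep_token_id from tokenizer_config.json

def apply_add_eos(token_ids, add_eos_token=True):
    """Mimic tokenizer.ggml.add_eos_token: append EOS when the flag is
    set and the sequence does not already end with it."""
    if add_eos_token and (not token_ids or token_ids[-1] != EOS_ID):
        return token_ids + [EOS_ID]
    return token_ids
```

With the flag off, sequences pass through untouched, which is exactly the situation that produces the warning above.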
## Rebuilding From Upstream (Recommended Process)
1. Obtain upstream model:
- Clone or download `Qwen/Qwen3-0.6B-Base` (embedding variant directory).
2. Convert to GGUF using the current `llama.cpp` conversion script:
- Use the repo's `convert_hf_to_gguf.py` (it already sets EOS metadata for Qwen tokenizers). The converter takes the model directory as a positional argument and cannot emit K-quants directly, so convert to FP16 first and quantize with `llama-quantize`:
```bash
# 1) HF checkpoint -> FP16 GGUF
python3 llama.cpp/convert_hf_to_gguf.py Qwen3-Embedding-0.6B \
    --outfile qwen3-embedding-0.6b-f16.gguf \
    --outtype f16
# 2) FP16 GGUF -> Q4_K_M
./llama.cpp/build/bin/llama-quantize \
    qwen3-embedding-0.6b-f16.gguf \
    qwen3-embedding-0.6b-fix.gguf Q4_K_M
```
> If you previously produced a GGUF that shows the warning, just re-run conversion with an up-to-date `llama.cpp` checkout. The script internally writes `tokenizer.ggml.add_eos_token = true` for this tokenizer family.
### Post-Conversion Validation
Run a quick embedding call and confirm no warning appears (recent llama.cpp builds name the binary `llama-embedding`; older builds use `embedding`):
```bash
./llama.cpp/build/bin/llama-embedding \
  -m models/qwen3-embedding-0.6b-fix.gguf \
  -p "Hello world"
```
If you still see the warning:
- Confirm the binary was rebuilt after updating sources (`make` or `cmake --build`).
- Inspect metadata using a small Python snippet:
```python
from gguf import GGUFReader  # pip install gguf

r = GGUFReader("models/qwen3-embedding-0.6b-fix.gguf")
# GGUFReader.fields maps key names to ReaderField objects
field = r.fields.get("tokenizer.ggml.add_eos_token")
if field is None:
    print("ADD_EOS_TOKEN missing")
else:
    # GGUF booleans are stored as a single uint8 value part
    print("ADD_EOS_TOKEN=", bool(field.parts[-1][0]))
```
Expected output: `ADD_EOS_TOKEN= True`
## Manual Patch (Fallback Method)
If re-conversion is inconvenient, you can patch the flag in place with the `gguf` Python package. Note that this only works when the key already exists in the file (e.g. set to `false`); if it is absent entirely, re-convert instead:
```python
import shutil
from gguf import GGUFReader  # pip install gguf

src = "qwen3-embedding-0.6b.Q4_K_M.gguf"
dst = "qwen3-embedding-0.6b-fix.gguf"
shutil.copyfile(src, dst)  # patch a copy, keep the original intact

reader = GGUFReader(dst, "r+")  # memory-mapped read/write
field = reader.get_field("tokenizer.ggml.add_eos_token")
if field is None:
    raise SystemExit("Key absent: re-run conversion with an updated llama.cpp")
field.parts[-1][0] = 1  # GGUF bools are a single uint8; 1 == true
```
After patching, re-run the validation step.
## Usage Notes for Embeddings
- Feed raw text; no special wrapping is needed, since the fixed file auto-appends the SEP token.
- For batch embeddings, ensure each string ends cleanly (avoid trailing spaces if you rely on identical hashes downstream).
- The dimensionality matches upstream Qwen3-Embedding-0.6B (refer to upstream docs for exact embedding size).
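Downstream, embedding vectors from this model are typically compared with cosine similarity. A dependency-free sketch (the helper name is illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Identical directions score 1.0 and orthogonal vectors score 0.0; many pipelines L2-normalize vectors first so that a plain dot product gives the same result.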
## License & Attribution
The original model weights and tokenizer come from the Qwen project (`Qwen/Qwen3-0.6B-Base`). Review their license and usage terms before redistribution. This README documents conversion adjustments only (metadata EOS flag addition).
## Changelog
- Initial addition: added fixed GGUF with `tokenizer.ggml.add_eos_token = true` to suppress SEP warning.
---
For further improvements (FP16 build, alternative quantization tiers, or batching examples), open an issue or PR in this repo.