---
license: apache-2.0
tags:
- sentence-similarity
- feature-extraction
- static-embeddings
- lf4-quantization
- retrieval
- rag
model_name: Vortex-Embed v3
metrics:
- spearman
---

# Vortex-Embed v3: Sentence Similarity for RAG

**Retrieval-optimized 4-bit static embeddings for sentence similarity and RAG.**

Built on [VTXAI/Vortex-Embed-4.7M](https://huggingface.co/VTXAI/Vortex-Embed-4.7M)
(29,528-token vocabulary × 256 dimensions, 4-bit LF4 packed: **4.7 MB** on disk),
v3 adds a set of training-free retrieval upgrades that lift STS-B Spearman from
**0.7462** (LF4 baseline) to **0.7560** (v3 with SIF weighting and top-1 PC removal).
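The card reports STS-B Spearman without spelling out the evaluation protocol. The usual STS-B setup, assumed here, scores each sentence pair by the cosine similarity of its two embeddings and reports Spearman's rank correlation between those scores and the human gold scores. A minimal numpy sketch of that computation follows; `rankdata`, `spearman`, and `sts_spearman` are hypothetical helper names, not part of this repo.

```python
import numpy as np


def rankdata(x):
    """1-based ranks; tied values share the average of their positions."""
    x = np.asarray(x, dtype=np.float64)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=np.float64)
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of equal values so ties get the same rank
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])


def sts_spearman(emb1, emb2, gold):
    """Cosine similarity per sentence pair, rank-correlated with gold scores."""
    e1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    e2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    cos = np.sum(e1 * e2, axis=1)
    return spearman(cos, gold)
```

If SciPy is available, `scipy.stats.spearmanr(cos, gold)` gives the same correlation as `spearman` above, since it also assigns average ranks to ties.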

## What changed vs. the v1 baseline

All four upgrades are inference-time only; the underlying 4-bit weights
are bit-identical to the v1 artifact. The first two change the embeddings
themselves, and the last two target encode speed:

1. **SIF IDF weighting** (smooth-inverse-frequency token weights) with `sif_a=0.01` (sweep-optimized for STS-B).
2. **Top-1 principal-component (PC) removal** (sweep-optimized; one PC is enough for STS-B).
3. **Pure-numpy bucket-boundary segment-sum** for fast mean pooling.
4. **CPU-torch scatter (`index_add_`)** for the hot path.
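The card lists these upgrades without code. Below is a minimal numpy sketch of the standard SIF recipe from Arora et al. (2017) as we understand it, not the code in `lf4_v3_sentence.py`. Assumptions: token weights are `a / (a + p(w))` with `p(w)` taken from raw token counts (the card's "SIF IDF" wording suggests the real frequency source may differ), `a` plays the role of `sif_a`, the weighted mean pool is a segment sum over per-text offsets via `np.add.reduceat` (our reading of "bucket-boundary segment-sum"), and the top principal direction is found by an uncentered SVD, as in the SIF paper. All helper names are hypothetical.

```python
import numpy as np


def sif_weights(token_counts, a=0.01):
    """SIF weight a / (a + p(w)) per vocab id, with p(w) = relative frequency."""
    counts = np.asarray(token_counts, dtype=np.float64)
    p = counts / counts.sum()
    return a / (a + p)


def pool_sif(emb, weights, token_ids, lengths):
    """SIF-weighted mean pool of a ragged batch via a segment sum.

    token_ids: flat int array with every text's token ids concatenated.
    lengths:   tokens per text; every text must have at least one token.
    """
    lengths = np.asarray(lengths)
    assert (lengths > 0).all(), "np.add.reduceat mishandles empty segments"
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))  # bucket boundaries
    vecs = emb[token_ids] * weights[token_ids][:, None]       # (T, d) weighted rows
    sums = np.add.reduceat(vecs, starts, axis=0)               # (B, d) per-text sums
    return sums / lengths[:, None]


def top_pc(x, k=1):
    """Top-k right singular vectors of x (uncentered, as in the SIF paper)."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k]


def remove_pcs(x, pcs):
    """Project out each principal direction: x - (x @ pcs.T) @ pcs."""
    return x - (x @ pcs.T) @ pcs
```

For retrieval, the principal component would normally be estimated once on the corpus embeddings and then removed from both documents and queries, so both sides live in the same space. On the torch side, `torch.Tensor.index_add_` with a per-token text index produces the same per-text sums as `np.add.reduceat`.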

## Benchmark

| Model | STS-B Spearman ρ | Encode (ms/text) | Cold dequant | RAM | On disk |
|---|---|---|---|---|---|
| LF4 baseline (v1) | 0.7462 | 0.87 | 231 ms | 30 MB | 4.7 MB |
| **Vortex-Embed v3 (this)** | **0.7560** | **0.08** | 51 ms | 30 MB | 4.7 MB |

**+1.0 pp Spearman (0.7462 → 0.7560) and ~11× faster encoding (0.87 → 0.08 ms/text).**
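The headline deltas and the RAM column can be checked against the 29,528 × 256 table shape. This sketch treats RAM as a dequantized float32 copy of the table and packing as two 4-bit codes per byte with no per-row scales or codebook; both are assumptions, since the card doesn't describe the LF4 layout.

```python
# Figures taken from this card; the storage assumptions are ours.
vocab, dim = 29528, 256
n_weights = vocab * dim              # 7,559,168 table entries

# Assumption: two 4-bit codes per byte, ignoring any scales or codebook.
packed_bytes = n_weights * 4 // 8    # 3,779,584 bytes
# Assumption: RAM holds a dequantized float32 copy of the table.
fp32_bytes = n_weights * 4           # 30,236,672 bytes

delta_pp = (0.7560 - 0.7462) * 100   # Spearman gain in percentage points
speedup = 0.87 / 0.08                # ratio of encode times (ms/text)

print(f"packed 4-bit codes: {packed_bytes / 1e6:.2f} MB")  # 3.78 MB
print(f"float32 table:      {fp32_bytes / 1e6:.1f} MB")    # 30.2 MB
print(f"Spearman gain:      {delta_pp:.2f} pp")            # 0.98 pp
print(f"encode speed-up:    {speedup:.1f}x")               # 10.9x
```

Under those assumptions the float32 table accounts for the 30 MB RAM figure, and the packed codes alone come to about 3.8 MB, close to the `model.safetensors` size listed under Files.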

## Usage

```python
import sys

import numpy as np
from huggingface_hub import snapshot_download

# Download the model repo, then put it on sys.path so the bundled
# lf4_v3_sentence.py module can be imported.
path = snapshot_download("VTXAI/Vortex-Embed-v3-sentence")
sys.path.insert(0, path)
from lf4_v3_sentence import VortexEmbedV3

model = VortexEmbedV3.from_pretrained(path)
print(f"vocab={model.vocab_size}, dim={model.dim}, size={model.model_size_mb:.1f} MB")

# Single-text encode
vec = model.encode("find python json parser", normalize=True)  # (256,)

# Batch encode
docs = ["def parse_json(s): return json.loads(s)",
        "class WeatherAPI: pass",
        "import requests"]
doc_embs = model.encode(docs, normalize=True)  # (3, 256)

# RAG retrieval
# ... chunk your corpus and encode it into doc_embs with shape (n, 256) ...
query = "where do we parse JSON requests"
q_emb = model.encode(query, normalize=True)
scores, indices = model.search(q_emb, doc_embs, top_k=10)
for rank, (s, i) in enumerate(zip(scores[0], indices[0]), 1):
    print(f"#{rank} ({s:.3f}) doc #{i}")
```
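The implementation of `model.search` isn't shown in this card. From the usage above (`scores[0]`, `indices[0]`), it appears to return per-query score and index arrays. Assuming that, and assuming both sides are L2-normalized so the dot product equals cosine similarity, an equivalent numpy sketch looks like this; `cosine_top_k` is a hypothetical name.

```python
import numpy as np


def cosine_top_k(q_emb, doc_embs, top_k=10):
    """Top-k documents per query by dot product (cosine for unit vectors)."""
    q = np.atleast_2d(q_emb)                             # (Q, d)
    sims = q @ doc_embs.T                                # (Q, N)
    k = min(top_k, doc_embs.shape[0])
    idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]   # unordered top-k
    part = np.take_along_axis(sims, idx, axis=1)
    order = np.argsort(-part, axis=1)                    # sort only the k candidates
    indices = np.take_along_axis(idx, order, axis=1)
    scores = np.take_along_axis(sims, indices, axis=1)
    return scores, indices
```

`np.argpartition` selects the k best candidates in linear time and only those k are then sorted, which is cheaper than a full sort when the corpus is large.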

## Files

- `model.safetensors`: 4-bit LF4 packed weights (3.7 MB)
- `tokenizer.json`: Hugging Face fast tokenizer
- `config.json`: model and retrieval config
- `lf4_v3_sentence.py`: self-contained model class
- `README.md`: this file
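The card doesn't document the LF4 packing layout. To illustrate what "4-bit packed" generally means, here is a sketch of a generic nibble unpack followed by a 16-entry codebook lookup and a per-row scale. The nibble order, the codebook, and the per-row scale are all assumptions made for illustration, not a description of `model.safetensors`; the helper names are hypothetical.

```python
import numpy as np


def unpack_nibbles(packed):
    """(rows, d/2) uint8 -> (rows, d) uint8 codes in [0, 15].

    Assumed layout: low nibble holds the even column, high nibble the odd one.
    """
    low = packed & 0x0F
    high = packed >> 4
    codes = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    codes[:, 0::2] = low
    codes[:, 1::2] = high
    return codes


def dequantize(packed, codebook, row_scale):
    """Map each 4-bit code through a 16-entry codebook, then scale per row."""
    codes = unpack_nibbles(packed)
    return codebook[codes].astype(np.float32) * row_scale[:, None]
```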

## License

Apache 2.0
|