---
license: apache-2.0
tags:
- sentence-similarity
- feature-extraction
- static-embeddings
- lf4-quantization
- retrieval
- rag
model_name: Vortex-Embed v3
metrics:
- spearman
---

# Vortex-Embed v3: Sentence-Similarity for RAG

**Retrieval-optimized 4-bit static embeddings for sentence-similarity and RAG.**

Built on [VTXAI/Vortex-Embed-4.7M](https://huggingface.co/VTXAI/Vortex-Embed-4.7M) (29528 vocab × 256 dim, 4-bit LF4 packed = **4.7 MB** on disk) with a set of training-free retrieval upgrades that lift STS-B Spearman from **0.7462** (baseline LF4) to **0.7560** (v3 with SIF+PC=1).

## What changed vs the v1 baseline

All four upgrades are inference-time only; the underlying 4-bit weights are bit-identical to the v1 artifact. They are:

1. **SIF IDF weighting** with `sif_a=0.01` (sweep-optimized for STS-B).
2. **Top-1 PC removal** (sweep-optimized; removing 1 principal component is enough for STS-B).
3. **Pure-numpy bucket-boundary segment-sum** for fast mean-pool.
4. **CPU-torch scatter (`index_add_`)** for the hot path.

## Benchmark

| Model | Spearman ρ (STS-B) | Encode (ms/text) | Dequant (cold) | RAM | On-disk |
|---|---|---|---|---|---|
| LF4 baseline (v1) | 0.7462 | 0.87 | 231 ms | 30 MB | 4.7 MB |
| **Vortex-Embed v3 (this)** | **0.7560** | **0.08** | 51 ms | 30 MB | 4.7 MB |

**+1.0 pp Spearman, 11× faster encode.**

## Usage

```python
import numpy as np
from huggingface_hub import snapshot_download
from lf4_v3_sentence import VortexEmbedV3

path = snapshot_download("VTXAI/Vortex-Embed-v3-sentence")
model = VortexEmbedV3.from_pretrained(path)
print(f"vocab={model.vocab_size}, dim={model.dim}, size={model.model_size_mb:.1f} MB")

# Single-text encode
vec = model.encode("find python json parser", normalize=True)  # (256,)

# Batch encode
docs = [
    "def parse_json(s): return json.loads(s)",
    "class WeatherAPI: pass",
    "import requests",
]
doc_embs = model.encode(docs, normalize=True)  # (3, 256)

# RAG retrieval
# ... chunk corpus, build doc_embs as (n, 256) ...
query = "where do we parse JSON requests"
q_emb = model.encode(query, normalize=True)
scores, indices = model.search(q_emb, doc_embs, top_k=10)

# Row 0 holds the results for our single query
for rank, (s, i) in enumerate(zip(scores[0], indices[0]), 1):
    print(f"#{rank} ({s:.3f}) doc #{i}")
```

## Files

- `model.safetensors`: 4-bit LF4 packed weights (3.7 MB)
- `tokenizer.json`: HuggingFace fast tokenizer
- `config.json`: model + retrieval config
- `lf4_v3_sentence.py`: self-contained model class
- `README.md`: this file

## License

Apache 2.0
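
## Appendix: sketch of the inference-time upgrades

The first three upgrades listed under "What changed vs the v1 baseline" can be illustrated with a small, self-contained numpy sketch. This is not the code in `lf4_v3_sentence.py`; the function names (`sif_weights`, `segment_mean_pool`, `remove_pc`), the toy embedding table, and the random token probabilities are all hypothetical. It assumes the usual SIF form `a / (a + p(w))` for token weights, a segment sum via `np.add.reduceat` over a flat token-id array, and a principal component estimated from the pooled batch itself (a real deployment would more likely estimate it once on a reference corpus and store it).

```python
import numpy as np


def sif_weights(token_probs: np.ndarray, a: float = 0.01) -> np.ndarray:
    """SIF weight a / (a + p(w)) per vocabulary entry; rare tokens weigh more."""
    return a / (a + token_probs)


def segment_mean_pool(table, weights, token_ids, offsets):
    """Weighted mean-pool per text using one segment sum over a flat id array.

    token_ids: 1-D array with the ids of all texts concatenated.
    offsets:   start index of each text in token_ids (strictly increasing,
               so every text must contain at least one token).
    """
    weighted = table[token_ids] * weights[token_ids][:, None]
    summed = np.add.reduceat(weighted, offsets, axis=0)
    counts = np.diff(np.append(offsets, len(token_ids)))[:, None]
    return summed / counts


def first_principal_component(embs: np.ndarray) -> np.ndarray:
    """Top right singular vector of the (uncentered) embedding matrix."""
    _, _, vt = np.linalg.svd(embs, full_matrices=False)
    return vt[0]


def remove_pc(embs: np.ndarray, pc: np.ndarray) -> np.ndarray:
    """Subtract each embedding's projection onto the unit vector pc."""
    return embs - np.outer(embs @ pc, pc)


# Toy data (hypothetical): 50-token vocabulary, 8-dim vectors, 3 texts.
rng = np.random.default_rng(0)
vocab, dim = 50, 8
table = rng.normal(size=(vocab, dim)).astype(np.float32)
probs = rng.dirichlet(np.ones(vocab))          # stand-in unigram probabilities
weights = sif_weights(probs, a=0.01)

token_ids = np.array([3, 7, 7, 1, 20, 4, 4, 9, 11])
offsets = np.array([0, 3, 5])                  # texts: ids[0:3], ids[3:5], ids[5:9]

pooled = segment_mean_pool(table, weights, token_ids, offsets)
pc = first_principal_component(pooled)
cleaned = remove_pc(pooled, pc)
print(pooled.shape, cleaned.shape)
```

The segment sum avoids a Python loop over texts: all token vectors are gathered once and reduced between consecutive offsets. The `index_add_` path mentioned in item 4 would perform the same per-text reduction in torch and is not shown here.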