---
license: apache-2.0
tags:
- sentence-similarity
- feature-extraction
- static-embeddings
- lf4-quantization
- retrieval
- rag
model_name: Vortex-Embed v3
metrics:
- spearman
---
# Vortex-Embed v3: Sentence-Similarity for RAG
**Retrieval-optimized 4-bit static embeddings for sentence-similarity and RAG.**
Built on [VTXAI/Vortex-Embed-4.7M](https://huggingface.co/VTXAI/Vortex-Embed-4.7M)
(29528 vocab × 256 dim, 4-bit LF4 packed = **4.7 MB** on disk) with a
set of training-free retrieval upgrades that lift STS-B Spearman from
**0.7462** (baseline LF4) to **0.7560** (v3 with SIF+PC=1).
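
As a rough sanity check on the sizes quoted here, the sketch below computes the raw payload of a 29528 × 256 table at 4 bits per weight and at float32, and shows generic nibble packing (two 4-bit codes per byte). This is an illustration only: the actual LF4 layout, including any scales or codebooks it stores, is not described in this card, and `pack_nibbles` / `unpack_nibbles` are hypothetical helper names.

```python
import numpy as np

VOCAB, DIM = 29528, 256  # table shape stated in this model card

def pack_nibbles(codes: np.ndarray) -> np.ndarray:
    """Pack an even-length array of 4-bit codes (0..15) into bytes.

    Generic nibble packing for illustration; NOT the LF4 format itself.
    """
    codes = codes.astype(np.uint8)
    return (codes[0::2] << 4) | codes[1::2]

def unpack_nibbles(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_nibbles: recover the 4-bit codes in original order."""
    hi = packed >> 4
    lo = packed & 0x0F
    return np.stack([hi, lo], axis=1).reshape(-1)

n_weights = VOCAB * DIM
packed_bytes = n_weights // 2   # 4 bits per weight, codes only
fp32_bytes = n_weights * 4      # fully dequantized float32 table

print(f"4-bit codes only: {packed_bytes / 1e6:.2f} MB")
print(f"float32 table:    {fp32_bytes / 1e6:.2f} MB")

# Round-trip check on a few random codes.
rng = np.random.default_rng(0)
codes = rng.integers(0, 16, size=16, dtype=np.uint8)
roundtrip = unpack_nibbles(pack_nibbles(codes))
```

The float32 figure (about 30 MB) lines up with the RAM column in the Benchmark table, which suggests the table is held dequantized in memory. The codes-only figure excludes whatever metadata the on-disk format adds.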
## What changed vs the v1 baseline
All four upgrades are inference-time only; the underlying 4-bit weights
are bit-identical to the v1 artifact. They are:
1. **SIF IDF weighting** with `sif_a=0.01` (sweep-optimized for STS-B).
2. **Top-1 PC removal** (sweep-optimized; 1 PC is enough for STS-B).
3. **Pure-numpy bucket-boundary segment-sum** for fast mean-pool.
4. **CPU-torch scatter (index_add_)** for the hot path.
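
The first three steps can be sketched in plain NumPy as follows. This is a minimal illustration, not the code shipped in `lf4_v3_sentence.py`. It assumes SIF means the usual `a / (a + p(w))` weighting over unigram probabilities, that a batch is stored as one flat token-id array plus per-text start offsets, and that the principal component is estimated from the batch itself. The function names and the toy data are hypothetical.

```python
import numpy as np

def sif_weights(token_probs: np.ndarray, a: float = 0.01) -> np.ndarray:
    """Per-vocabulary SIF weight a / (a + p(w)); rare tokens get larger weights."""
    return a / (a + token_probs)

def pooled_embeddings(table, weights, token_ids, offsets):
    """SIF-weighted mean-pool for a whole batch using one segment-sum.

    token_ids: 1-D concatenation of every text's token ids.
    offsets:   start index of each text in token_ids (strictly increasing,
               so every text has at least one token).
    """
    w = weights[token_ids]                          # (total_tokens,)
    rows = table[token_ids] * w[:, None]            # (total_tokens, dim)
    sums = np.add.reduceat(rows, offsets, axis=0)   # (n_texts, dim)
    counts = np.diff(np.append(offsets, token_ids.size))
    return sums / counts[:, None]

def remove_top_pc(embs, pc=None):
    """Subtract each row's projection onto the first principal direction."""
    if pc is None:
        # First right singular vector of the (uncentered) embedding matrix.
        _, _, vt = np.linalg.svd(embs, full_matrices=False)
        pc = vt[0]
    return embs - np.outer(embs @ pc, pc), pc

# Toy data (assumed): 50-token vocabulary, 8-dim table, three texts.
rng = np.random.default_rng(0)
table = rng.normal(size=(50, 8))
probs = rng.dirichlet(np.ones(50))        # stand-in unigram probabilities
weights = sif_weights(probs, a=0.01)
token_ids = np.array([3, 7, 7, 1, 4, 9, 2, 2, 5])
offsets = np.array([0, 3, 5])             # texts of 3, 2 and 4 tokens

pooled = pooled_embeddings(table, weights, token_ids, offsets)
cleaned, pc = remove_top_pc(pooled)
```

In a real deployment the principal component would more plausibly be precomputed once on a reference corpus and passed in as `pc`. The segment-sum could equally be written as a scatter-add with `torch.Tensor.index_add_`, which is what step 4 refers to.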
## Benchmark
| Model | Spearman ρ STS-B | Encode ms/text | Dequant cold | RAM | On-disk |
|---|---|---|---|---|---|
| LF4 baseline (v1) | 0.7462 | 0.87 | 231 ms | 30 MB | 4.7 MB |
| **Vortex-Embed v3 (this)** | **0.7560** | **0.08** | 51 ms | 30 MB | 4.7 MB |
**+1.0 pp Spearman, 11× faster encode.**
## Usage
```python
from huggingface_hub import snapshot_download
from lf4_v3_sentence import VortexEmbedV3
path = snapshot_download("VTXAI/Vortex-Embed-v3-sentence")
model = VortexEmbedV3.from_pretrained(path)
print(f"vocab={model.vocab_size}, dim={model.dim}, size={model.model_size_mb:.1f} MB")
# Single-text encode
vec = model.encode("find python json parser", normalize=True) # (256,)
# Batch encode
docs = [
    "def parse_json(s): return json.loads(s)",
    "class WeatherAPI: pass",
    "import requests",
]
doc_embs = model.encode(docs, normalize=True)  # (3, 256)
# RAG retrieval
import numpy as np
# ... chunk corpus, build doc_embs as (n, 256) ...
query = "where do we parse JSON requests"
q_emb = model.encode(query, normalize=True)
scores, indices = model.search(q_emb, doc_embs, top_k=10)
for rank, (s, i) in enumerate(zip(scores[0], indices[0]), 1):
    print(f"#{rank} ({s:.3f}) doc #{i}")
```
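
For readers who want to see what a top-k search over normalized embeddings involves, here is a minimal NumPy sketch. It is not the implementation behind `model.search`; `cosine_top_k` is a hypothetical helper that assumes both the query and the document embeddings are already L2-normalized, so a dot product equals cosine similarity. It returns 2-D `(n_queries, k)` arrays to match the `scores[0]` / `indices[0]` indexing used in the usage example.

```python
import numpy as np

def cosine_top_k(q_emb: np.ndarray, doc_embs: np.ndarray, top_k: int = 10):
    """Return (scores, indices), each of shape (n_queries, k), best match first.

    Hypothetical helper: assumes L2-normalized inputs.
    """
    q = np.atleast_2d(q_emb)                  # (n_queries, dim)
    k = min(top_k, doc_embs.shape[0])
    sims = q @ doc_embs.T                     # (n_queries, n_docs)
    idx = np.argsort(-sims, axis=1)[:, :k]    # descending similarity
    scores = np.take_along_axis(sims, idx, axis=1)
    return scores, idx

# Tiny example: three orthogonal unit "documents" and one unit query.
docs = np.eye(3)
query = np.array([0.6, 0.8, 0.0])
scores, indices = cosine_top_k(query, docs, top_k=2)
```

A full sort is fine for small corpora. For large ones, `np.argpartition` or an approximate nearest-neighbor index would avoid sorting every score.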
## Files
- `model.safetensors`: 4-bit LF4 packed weights (3.7 MB)
- `tokenizer.json`: HuggingFace fast tokenizer
- `config.json`: model + retrieval config
- `lf4_v3_sentence.py`: self-contained model class
- `README.md`: this file
## License
Apache 2.0