Initial release: Vortex-Embed v3 (Spearman 0.7560, 11× faster)

	---
	license: apache-2.0
	tags:
	- sentence-similarity
	- feature-extraction
	- static-embeddings
	- lf4-quantization
	- retrieval
	- rag
	model_name: Vortex-Embed v3
	metrics:
	- spearman
	---

	# Vortex-Embed v3 — Sentence-Similarity for RAG

	Retrieval-optimized 4-bit static embeddings for sentence-similarity and RAG.

	Built on [VTXAI/Vortex-Embed-4.7M](https://huggingface.co/VTXAI/Vortex-Embed-4.7M)
	(29528 vocab × 256 dim, 4-bit LF4 packed = 4.7 MB on disk) with a
	set of training-free retrieval upgrades that lift STS-B Spearman from
	0.7462 (baseline LF4) to 0.7560 (v3 with SIF+PC=1).
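
As a rough sanity check on the footprint, the raw 4-bit payload for a 29528 × 256 table can be computed directly. The sketch below also shows a generic two-codes-per-byte packing. The actual LF4 layout (codebook, scales, byte order) is not described in this card, so `pack_nibbles` and `unpack_nibbles` are hypothetical helpers for illustration, not the format used by `model.safetensors`.

```python
import numpy as np

VOCAB, DIM, BITS = 29528, 256, 4  # shape and bit width quoted in this card

# Raw code payload only; scales, codebooks, headers and the tokenizer are extra.
code_bytes = VOCAB * DIM * BITS // 8
print(f"{code_bytes} bytes = {code_bytes / 1e6:.2f} MB of packed codes")

def pack_nibbles(codes: np.ndarray) -> np.ndarray:
    """Pack an even-length array of 4-bit codes (0..15) into bytes (hypothetical layout)."""
    pairs = codes.astype(np.uint8).reshape(-1, 2)
    return (pairs[:, 0] << 4) | pairs[:, 1]  # first code in the high nibble

def unpack_nibbles(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_nibbles: recover the original sequence of 4-bit codes."""
    high = packed >> 4
    low = packed & 0x0F
    return np.stack([high, low], axis=1).reshape(-1)

# Round-trip one embedding row's worth of random codes.
codes = np.random.default_rng(0).integers(0, 16, size=DIM, dtype=np.uint8)
packed = pack_nibbles(codes)
restored = unpack_nibbles(packed)
print(packed.nbytes, np.array_equal(restored, codes))
```

The codes alone come to about 3.78 MB; whatever metadata the format stores sits on top of that.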

	## What changed vs the v1 baseline

	All four upgrades are inference-time only — the underlying 4-bit weights
	are bit-identical to the v1 artifact. They are:

	1. SIF (smooth inverse frequency) token weighting with `sif_a=0.01` (sweep-optimized for STS-B).
	2. Removal of the top principal component (PC); a sweep showed that one PC is enough for STS-B.
	3. A pure-NumPy segment sum over bucket boundaries for fast mean pooling.
	4. A CPU PyTorch scatter (`index_add_`) on the hot path.
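
The first three upgrades listed above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the code in `lf4_v3_sentence.py`: the function names are hypothetical, `token_prob` stands in for unigram token probabilities (their source is not specified here), and the random `table` stands in for the dequantized embedding matrix. It also assumes every sentence has at least one token, since `np.add.reduceat` does not return zero for empty segments.

```python
import numpy as np

def sif_weights(token_prob: np.ndarray, a: float = 0.01) -> np.ndarray:
    """SIF weight a / (a + p(w)) for every vocabulary entry."""
    return a / (a + token_prob)

def pooled_embeddings(table, token_ids, weights):
    """Weighted mean-pool a batch of variable-length token-id lists.

    All tokens are flattened into one array, then np.add.reduceat sums the
    rows of each sentence between consecutive start offsets (a segment sum).
    """
    lengths = np.array([len(t) for t in token_ids])
    flat = np.concatenate(token_ids)
    starts = np.concatenate([[0], np.cumsum(lengths)[:-1]])
    weighted = table[flat] * weights[flat, None]
    sums = np.add.reduceat(weighted, starts, axis=0)
    return sums / lengths[:, None]

def remove_top_pc(embs, pc=None):
    """Subtract each row's projection onto the first principal direction."""
    if pc is None:
        # First right-singular vector of the (uncentered) embedding matrix
        _, _, vt = np.linalg.svd(embs, full_matrices=False)
        pc = vt[0]
    return embs - np.outer(embs @ pc, pc), pc

rng = np.random.default_rng(0)
table = rng.normal(size=(1000, 256)).astype(np.float32)  # stand-in table
counts = rng.integers(1, 10_000, size=1000)              # stand-in token counts
token_prob = counts / counts.sum()
w = sif_weights(token_prob)

batch = [[3, 17, 256], [42], [5, 5, 9, 700]]
pooled = pooled_embeddings(table, batch, w)
embs, pc = remove_top_pc(pooled)
print(embs.shape)  # (3, 256)
```

In practice the principal direction would be estimated once on a reference corpus and reused at query time by passing it as `pc`, rather than being recomputed per batch.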

	## Benchmark

	| Model | STS-B Spearman ρ | Encode (ms/text) | Cold dequant | RAM | On-disk |
	|---|---|---|---|---|---|
	| LF4 baseline (v1) | 0.7462 | 0.87 | 231 ms | 30 MB | 4.7 MB |
	| Vortex-Embed v3 (this) | 0.7560 | 0.08 | 51 ms | 30 MB | 4.7 MB |

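	+0.98 pp Spearman (0.7462 → 0.7560) and roughly 11× faster encode (0.87 → 0.08 ms/text); cold dequantization drops from 231 ms to 51 ms.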
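
For readers who want to reproduce the Spearman column, the metric is the rank correlation between predicted cosine similarities and the gold STS-B scores. The evaluation script is not part of this card, so the following is only a sketch: `average_ranks`, `spearman`, and `cosine_pairs` are illustrative names, and the numbers in the demo are toy values rather than STS-B data.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman(pred, gold):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(pred), average_ranks(gold))[0, 1])

def cosine_pairs(a, b):
    """Row-wise cosine similarity between two (n, dim) embedding matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

# Toy check: predictions ordered exactly like the gold scores.
pred = [0.9, 0.1, 0.5, 0.7]
gold = [5.0, 0.0, 2.5, 4.0]
rho = spearman(pred, gold)
print(round(rho, 4))

# With the model, the predictions would come from something like:
# sims = cosine_pairs(model.encode(sents_a), model.encode(sents_b))
```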

	## Usage

	```python
	import sys

	import numpy as np
	from huggingface_hub import snapshot_download

	path = snapshot_download("VTXAI/Vortex-Embed-v3-sentence")

	# lf4_v3_sentence.py ships inside the repo; add the snapshot to the import
	# path unless the file is already importable in your environment.
	sys.path.insert(0, path)
	from lf4_v3_sentence import VortexEmbedV3

	model = VortexEmbedV3.from_pretrained(path)
	print(f"vocab={model.vocab_size}, dim={model.dim}, size={model.model_size_mb:.1f} MB")

	# Single-text encode
	vec = model.encode("find python json parser", normalize=True)  # (256,)

	# Batch encode
	docs = [
	    "def parse_json(s): return json.loads(s)",
	    "class WeatherAPI: pass",
	    "import requests",
	]
	doc_embs = model.encode(docs, normalize=True)  # (3, 256)

	# RAG retrieval
	# ... chunk corpus, build doc_embs as (n, 256) ...
	query = "where do we parse JSON requests"
	q_emb = model.encode(query, normalize=True)
	scores, indices = model.search(q_emb, doc_embs, top_k=10)
	for rank, (s, i) in enumerate(zip(scores[0], indices[0]), 1):
	    print(f"#{rank} ({s:.3f}) doc #{i}")
	```
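
If `model.search` is not available, or the index lives elsewhere, the retrieval step can be approximated with a plain NumPy top-k over cosine scores. This is a sketch of the general technique and not the library's implementation: `cosine_top_k` is a hypothetical helper, and its `(n_queries, k)` return shape is only inferred from the `scores[0]` / `indices[0]` indexing in the usage example.

```python
import numpy as np

def cosine_top_k(q_emb, doc_embs, top_k=10):
    """Return (scores, indices), each of shape (n_queries, k), best match first.

    Assumes both inputs are already L2-normalized, so the dot product
    equals cosine similarity.
    """
    q = np.atleast_2d(q_emb)
    scores = q @ doc_embs.T
    k = min(top_k, doc_embs.shape[0])
    # argpartition selects the k best per row without sorting everything
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    # sort only the k candidates in descending score order
    order = np.argsort(-part_scores, axis=1)
    indices = np.take_along_axis(part, order, axis=1)
    return np.take_along_axis(scores, indices, axis=1), indices

# Toy corpus of unit-norm vectors; the query is an exact copy of document 7.
rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(50, 256))
doc_embs /= np.linalg.norm(doc_embs, axis=1, keepdims=True)
q_emb = doc_embs[7]

scores, indices = cosine_top_k(q_emb, doc_embs, top_k=5)
for rank, (s, i) in enumerate(zip(scores[0], indices[0]), 1):
    print(f"#{rank} ({s:.3f}) doc #{i}")
```

A brute-force matrix product like this is usually adequate for corpora up to a few hundred thousand chunks at 256 dimensions; beyond that an approximate nearest-neighbor index may be worth considering.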

	## Files

	- `model.safetensors` — 4-bit LF4 packed weights (3.7 MB)
	- `tokenizer.json` — HuggingFace fast tokenizer
	- `config.json` — model + retrieval config
	- `lf4_v3_sentence.py` — self-contained model class
	- `README.md` — this file

	## License

	Apache 2.0