Upload README.md with huggingface_hub

README.md +123 -104

@@ -1,104 +1,123 @@
----
-library_name: lf4
-tags:
-- lf4
-- static-embedding
-- 4-bit
-- quantized
-- sentence-similarity
-- code-search
-- tool-search
-- sentence-transformers
-- embedding
-language: en
-license: mit
-pipeline_tag: sentence-similarity
----
-# VTXAI/Vortex-Embed-4.7M
-**Native 4-bit quantized** static sentence embedding model.
-Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.
-Weighs only **4.7 MB** on disk — no transformers, no torch, no GPU needed.
-## Model Size
-| Format | Size | Compression |
-|--------|------|-------------|
-| FP32 (original) | 28.8 MB | 1.0× |
-| **LF4 (this model)** | **4.7 MB** | **6.4×** |
-## Architecture
-Learned static embedding table with 4-bit per-block quantization (LF4):
-```
-LF4StaticEmbedding(
-  vocab=29528, dim=256, bits=4,
-  block_size=32, size=4.7MB
-)
-```
-Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`
-Weights stored as:
-- `embedding_packed`: uint8 (29528 × 128) — 4-bit packed, 2 values/byte
-- `embedding_scales`: float16 (29528 × 8) — per-block scale
-- `embedding_zeros`: float16 (29528 × 8) — per-block zero-point
-## Usage
-### Python inference (lightweight, no torch)
-```python
-from lf4_model import LF4StaticEmbedding
-model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
-print(model)  # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)
-# Encode sentences to 256-dim vectors
-embeddings = model.encode(["search the web for news", "read file contents"])
-# Cosine similarity search
-scores, indices = model.search(query_emb, doc_emb, top_k=10)
-```
-### With sentence-transformers (torch)
-```python
-from sentence_transformers import SentenceTransformer
-model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
-embeddings = model.encode(["search the web for news", "read file contents"])
-```
-## Quality
-- **Cosine preservation vs FP32**: 0.9969
-- **MSE**: 0.256990
-- **Tool search accuracy**: 100% (15/15, benchmarks)
-- **Codebase indexing**: 12.5s index, 14.6ms P50 search (JARVIS codebase, 2707 chunks)
-- Trained on: CornStack (Python/JS/Java) + Glaive function-calling
-- Base: **VTXAI/Vortex-Embed** → fine-tuned → LF4 quantized
-## Why Static Embedding?
-| Feature | Static (this) | Transformer (BERT) |
-|---|---|---|
-| Inference speed | **0.15ms** | ~50ms |
-| Load time | **144ms** | ~5s |
-| Disk size | **4.7 MB** | ~400 MB |
-| GPU needed | **No** | Recommended |
-| Accuracy | Comparable* | Higher for complex semantics |
-\* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.
-## No Dependencies Beyond NumPy
-```bash
-pip install numpy safetensors tokenizers
-```
-The model loads and runs with just `numpy`, `safetensors`, and HuggingFace `tokenizers`.
-No PyTorch, no transformers, no sentence-transformers required for basic inference.

+---
+language: en
+library_name: lf4
+license: mit
+pipeline_tag: sentence-similarity
+tags:
+- lf4
+- lf4-static-embedding
+- static-embedding
+- 4-bit
+- quantized
+- code-search
+- tool-search
+- embedding
+- codebase
+- semantic-search
+---
+# Vortex-Embed-4.7M
+**4-bit quantized static sentence embedding model** — 256-dim embeddings, 4.7 MB on disk, no PyTorch/transformers needed.
+Used as the default embedder in [**vortexa**](https://github.com/OEvortex/vortexa) — a standalone codebase indexing and semantic search engine.
+## Model Size
+| Format | Size | Compression |
+|--------|------|-------------|
+| FP32 (original) | 28.8 MB | 1.0x |
+| **LF4 (this model)** | **4.7 MB** | **6.4x** |
+## Architecture
+Learned static embedding table with 4-bit per-block quantization (LF4):
+```
+vocab=29528 dim=256 bits=4 block_size=32 size=4.7MB
+```
+Encoding: tokenize, lookup dequantized embeddings, mean pool, L2 normalize
+### Weight Format
+| Tensor | Dtype | Shape | Description |
+|--------|-------|-------|-------------|
+| embedding_packed | uint8 | (29528, 128) | 4-bit packed, 2 values/byte |
+| embedding_scales | float16 | (29528, 8) | Per-block scale |
+| embedding_zeros | float16 | (29528, 8) | Per-block zero-point |
+## Usage
+### With vortexa (recommended)
+```bash
+pip install vortexa
+```
+```python
+from vortexa.core.indexer import CodebaseIndexer
+# vortexa uses this model by default
+indexer = CodebaseIndexer(root='.')
+stats = indexer.index()
+results = indexer.search('find CSV parser', top_k=5)
+```
+### Standalone inference (lightweight, no torch)
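+```python
+from lf4_model import LF4StaticEmbedding
+model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')
+embeddings = model.encode(['search the web', 'read file'])
+# Define the search inputs before use: query one text against the encoded documents
+query_emb = model.encode(['search the web'])
+doc_emb = embeddings
+scores, indices = model.search(query_emb, doc_emb, top_k=10)
+```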
+### With sentence-transformers
+```python
+from sentence_transformers import SentenceTransformer
+model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M', backend='static')
+embeddings = model.encode(['search the web', 'read file'])
+```
+## Performance
+| Metric | Value |
+|--------|-------|
+| Cosine preservation vs FP32 | 0.9969 |
+| MSE | 0.257 |
+| Tool search accuracy | 100% (15/15) |
+| Inference speed | ~0.15ms per text |
+| Load time | ~144ms |
+| Search (P50, 2707 chunks) | 14.6ms |
+## Why Static Embedding?
+| Feature | Static (this) | Transformer (BERT) |
+|---------|--------------|-------------------|
+| Inference | **0.15ms** | ~50ms |
+| Load time | **144ms** | ~5s |
+| Disk | **4.7 MB** | ~400 MB |
+| GPU | **No** | Recommended |
+| Accuracy | Comparable | Higher (complex semantics) |
+For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.
+## Dependencies
+```bash
+pip install numpy safetensors tokenizers
+```
+No PyTorch, no transformers, no GPU required for basic inference.
+## Citation
+```bibtex
+@software{vortex-embed-4.7m,
+  title = {Vortex-Embed-4.7M},
+  author = {VortexAI},
+  year = {2025},
+  url = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
+}
+```
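
Note on the weight format described in the README above: the three tensors (`embedding_packed`, `embedding_scales`, `embedding_zeros`) and the pipeline "lookup dequantized embeddings, mean pool, L2 normalize" can be sketched in plain NumPy as follows. This is an illustrative sketch, not the `lf4_model` implementation. The README does not state the nibble order or the dequantization formula, so the low-nibble-first layout and the affine form `q * scale + zero` are assumptions, and the function names are hypothetical.

```python
import numpy as np

VOCAB, DIM, BLOCK = 29528, 256, 32  # values stated in the README


def dequantize_rows(packed, scales, zeros, block_size=BLOCK):
    """Unpack 4-bit codes from uint8 rows and map them back to float32.

    ASSUMPTIONS (not confirmed by the README): the low nibble stores the
    even-indexed value, and dequantization is q * scale + zero.
    """
    low = packed & 0x0F
    high = packed >> 4
    # Interleave the two nibbles: (n, dim/2) -> (n, dim)
    codes = np.stack([low, high], axis=-1).reshape(packed.shape[0], -1)
    # Repeat each per-block parameter across the values in its block
    s = np.repeat(scales.astype(np.float32), block_size, axis=1)
    z = np.repeat(zeros.astype(np.float32), block_size, axis=1)
    return codes.astype(np.float32) * s + z


def encode(token_ids, packed, scales, zeros):
    """Look up token rows, mean-pool them, and L2-normalize the result."""
    ids = np.asarray(token_ids)
    rows = dequantize_rows(packed[ids], scales[ids], zeros[ids])
    pooled = rows.mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)


def storage_bytes(vocab=VOCAB, dim=DIM, block_size=BLOCK):
    """Bytes used by the packed codes plus float16 scales and zero-points."""
    packed = vocab * dim // 2                    # two 4-bit values per byte
    per_block = vocab * (dim // block_size) * 2  # one float16 per block
    return packed + 2 * per_block
```

With the README's shapes, `storage_bytes()` gives 4,724,480 bytes (about 4.7 MB), while the FP32 table is 29528 × 256 × 4 = 30,236,672 bytes, a ratio of 6.4, which agrees with the Model Size table.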