Abhaykoul committed on
Commit
44bcd16
·
verified ·
1 Parent(s): d674f5e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +123 -104
README.md CHANGED
@@ -1,104 +1,123 @@
1
- ---
2
- library_name: lf4
3
- tags:
4
- - lf4
5
- - static-embedding
6
- - 4-bit
7
- - quantized
8
- - sentence-similarity
9
- - code-search
10
- - tool-search
11
- - sentence-transformers
12
- - embedding
13
- language: en
14
- license: mit
15
- pipeline_tag: sentence-similarity
16
- ---
17
-
18
- # VTXAI/Vortex-Embed-4.7M
19
-
20
- **Native 4-bit quantized** static sentence embedding model.
21
- Generates 256-dimensional sentence embeddings via mean-pooling of a learned 4-bit quantized embedding table.
22
-
23
- Weighs only **4.7 MB** on disk — no transformers, no torch, no GPU needed.
24
-
25
- ## Model Size
26
-
27
- | Format | Size | Compression |
28
- |--------|------|-------------|
29
- | FP32 (original) | 28.8 MB | 1.0× |
30
- | **LF4 (this model)** | **4.7 MB** | **6.4×** |
31
-
32
- ## Architecture
33
-
34
- Learned static embedding table with 4-bit per-block quantization (LF4):
35
-
36
- ```
37
- LF4StaticEmbedding(
38
- vocab=29528, dim=256, bits=4,
39
- block_size=32, size=4.7MB
40
- )
41
- ```
42
-
43
- Encoding: `tokenize → lookup dequantized embeddings → mean pool → L2 normalize`
44
-
45
- Weights stored as:
46
- - `embedding_packed`: uint8 (29528 × 128) — 4-bit packed, 2 values/byte
47
- - `embedding_scales`: float16 (29528 × 8) per-block scale
48
- - `embedding_zeros`: float16 (29528 × 8) per-block zero-point
49
-
50
- ## Usage
51
-
52
- ### Python inference (lightweight, no torch)
53
-
54
- ```python
55
- from lf4_model import LF4StaticEmbedding
56
-
57
- model = LF4StaticEmbedding.from_pretrained("VTXAI/Vortex-Embed-4.7M")
58
- print(model) # LF4StaticEmbedding(vocab=29528, dim=256, bits=4, size=4.7MB)
59
-
60
- # Encode sentences to 256-dim vectors
61
- embeddings = model.encode(["search the web for news", "read file contents"])
62
-
63
- # Cosine similarity search
64
- scores, indices = model.search(query_emb, doc_emb, top_k=10)
65
- ```
66
-
67
- ### With sentence-transformers (torch)
68
-
69
- ```python
70
- from sentence_transformers import SentenceTransformer
71
-
72
- model = SentenceTransformer("VTXAI/Vortex-Embed-4.7M", backend="static")
73
- embeddings = model.encode(["search the web for news", "read file contents"])
74
- ```
75
-
76
- ## Quality
77
-
78
- - **Cosine preservation vs FP32**: 0.9969
79
- - **MSE**: 0.256990
80
- - **Tool search accuracy**: 100% (15/15, benchmarks)
81
- - **Codebase indexing**: 12.5s index, 14.6ms P50 search (JARVIS codebase, 2707 chunks)
82
- - Trained on: CornStack (Python/JS/Java) + Glaive function-calling
83
- - Base: **VTXAI/Vortex-Embed** fine-tuned LF4 quantized
84
-
85
- ## Why Static Embedding?
86
-
87
- | Feature | Static (this) | Transformer (BERT) |
88
- |---|---|---|
89
- | Inference speed | **0.15ms** | ~50ms |
90
- | Load time | **144ms** | ~5s |
91
- | Disk size | **4.7 MB** | ~400 MB |
92
- | GPU needed | **No** | Recommended |
93
- | Accuracy | Comparable* | Higher for complex semantics |
94
-
95
- \* For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.
96
-
97
- ## No Dependencies Beyond NumPy
98
-
99
- ```bash
100
- pip install numpy safetensors tokenizers
101
- ```
102
-
103
- The model loads and runs with just `numpy`, `safetensors`, and HuggingFace `tokenizers`.
104
- No PyTorch, no transformers, no sentence-transformers required for basic inference.
1
+ ---
2
+ language: en
3
+ library_name: lf4
4
+ license: mit
5
+ pipeline_tag: sentence-similarity
6
+ tags:
7
+ - lf4
8
+ - lf4-static-embedding
9
+ - static-embedding
10
+ - 4-bit
11
+ - quantized
12
+ - code-search
13
+ - tool-search
14
+ - embedding
15
+ - codebase
16
+ - semantic-search
17
+
18
+ ---
19
+
20
+ # Vortex-Embed-4.7M
21
+
22
+ **4-bit quantized static sentence embedding model** — 256-dim embeddings, 4.7 MB on disk, no PyTorch/transformers needed.
23
+
24
+ Used as the default embedder in [**vortexa**](https://github.com/OEvortex/vortexa) — a standalone codebase indexing and semantic search engine.
25
+
26
+ ## Model Size
27
+
28
+ | Format | Size | Compression |
29
+ |--------|------|-------------|
30
+ | FP32 (original) | 28.8 MB | 1.0x |
31
+ | **LF4 (this model)** | **4.7 MB** | **6.4x** |
32
+
33
+ ## Architecture
34
+
35
+ Learned static embedding table with 4-bit per-block quantization (LF4):
36
+
37
+ ```
38
+ vocab=29528 dim=256 bits=4 block_size=32 size=4.7MB
39
+ ```
40
+
41
+ Encoding: tokenize, lookup dequantized embeddings, mean pool, L2 normalize
42
+
43
+ ### Weight Format
44
+
45
+ | Tensor | Dtype | Shape | Description |
46
+ |--------|-------|-------|-------------|
47
+ | embedding_packed | uint8 | (29528, 128) | 4-bit packed, 2 values/byte |
48
+ | embedding_scales | float16 | (29528, 8) | Per-block scale |
49
+ | embedding_zeros | float16 | (29528, 8) | Per-block zero-point |
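
The encoding steps and the weight layout above can be sketched in NumPy. This is a minimal illustration, not the `lf4_model` implementation: the nibble order (low nibble first) and the dequantization formula `(q - zero) * scale` are assumptions that the README does not confirm, and `dequantize_rows` and `embed` are hypothetical helper names. Random weights stand in for the real table.

```python
import numpy as np

BLOCK_SIZE = 32  # block_size from the Architecture section


def dequantize_rows(packed, scales, zeros, block_size=BLOCK_SIZE):
    """Unpack 4-bit codes and map them back to float32.

    ASSUMPTIONS (not confirmed by the README): the low nibble of each byte
    holds the even-indexed value, and dequantization is (q - zero) * scale.
    """
    low = packed & 0x0F
    high = packed >> 4
    # Interleave nibbles: (n, 128) bytes -> (n, 256) codes
    codes = np.stack([low, high], axis=-1).reshape(packed.shape[0], -1)
    codes = codes.astype(np.float32)
    # Repeat each per-block scale / zero-point across its 32 values
    s = np.repeat(scales.astype(np.float32), block_size, axis=1)
    z = np.repeat(zeros.astype(np.float32), block_size, axis=1)
    return (codes - z) * s


def embed(token_ids, packed, scales, zeros):
    """Look up token rows, mean-pool them, then L2-normalize."""
    rows = dequantize_rows(packed[token_ids], scales[token_ids], zeros[token_ids])
    pooled = rows.mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


# Tiny random table standing in for the real weights (10 rows instead of 29528).
rng = np.random.default_rng(0)
packed = rng.integers(0, 256, size=(10, 128), dtype=np.uint8)
scales = rng.uniform(0.01, 0.1, size=(10, 8)).astype(np.float16)
zeros = rng.uniform(0.0, 15.0, size=(10, 8)).astype(np.float16)

vec = embed(np.array([1, 4, 7]), packed, scales, zeros)
print(vec.shape)
```

Token IDs would normally come from the model's tokenizer; here they are hard-coded so the sketch runs without downloading anything.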
50
+
51
+ ## Usage
52
+
53
+ ### With vortexa (recommended)
54
+
55
+ ```bash
56
+ pip install vortexa
57
+ ```
58
+
59
+ ```python
60
+ from vortexa.core.indexer import CodebaseIndexer
61
+
62
+ # vortexa uses this model by default
63
+ indexer = CodebaseIndexer(root='.')
64
+ stats = indexer.index()
65
+ results = indexer.search('find CSV parser', top_k=5)
66
+ ```
67
+
68
+ ### Standalone inference (lightweight, no torch)
69
+
70
+ ```python
71
+ from lf4_model import LF4StaticEmbedding
72
+
73
+ model = LF4StaticEmbedding.from_pretrained('VTXAI/Vortex-Embed-4.7M')
74
+ embeddings = model.encode(['search the web', 'read file'])
75
+ scores, indices = model.search(query_emb, doc_emb, top_k=10)
76
+ ```
77
+
78
+ ### With sentence-transformers
79
+
80
+ ```python
81
+ from sentence_transformers import SentenceTransformer
82
+ model = SentenceTransformer('VTXAI/Vortex-Embed-4.7M', backend='static')
83
+ embeddings = model.encode(['search the web', 'read file'])
84
+ ```
85
+
86
+ ## Performance
87
+
88
+ | Metric | Value |
89
+ |--------|-------|
90
+ | Cosine preservation vs FP32 | 0.9969 |
91
+ | MSE | 0.257 |
92
+ | Tool search accuracy | 100% (15/15) |
93
+ | Inference speed | ~0.15ms per text |
94
+ | Load time | ~144ms |
95
+ | Search (P50, 2707 chunks) | 14.6ms |
96
+
97
+ ## Why Static Embedding?
98
+
99
+ | Feature | Static (this) | Transformer (BERT) |
100
+ |---------|--------------|-------------------|
101
+ | Inference | **0.15ms** | ~50ms |
102
+ | Load time | **144ms** | ~5s |
103
+ | Disk | **4.7 MB** | ~400 MB |
104
+ | GPU | **No** | Recommended |
105
+ | Accuracy | Comparable | Higher (complex semantics) |
106
+
107
+ For domain-specific tasks (code search, tool retrieval) the gap narrows significantly.
108
+
109
+ ## Dependencies
110
+
111
+ `pip install numpy safetensors tokenizers`
112
+
113
+ No PyTorch, no transformers, no GPU required for basic inference.
114
+
115
+ ## Citation
116
+
117
+ bibtex:
118
+ @software{vortex-embed-4.7m,
119
+ title = {Vortex-Embed-4.7M},
120
+ author = {VortexAI},
121
+ year = {2025},
122
+ url = {https://huggingface.co/VTXAI/Vortex-Embed-4.7M}
123
+ }
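
As a cross-check on the Model Size table, the raw tensor sizes can be recomputed from the shapes and dtypes in the Weight Format table. The sketch below counts tensor bytes only; file headers and tokenizer files are not included, so on-disk sizes may differ slightly.

```python
# Shapes and dtypes taken from the README's Architecture / Weight Format sections.
VOCAB, DIM, BLOCK_SIZE = 29528, 256, 32
blocks_per_row = DIM // BLOCK_SIZE          # 8 blocks of 32 values each

packed_bytes = VOCAB * (DIM // 2)           # uint8, two 4-bit values per byte
scale_bytes = VOCAB * blocks_per_row * 2    # float16 = 2 bytes
zero_bytes = VOCAB * blocks_per_row * 2     # float16 = 2 bytes
lf4_bytes = packed_bytes + scale_bytes + zero_bytes

fp32_bytes = VOCAB * DIM * 4                # float32 = 4 bytes
ratio = fp32_bytes / lf4_bytes

print(f"LF4 : {lf4_bytes:,} bytes = {lf4_bytes / 1e6:.2f} MB = {lf4_bytes / 2**20:.2f} MiB")
print(f"FP32: {fp32_bytes:,} bytes = {fp32_bytes / 1e6:.2f} MB = {fp32_bytes / 2**20:.2f} MiB")
print(f"Compression: {ratio:.1f}x")
```

The 6.4x ratio follows directly from the layout: each row shrinks from 1024 bytes to 128 + 16 + 16 = 160 bytes. The quoted 4.7 MB matches the decimal-megabyte figure, while 28.8 MB matches the FP32 size in MiB, which suggests the two table rows may be using different units.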