Commit 7de65e6 by FGFGFGGDFGDFGSD · verified · 1 Parent(s): c1b85de

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
---
language:
- en
- code
tags:
- python
- javascript
- go
- java
- php
- ruby
- c++
- embeddings
- code-search
- onnx
- albert
- matryoshka
- extreme-compression
license: mit
---

# ALRI: Ultra-Efficient Code Embeddings 🚀

ALRI (A Lightweight Retrieval Intelligence) is a family of embedding models designed for a small footprint and fast code retrieval.

By combining weight sharing, hashed embeddings, and distillation, ALRI-Nano reaches 94.0% Acc@1 on Python retrieval with 0.93M parameters, compared with 92.0% for the 22M-parameter MiniLM-L6.

## 🧬 Key Technologies

- **ALBERT-style Weight Sharing**: Reuses the same transformer block across layers, keeping the network deep while sharply reducing the number of unique parameters.
- **Extreme Hashed Embeddings**: Compresses the vocabulary by mapping 151k virtual tokens onto 32k real embedding vectors, which shrinks the memory footprint of the embedding table.
- **Funnel Attention**: Pools the sequence in deeper layers so that fewer tokens are processed there, which speeds up inference.
- **Matryoshka Representation Learning (MRL)**: Supports flexible output dimensions (32, 64, 128, 384), so you can trade accuracy for speed and storage.
- **Distilled Intelligence**: Knowledge is distilled from a 24M-parameter teacher into a sub-million-parameter "Nano" engine.

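As an illustration of the hashed-embedding idea above, the sketch below maps a large virtual vocabulary onto a smaller table of real vectors. The modulo hash, the exact sizes (151,000 and 32,000), the 128-dimensional table, and the names `hashed_row` and `embed` are assumptions made for illustration; this README does not specify the hash function ALRI actually uses.

```python
import numpy as np

VIRTUAL_VOCAB = 151_000  # assumed exact value for "151k virtual tokens"
REAL_ROWS = 32_000       # assumed exact value for "32k real vectors"
DIM = 128                # hypothetical embedding width

# Random stand-in for a trained embedding table.
rng = np.random.default_rng(0)
table = rng.standard_normal((REAL_ROWS, DIM)).astype(np.float32)


def hashed_row(token_id: int) -> int:
    """Map a virtual token ID to a row of the real table (simple modulo hash)."""
    if not 0 <= token_id < VIRTUAL_VOCAB:
        raise ValueError("token_id outside the virtual vocabulary")
    return token_id % REAL_ROWS


def embed(token_ids):
    """Look up one vector per token ID through the hash."""
    rows = [hashed_row(t) for t in token_ids]
    return table[rows]


vectors = embed([5, 32_005, 150_999])
# Token IDs 5 and 32_005 both hash to row 5, so they share a vector.
compression = VIRTUAL_VOCAB / REAL_ROWS  # ratio of virtual tokens to real rows
```

Collisions are the price of the smaller table: several virtual tokens share one vector, so training has to learn vectors that work for every token mapped to the same row.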
## 📊 Models

| Model | Parameters | Size (ONNX) | Acc@1 (Python) | Latency (CPU) |
|---|---|---|---|---|
| **ALRI-Tiny** | 24M | ~90 MB | **96.8%** | ~35 ms |
| **ALRI-Nano** | **0.93M** | **~6 MB** | **94.0%** | **~2 ms** |
| *MiniLM-L6* | *22M* | *~80 MB* | *92.0%* | *~40 ms* |

*Note: ALRI-Nano has roughly 24x fewer parameters than MiniLM-L6 (0.93M vs. 22M) and a ~13x smaller ONNX file (~6 MB vs. ~80 MB), while scoring higher on Python Acc@1 (94.0% vs. 92.0%).*

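The size ratios implied by the table can be checked with a few lines of arithmetic. The figures are copied from the table; the dictionary layout and variable names are only for illustration.

```python
# Figures copied from the table above (parameters in millions, ONNX size in MB).
models = {
    "ALRI-Tiny": {"params_m": 24.0, "size_mb": 90.0},
    "ALRI-Nano": {"params_m": 0.93, "size_mb": 6.0},
    "MiniLM-L6": {"params_m": 22.0, "size_mb": 80.0},
}

nano, minilm = models["ALRI-Nano"], models["MiniLM-L6"]
param_ratio = minilm["params_m"] / nano["params_m"]
size_ratio = minilm["size_mb"] / nano["size_mb"]

print(f"Parameter ratio: {param_ratio:.1f}x")
print(f"ONNX size ratio: {size_ratio:.1f}x")
```

Because the ONNX sizes in the table are approximate, the file-size ratio is only a rough figure; the parameter ratio is the more precise of the two.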
## 🚀 Getting Started (ONNX)

The models are optimized for [ONNX Runtime](https://onnxruntime.ai/) and run on CPU with low latency (see the Models table).

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the int8 Nano model and its tokenizer from local paths
session = ort.InferenceSession("alri-nano-onnx/model_int8.onnx")
tokenizer = AutoTokenizer.from_pretrained("alri-nano-onnx/tokenizer")

text = "how to read a json file in python"
inputs = tokenizer(text, return_tensors="np")

# Cast token IDs and attention mask to int64 before running the session
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})
embedding = outputs[0]  # shape: (1, 128)
```

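Once you have embeddings, retrieval usually comes down to comparing vectors. The sketch below truncates embeddings to a shorter Matryoshka prefix, L2-normalizes them, and ranks candidates by cosine similarity. Random arrays stand in for real model outputs, and the helper names `truncate_and_normalize` and `rank` are hypothetical. It also assumes that shorter dimensions are taken as a prefix of the full vector, which is the usual MRL convention but is not confirmed by this README.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and rescale each row to unit length."""
    if dim > embeddings.shape[1]:
        raise ValueError("dim exceeds the embedding width")
    prefix = embeddings[:, :dim]
    norms = np.linalg.norm(prefix, axis=1, keepdims=True)
    return prefix / np.clip(norms, 1e-12, None)


def rank(query: np.ndarray, corpus: np.ndarray, dim: int) -> np.ndarray:
    """Return corpus indices sorted from most to least similar to the query."""
    q = truncate_and_normalize(query, dim)
    c = truncate_and_normalize(corpus, dim)
    scores = (c @ q.T).ravel()  # cosine similarity, since rows are unit length
    return np.argsort(-scores)


# Stand-in data: five (128-dim) corpus embeddings and a query close to item 3.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((5, 128)).astype(np.float32)
query = corpus[3:4] + 0.01 * rng.standard_normal((1, 128)).astype(np.float32)

order_full = rank(query, corpus, dim=128)   # ranking with all 128 dimensions
order_small = rank(query, corpus, dim=32)   # ranking with a 32-dim prefix
```

In real use, `query` and `corpus` would be embeddings from the ONNX session; comparing `order_full` with `order_small` on your own data is a direct way to see how much ranking quality a smaller dimension costs.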
## 🎯 Use Cases

- **Real-time IDE Autocomplete**: Context retrieval fast enough for interactive use (~2 ms on CPU with ALRI-Nano).
- **Mobile & Edge Search**: High-quality search on low-power devices, with a ~6 MB ONNX model.
- **Massive Code Indexing**: Low storage cost per embedding, with output dimensions as small as 32.

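To make the storage trade-off concrete, the sketch below estimates the raw index size for each Matryoshka dimension. The corpus size of one million snippets and float32 storage are assumptions chosen for illustration, and `index_size_mb` is a hypothetical helper.

```python
def index_size_mb(num_embeddings: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw size of a dense embedding matrix in megabytes (1 MB = 10**6 bytes)."""
    return num_embeddings * dim * bytes_per_value / 1_000_000


NUM_SNIPPETS = 1_000_000  # assumed corpus size, for illustration only

# One entry per output dimension listed under Key Technologies.
sizes = {dim: index_size_mb(NUM_SNIPPETS, dim) for dim in (32, 64, 128, 384)}
for dim, mb in sizes.items():
    print(f"{dim:>3} dims: {mb:,.0f} MB")
```

Under these assumptions, a 32-dimensional index takes 128 MB against 1,536 MB at 384 dimensions, before any index overhead or quantization.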
## 📜 License
MIT