Upload README.md with huggingface_hub

README.md +76 -0

README.md ADDED

	@@ -0,0 +1,76 @@

+---
+language:
+- en
+- code
+tags:
+- python
+- javascript
+- go
+- java
+- php
+- ruby
+- c++
+- embeddings
+- code-search
+- onnx
+- albert
+- matryoshka
+- extreme-compression
+license: mit
+---
+# ALRI: Ultra-Efficient Code Embeddings 🚀
+ALRI (A Lightweight Retrieval Intelligence) is a family of next-generation embedding models specifically designed for extreme efficiency and high-speed code retrieval.
+By combining the architectural techniques listed below with aggressive parameter reduction, ALRI targets retrieval accuracy close to that of much larger models at a fraction of the size of standard models such as MiniLM.
+## 🧬 Key Technologies
+- **ALBERT-style Weight Sharing**: Utilizes recursive transformer blocks to maintain deep representations while drastically reducing the unique parameter count.
+- **Extreme Hashed Embeddings**: Vocabulary compression that maps 151k virtual tokens into 32k real vectors, eliminating redundancy and reducing memory footprint.
+- **Funnel Attention**: Dynamic sequence pooling that accelerates inference by reducing token density in deeper layers.
+- **Matryoshka Representation Learning (MRL)**: Flexible output dimensions (32, 64, 128, 384) allowing you to trade off accuracy for even greater speed.
+- **Distilled Intelligence**: Knowledge is distilled from a 24M-parameter teacher into a sub-million-parameter "Nano" engine.
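Two of the ideas above can be illustrated with a minimal sketch. It assumes the hashing is a plain modulo over token ids and that Matryoshka truncation means keeping the leading components and re-normalizing; the names `hashed_row` and `truncate_embedding` and both constants are hypothetical and are not taken from the ALRI code. The README does not state the actual hashing scheme.

```python
import numpy as np

VIRTUAL_VOCAB = 151_000  # assumed size of the virtual vocabulary ("151k")
REAL_ROWS = 32_000       # assumed number of stored embedding rows ("32k")


def hashed_row(token_id: int) -> int:
    """Map a virtual token id to a shared embedding row (modulo hash, an assumption)."""
    return token_id % REAL_ROWS


def truncate_embedding(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components, then L2-normalize the result."""
    head = embedding[..., :dim]
    norm = np.linalg.norm(head, axis=-1, keepdims=True)
    return head / np.clip(norm, 1e-12, None)


# Random vectors stand in for real model outputs.
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 384)).astype(np.float32)
small = truncate_embedding(full, 64)
print(small.shape)          # (2, 64)
print(hashed_row(32_005))   # 5
```

Several virtual ids share one stored row under this scheme, which is where the memory saving comes from. Truncated vectors are re-normalized so that dot products remain cosine similarities.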
+## 📊 Models
+| Model | Parameters | Size (ONNX) | Acc@1 (Python) | Latency (CPU) |
+|---|---|---|---|---|
+| **ALRI-Tiny** | 24M | ~90 MB | **96.8%** | ~35 ms |
+| **ALRI-Nano** | **0.93M** | **~6 MB** | **94.0%** | **~2 ms** |
+| *MiniLM-L6* | *22M* | *~80 MB* | *92.0%* | *~40 ms* |
+*Note: ALRI-Nano has roughly 24x fewer parameters than MiniLM-L6 (0.93M vs. 22M) and a roughly 13x smaller ONNX file (~6 MB vs. ~80 MB), while reporting higher Acc@1 on Python code retrieval.*
+## 🚀 Getting Started (ONNX)
+The models are optimized for [ONNX Runtime](https://onnxruntime.ai/). You can run them on any CPU with minimal latency.
+```python
+import numpy as np
+import onnxruntime as ort
+from transformers import AutoTokenizer
+
+# Load the int8 Nano model and its tokenizer from a local directory
+session = ort.InferenceSession("alri-nano-onnx/model_int8.onnx")
+tokenizer = AutoTokenizer.from_pretrained("alri-nano-onnx/tokenizer")
+
+# Tokenize the query into NumPy arrays
+text = "how to read a json file in python"
+inputs = tokenizer(text, return_tensors="np")
+
+# Cast inputs to int64; passing None as the first argument returns all outputs
+outputs = session.run(None, {
+    "input_ids": inputs["input_ids"].astype(np.int64),
+    "attention_mask": inputs["attention_mask"].astype(np.int64),
+})
+embedding = outputs[0]  # shape (1, 128)
+```
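Once embeddings are available, retrieval usually comes down to ranking candidates by similarity to the query. The sketch below assumes cosine similarity is the intended metric, which the README does not state. The helper name `rank_by_cosine` is hypothetical, and toy 3-dimensional vectors stand in for real model outputs.

```python
import numpy as np


def rank_by_cosine(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Return corpus row indices sorted from most to least similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of each corpus row with the query
    return np.argsort(-scores)


# Toy vectors standing in for real model embeddings.
query = np.array([1.0, 0.0, 0.0])
corpus = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # close to the query
    [-1.0, 0.0, 0.0],  # opposite direction
])
order = rank_by_cosine(query, corpus)
print(order)  # [1 0 2]
```

With the ONNX output from the snippet above, the query vector would be `embedding[0]`, and `corpus` would hold one row per indexed code snippet.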
+## 🎯 Use Cases
+- **Real-time IDE Autocomplete**: Lightning-fast context retrieval.
+- **Mobile & Edge Search**: High-quality search on low-power devices.
+- **Massive Code Indexing**: Extremely low storage costs per embedding.
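To put the storage claim in rough numbers, the sketch below computes raw index size for the Matryoshka output dimensions listed earlier. It assumes vectors are stored as uncompressed float32 values with no index overhead; the function name `index_size_bytes` is hypothetical.

```python
def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for `num_vectors` embeddings of width `dim` (float32 by default)."""
    return num_vectors * dim * bytes_per_value


for dim in (32, 64, 128, 384):
    mb = index_size_bytes(1_000_000, dim) / 1e6
    print(f"{dim:>3} dims: {mb:.0f} MB per million vectors")
```

Under these assumptions, one million 32-dimensional vectors take 128 MB, compared with 1536 MB at 384 dimensions.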
+## 📜 License
+MIT