| --- |
| language: en |
| tags: |
| - sentence-transformers |
| - embeddings |
| - semantic-search |
| - retrieval |
| license: mit |
| --- |
| |
| ```python |
| import onnxruntime as ort |
| import numpy as np |
| from transformers import AutoTokenizer |
| from huggingface_hub import hf_hub_download |
| |
| # -- Load -- |
| tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/text-embedding", subfolder="tokenizer") |
| onnx_path = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope.onnx") |
| session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"]) |
| |
| # -- Encode -- |
| def encode(texts): |
|     # Accept a single string or a list of strings |
|     if isinstance(texts, str): |
|         texts = [texts] |
|     enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="np") |
|     inputs = {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]} |
|     return session.run(["embeddings"], inputs)[0] |
| |
| # -- Test -- |
| emb = encode("Hello world!") |
| print(emb.shape)  # (1, 256) |
| ``` |
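
Since the model card lists semantic search and retrieval as intended uses, here is a minimal sketch of ranking documents against a query by cosine similarity. The helper name `rank_by_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of this repository; in practice the inputs would be arrays returned by `encode`.

```python
import numpy as np

def rank_by_similarity(query_emb, doc_embs):
    """Return (indices, scores) of documents, most similar first.

    Hypothetical helper for illustration; not shipped with the model.
    """
    query_emb = np.asarray(query_emb, dtype=np.float32).reshape(-1)
    doc_embs = np.asarray(doc_embs, dtype=np.float32)
    # Re-normalize defensively so the dot product is a true cosine similarity.
    query_emb = query_emb / np.linalg.norm(query_emb)
    doc_embs = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = doc_embs @ query_emb
    order = np.argsort(-scores)  # descending by similarity
    return order, scores[order]

# Toy 3-dim vectors stand in for real 256-dim embeddings.
docs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
query = np.array([0.9, 0.1, 0.0])
order, scores = rank_by_similarity(query, docs)
print(order)
```

With real data, `doc_embs` would be `encode(list_of_documents)` and `query_emb` would be `encode(query)[0]`.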
|
|
|
|
| # BiEncoder RoPE: Sentence Embedding Model |
|
|
| A 34M-parameter sentence embedding model trained from scratch in PyTorch. |
|
|
| ## Architecture |
| - 6-layer Transformer encoder with RoPE positional embeddings |
| - Mean pooling + L2 normalization |
| - 256-dim output vectors |
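
The pooling step listed above can be sketched in NumPy as follows. This is an illustration of masked mean pooling followed by L2 normalization under the usual conventions (padding tokens excluded via the attention mask); the function name `mean_pool_l2` is hypothetical and the model's actual implementation may differ in detail.

```python
import numpy as np

def mean_pool_l2(token_embeddings, attention_mask):
    """Masked mean pooling + L2 normalization (illustrative sketch).

    token_embeddings: (batch, seq_len, dim) per-token vectors
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # ignore padding tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)      # unit-length rows

# One sentence with two real tokens and one padding token.
tokens = np.array([[[3.0, 0.0], [0.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
sentence_vec = mean_pool_l2(tokens, mask)
print(sentence_vec)
```

Because the outputs are unit length, the dot product of two embeddings equals their cosine similarity.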
|
|
| ## Training (Curriculum) |
| | Phase | Dataset | Loss | |
| |---|---|---| |
| | 1 | all-nli | MNRLoss | |
| | 2 | squad | MNRLoss | |
| | 3 | msmarco-bm25 | HardNegativeLoss | |
| | 4 | natural-questions | MNRLoss | |
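
Assuming `MNRLoss` stands for a multiple negatives ranking loss (the card does not spell it out), the idea can be sketched in NumPy: each anchor is scored against every positive in the batch, and the other rows act as in-batch negatives. The function name `mnr_loss` and the `scale=20.0` value are assumptions for illustration, not the training code used for this model. `HardNegativeLoss` is not sketched because its definition is not given here.

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss (illustrative sketch).

    anchors, positives: (batch, dim); row i of each forms a matching pair.
    scale: similarity multiplier (assumed value, not taken from the card).
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                        # (batch, batch) cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for row i is column i.
    return float(-np.mean(np.diag(log_probs)))

aligned = mnr_loss(np.eye(3), np.eye(3))                       # pairs match
mismatched = mnr_loss(np.eye(3), np.roll(np.eye(3), 1, axis=0))  # pairs shuffled
print(aligned, mismatched)
```

The loss is close to zero when each anchor is most similar to its own positive and grows when the pairs are misaligned.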
|
|
| ## Files |
| - `tokenizer/`: Hugging Face tokenizer (bert-base-uncased) |
| - `pytorch/checkpoint_phase4_nq.pt`: PyTorch weights |
| - `onnx/biencoder_rope.onnx`: ONNX FP32 |
| - `onnx/biencoder_rope_int8.onnx`: ONNX INT8 (recommended for CPU) |
|
|
| ## Performance |
| - FP32 ONNX size : 134.3 MB |
| - INT8 ONNX size : 34.6 MB |
| - Throughput : ~247 sentences/sec on CPU |
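
The card does not say how the throughput figure was measured (hardware, batch size, sentence length). One way to take a comparable measurement on your own machine is sketched below. The helper `sentences_per_second`, the batch size of 32, and the stand-in `dummy_encode` are all assumptions for illustration; pass the real `encode` function to benchmark the model.

```python
import time

def sentences_per_second(encode_fn, sentences, batch_size=32, warmup=1):
    """Time encode_fn over batches and return sentences per second (sketch)."""
    batches = [sentences[i:i + batch_size]
               for i in range(0, len(sentences), batch_size)]
    # Warm-up runs are excluded from timing (first calls are often slower).
    for batch in batches[:warmup]:
        encode_fn(batch)
    start = time.perf_counter()
    for batch in batches:
        encode_fn(batch)
    elapsed = time.perf_counter() - start
    return len(sentences) / max(elapsed, 1e-9)

def dummy_encode(batch):
    # Stand-in for the real encoder so the sketch runs without the model.
    return [[0.0] * 256 for _ in batch]

rate = sentences_per_second(dummy_encode, ["hello world"] * 128)
print(f"{rate:.0f} sentences/sec")
```

Results depend heavily on CPU, thread settings, and input length, so numbers from different machines are not directly comparable.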
|
|