---
language: en
tags:
- sentence-transformers
- embeddings
- semantic-search
- retrieval
license: mit
---

# BiEncoder RoPE — Sentence Embedding Model

A 34M-parameter sentence embedding model trained from scratch in PyTorch.

## Usage

```python
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# ── Load ───────────────────────────────────────────────────────────────────
tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/text-embedding", subfolder="tokenizer")
onnx_path = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope.onnx")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# ── Encode ─────────────────────────────────────────────────────────────────
def encode(texts):
    """Embed a string or list of strings into (n, 256) L2-normalized vectors."""
    if isinstance(texts, str):
        texts = [texts]
    enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="np")
    return session.run(
        ["embeddings"],
        {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]},
    )[0]

# ── Test ─────────────────────────────────────────────────────────────────── 
emb = encode("Hello world!")
print(emb.shape)  # (1, 256)
```

## Architecture

- 6-layer Transformer encoder with rotary position embeddings (RoPE)
- Mean pooling + L2 normalization
- 256-dim output vectors

## Training (Curriculum)

| Phase | Dataset | Loss |
|---|---|---|
| 1 | all-nli | MNRLoss |
| 2 | squad | MNRLoss |
| 3 | msmarco-bm25 | HardNegativeLoss |
| 4 | natural-questions | MNRLoss |

## Files

- `tokenizer/` — Hugging Face tokenizer (bert-base-uncased)
- `pytorch/checkpoint_phase4_nq.pt` — PyTorch weights
- `onnx/biencoder_rope.onnx` — ONNX FP32
- `onnx/biencoder_rope_int8.onnx` — ONNX INT8 (recommended for CPU)

## Performance

- FP32 ONNX size: 134.3 MB
- INT8 ONNX size: 34.6 MB
- Throughput: ~247 sentences/sec on CPU
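
## Example: Semantic Search

Because the output vectors are L2-normalized, cosine similarity reduces to a plain dot product. Below is a minimal sketch that reuses the `encode` helper from the usage snippet above; the query and document strings are illustrative placeholders, not from the training data.

```python
import numpy as np

# Embeddings are unit-length, so a dot product equals cosine similarity.
query = encode("How do I cook pasta?")                 # shape (1, 256)
docs = encode([
    "Boil salted water, then add the noodles.",
    "The stock market fell sharply today.",
])                                                     # shape (2, 256)

scores = (query @ docs.T)[0]   # cosine similarity of the query vs. each doc
best = int(np.argmax(scores))
print(f"Best match: doc {best} (cosine {scores[best]:.3f})")
```

To run the INT8 variant instead (smaller, and recommended for CPU per the Files section), download `onnx/biencoder_rope_int8.onnx` in place of the FP32 file; the rest of the code is unchanged.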