```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# ── Load ────────────────────────────────────────────────────────────────────
tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/text-embedding", subfolder="tokenizer")
onnx_path = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope.onnx")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# ── Encode ──────────────────────────────────────────────────────────────────
def encode(texts):
    if isinstance(texts, str):
        texts = [texts]
    enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="np")
    return session.run(["embeddings"], {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})[0]

# ── Test ────────────────────────────────────────────────────────────────────
emb = encode("Hello world!")
print(emb.shape)  # (1, 256)
```
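Because the output vectors are L2-normalized (see Architecture below), cosine similarity reduces to a plain dot product. A minimal usage sketch building on the `encode` helper above; the example sentences are illustrative:

```python
# Semantic similarity: embeddings are unit-length, so dot product == cosine.
sentences = ["How do I bake bread?", "Bread baking instructions", "Quantum field theory"]
embs = encode(sentences)              # shape (3, 256)
sims = embs @ embs[0]                 # cosine similarity against the first sentence
for text, score in zip(sentences, sims):
    print(f"{score:+.3f}  {text}")    # the related pair should score highest
```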
# BiEncoder RoPE – Sentence Embedding Model

A 34M-parameter sentence embedding model trained from scratch in PyTorch.
## Architecture

- 6-layer Transformer encoder with RoPE positional embeddings
- Mean pooling + L2 normalization
- 256-dim output vectors
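The pooling head is already baked into the exported ONNX graph, so nothing below is needed at inference time; this NumPy sketch only illustrates the computation the card describes. Mean pooling masks out padding tokens before averaging, and the L2 step scales each row to unit length:

```python
import numpy as np

def mean_pool_l2(token_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """token_states: (batch, seq, hidden); attention_mask: (batch, seq) of 0/1."""
    mask = attention_mask[..., None].astype(token_states.dtype)    # (batch, seq, 1)
    summed = (token_states * mask).sum(axis=1)                     # ignore padding positions
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                 # avoid divide-by-zero
    pooled = summed / counts                                       # mean over real tokens only
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)  # L2-normalize each row
```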
## Training (Curriculum)

| Phase | Dataset | Loss |
|---|---|---|
| 1 | all-nli | MNRLoss |
| 2 | squad | MNRLoss |
| 3 | msmarco-bm25 | HardNegativeLoss |
| 4 | natural-questions | MNRLoss |
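MNRLoss refers to Multiple Negatives Ranking loss, where every other positive in the batch serves as an in-batch negative for a given query. The training code is not published with this card, so the function name and temperature below are assumptions; this is a minimal PyTorch sketch of the core idea:

```python
import torch
import torch.nn.functional as F

def mnr_loss(query_emb: torch.Tensor, pos_emb: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """Multiple Negatives Ranking loss over a batch of (query, positive) pairs.

    Both inputs are (batch, dim) and assumed L2-normalized; `scale` is an
    assumed temperature, a common choice in sentence-embedding training.
    """
    scores = query_emb @ pos_emb.T * scale                        # (batch, batch) cosine matrix
    labels = torch.arange(scores.size(0), device=scores.device)   # diagonal = true pairs
    return F.cross_entropy(scores, labels)                        # off-diagonal entries act as negatives
```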
## Files

- `tokenizer/` – HuggingFace tokenizer (bert-base-uncased)
- `pytorch/checkpoint_phase4_nq.pt` – PyTorch weights
- `onnx/biencoder_rope.onnx` – ONNX FP32
- `onnx/biencoder_rope_int8.onnx` – ONNX INT8 (recommended for CPU)
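To use the INT8 model recommended for CPU, only the downloaded filename changes; the tokenizer and the `encode` helper from the quick-start snippet stay the same:

```python
from huggingface_hub import hf_hub_download
import onnxruntime as ort

int8_path = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope_int8.onnx")
session = ort.InferenceSession(int8_path, providers=["CPUExecutionProvider"])
# Reuse the same tokenizer and encode() definition from the quick-start snippet.
```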
## Performance

- FP32 ONNX size: 134.3 MB
- INT8 ONNX size: 34.6 MB
- Throughput: ~247 sentences/sec on CPU
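The throughput figure can be checked with a simple timing loop; the batch size here is an assumption, and results will vary with hardware and sentence length:

```python
import time

sentences = ["ONNX Runtime makes CPU inference fast."] * 256  # assumed benchmark batch
start = time.perf_counter()
encode(sentences)                                              # encode() from the quick-start snippet
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.1f} sentences/sec")
```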