import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# ── Load ───────────────────────────────────────────────────────────────────
tokenizer    = AutoTokenizer.from_pretrained("alanjoshua2005/text-embedding", subfolder="tokenizer")
onnx_path    = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope.onnx")
session      = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# ── Encode ─────────────────────────────────────────────────────────────────
def encode(texts):
    if isinstance(texts, str): texts = [texts]
    enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="np")
    return session.run(["embeddings"], {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})[0]

# ── Test ───────────────────────────────────────────────────────────────────
emb = encode("Hello world!")
print(emb.shape)   # (1, 256)
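
Because the model card lists L2 normalization as the final step, the dot product of two embeddings should equal their cosine similarity. The sketch below ranks documents against a query on that assumption. The helper name `rank_by_similarity` and the toy 2-dim vectors are hypothetical stand-ins so the block runs without downloading the model; in practice the inputs would be the arrays returned by `encode`.

```python
import numpy as np

def rank_by_similarity(query_emb, doc_embs):
    """Return document indices sorted from most to least similar, with scores.

    Assumes both inputs are already L2-normalized, so the dot product
    is the cosine similarity.
    """
    query_emb = np.asarray(query_emb, dtype=np.float32).reshape(-1)  # (dim,)
    doc_embs = np.asarray(doc_embs, dtype=np.float32)                # (n_docs, dim)
    scores = doc_embs @ query_emb                                    # (n_docs,)
    order = np.argsort(-scores)                                      # descending order
    return order, scores[order]

# Toy unit vectors standing in for encode(...) outputs (assumption for illustration).
docs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]], dtype=np.float32)
query = np.array([[0.0, 1.0]], dtype=np.float32)
order, scores = rank_by_similarity(query, docs)
```

With real data this would look like `rank_by_similarity(encode(question), encode(passages))`.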

BiEncoder RoPE: Sentence Embedding Model

A 34M-parameter sentence embedding model trained from scratch in PyTorch.

Architecture

  • 6-layer Transformer encoder with RoPE positional embeddings
  • Mean pooling + L2 normalization
  • 256-dim output vectors
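
The pooling step named in the list above can be sketched in NumPy as follows. This is a minimal illustration of masked mean pooling followed by L2 normalization, not the model's actual code: the function name `mean_pool_and_normalize`, the `eps` guard, and the toy 2-dim token vectors are assumptions made for the example.

```python
import numpy as np

def mean_pool_and_normalize(token_embs, attention_mask, eps=1e-9):
    """Average token vectors over non-padding positions, then scale to unit length."""
    token_embs = np.asarray(token_embs, dtype=np.float32)           # (batch, seq, dim)
    mask = np.asarray(attention_mask, dtype=np.float32)[..., None]  # (batch, seq, 1)
    summed = (token_embs * mask).sum(axis=1)                        # padding contributes 0
    counts = np.clip(mask.sum(axis=1), eps, None)                   # real tokens per row
    pooled = summed / counts                                        # (batch, dim)
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, eps, None)                       # unit-length rows

# One sentence with two real tokens and one padding token (toy values).
tokens = np.array([[[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
sentence_emb = mean_pool_and_normalize(tokens, mask)
```

The padding token is ignored, the mean of the two real tokens is `[1.5, 2.0]`, and dividing by its norm of 2.5 gives a unit vector.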

Training (Curriculum)

Phase  Dataset            Loss
1      all-nli            MNRLoss
2      squad              MNRLoss
3      msmarco-bm25       HardNegativeLoss
4      natural-questions  MNRLoss
Files

  • tokenizer/ - HuggingFace tokenizer (bert-base-uncased)
  • pytorch/checkpoint_phase4_nq.pt - PyTorch weights
  • onnx/biencoder_rope.onnx - ONNX FP32
  • onnx/biencoder_rope_int8.onnx - ONNX INT8 (recommended for CPU)

Performance

  • FP32 ONNX size : 134.3 MB
  • INT8 ONNX size : 34.6 MB
  • Throughput : ~247 sentences/sec on CPU
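
The card does not say how the throughput figure was measured (batch size, sentence length, hardware, or model variant). One way to get a comparable number on your own machine is a small timing harness like the sketch below. The helper name `measure_throughput`, its parameters, and the stand-in encoder are assumptions for illustration; pass your own `encode` function to time the real model.

```python
import time

def measure_throughput(encode_fn, sentences, batch_size=32, warmup_batches=1):
    """Return sentences per second for encode_fn over the given sentences."""
    batches = [sentences[i:i + batch_size]
               for i in range(0, len(sentences), batch_size)]
    # Warm-up runs are excluded from timing (first calls often pay setup costs).
    for batch in batches[:warmup_batches]:
        encode_fn(batch)
    start = time.perf_counter()
    for batch in batches:
        encode_fn(batch)
    elapsed = time.perf_counter() - start
    return len(sentences) / elapsed if elapsed > 0 else float("inf")

# Stand-in encoder so the sketch runs without the model (assumption for illustration).
calls = []
def fake_encode(batch):
    calls.append(len(batch))
    return [len(s) for s in batch]

rate = measure_throughput(fake_encode, ["Hello world!"] * 100, batch_size=32)
```

Results will vary with batch size, sequence length, and CPU, so treat any single number as indicative only.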