---
language: en
tags:
- sentence-transformers
- embeddings
- semantic-search
- retrieval
license: mit
---
# BiEncoder RoPE – Sentence Embedding Model

A 34M-parameter sentence embedding model trained from scratch in PyTorch.

## Usage

```python
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
# ── Load ───────────────────────────────────────────────────────────────────
tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/text-embedding", subfolder="tokenizer")
onnx_path = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope.onnx")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
# ── Encode ─────────────────────────────────────────────────────────────────
def encode(texts):
    """Embed one string or a list of strings; returns an (N, 256) float32 array."""
    if isinstance(texts, str):
        texts = [texts]
    enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="np")
    return session.run(["embeddings"], {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})[0]
# ── Test ───────────────────────────────────────────────────────────────────
emb = encode("Hello world!")
print(emb.shape)  # (1, 256)
```
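Because the embeddings are L2-normalized (see Architecture below), cosine similarity is just a dot product. A minimal retrieval sketch built on the `encode` helper above (the corpus and query are made-up examples):

```python
# Rank a small corpus against a query; reuses `encode` and the session loaded above.
corpus = [
    "The cat sat on the mat.",
    "Transformers use attention to relate tokens to each other.",
    "Paris is the capital of France.",
]
query = "Which city is the capital of France?"

doc_emb = encode(corpus)       # (3, 256), unit-length rows
query_emb = encode(query)      # (1, 256), unit-length row

scores = (query_emb @ doc_emb.T)[0]   # cosine similarities, shape (3,)
for idx in scores.argsort()[::-1]:    # best match first
    print(f"{scores[idx]:.3f}  {corpus[idx]}")
```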
## Architecture
- 6-layer Transformer encoder with RoPE positional embeddings
- Mean pooling + L2 normalization (see the sketch below)
- 256-dim output vectors
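
For reference, the pooling step corresponds to a masked mean over the final token states followed by L2 normalization. The sketch below (plain PyTorch, names chosen for illustration) shows the operation; the ONNX export already includes this step, so its `embeddings` output is the pooled, unit-length vector:

```python
import torch
import torch.nn.functional as F

def mean_pool_normalize(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Masked mean pooling over tokens, then L2 normalization.

    hidden_states:  (batch, seq_len, hidden) final encoder states
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    returns:        (batch, hidden) unit-length sentence embeddings
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)                     # avoid division by zero on empty rows
    return F.normalize(summed / counts, p=2, dim=-1)
```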
## Training (Curriculum)
| Phase | Dataset | Loss |
|---|---|---|
| 1 | all-nli | MNRLoss |
| 2 | squad | MNRLoss |
| 3 | msmarco-bm25 | HardNegativeLoss |
| 4 | natural-questions | MNRLoss |
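
MNRLoss is multiple negatives ranking loss: for a batch of (anchor, positive) pairs, every other positive in the batch serves as a negative. The sketch below is the standard in-batch formulation (the scale factor is an assumption, not a documented hyperparameter); the phase-3 HardNegativeLoss presumably adds mined BM25 negatives as extra columns of the same score matrix.

```python
import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(anchors: torch.Tensor,
                                    positives: torch.Tensor,
                                    scale: float = 20.0) -> torch.Tensor:
    """In-batch contrastive loss over L2-normalized (anchor, positive) pairs.

    anchors, positives: (batch, dim); row i of `positives` matches row i of `anchors`.
    """
    scores = anchors @ positives.T * scale                        # (batch, batch) scaled cosine similarities
    labels = torch.arange(scores.size(0), device=scores.device)   # the diagonal holds the true pairs
    return F.cross_entropy(scores, labels)
```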
## Files
- `tokenizer/` – Hugging Face tokenizer (bert-base-uncased)
- `pytorch/checkpoint_phase4_nq.pt` – PyTorch weights
- `onnx/biencoder_rope.onnx` – ONNX FP32
- `onnx/biencoder_rope_int8.onnx` – ONNX INT8 (recommended for CPU)
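
To run the INT8 model, only the downloaded filename changes; this assumes the quantized graph keeps the same `input_ids` / `attention_mask` inputs and `embeddings` output as the FP32 export:

```python
from huggingface_hub import hf_hub_download
import onnxruntime as ort

int8_path = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope_int8.onnx")
session = ort.InferenceSession(int8_path, providers=["CPUExecutionProvider"])
# The `encode` helper from the Usage section works unchanged with this session.
```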
## Performance
- FP32 ONNX size: 134.3 MB
- INT8 ONNX size: 34.6 MB
- Throughput: ~247 sentences/sec on CPU
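
Throughput depends on CPU, batch size, and sequence length; a quick way to measure it on your own machine (the batch below is arbitrary), reusing `encode` from the Usage section:

```python
import time

sentences = ["A short sentence used only for timing."] * 512
start = time.perf_counter()
encode(sentences)
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.1f} sentences/sec")
```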