---
language: en
tags:
  - sentence-transformers
  - embeddings
  - semantic-search
  - retrieval
license: mit
---

# BiEncoder RoPE — Sentence Embedding Model

A 34M-parameter sentence embedding model trained from scratch in PyTorch.

## Usage

```python
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# ── Load ───────────────────────────────────────────────────────────────────
tokenizer = AutoTokenizer.from_pretrained("alanjoshua2005/text-embedding", subfolder="tokenizer")
onnx_path = hf_hub_download("alanjoshua2005/text-embedding", "onnx/biencoder_rope.onnx")
session   = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# ── Encode ─────────────────────────────────────────────────────────────────
def encode(texts):
    if isinstance(texts, str):
        texts = [texts]
    enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="np")
    return session.run(["embeddings"], {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})[0]

# ── Test ───────────────────────────────────────────────────────────────────
emb = encode("Hello world!")
print(emb.shape)   # (1, 256)
```

## Architecture
- 6-layer Transformer encoder with RoPE positional embeddings
- Mean pooling + L2 normalization
- 256-dim output vectors
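
The pooling step above can be sketched in NumPy with toy dimensions (the real model pools 256-dim hidden states the same way): mean over non-padded token embeddings, then L2 normalization so that a plain dot product equals cosine similarity.

```python
import numpy as np

# Toy hidden states: batch of 2 sequences, 4 tokens, hidden size 8.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8)).astype(np.float32)

# Attention mask: the second sequence ends with one padding token.
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 1, 0]], dtype=np.float32)

# Mean pooling over non-padded tokens only.
summed = (hidden * mask[:, :, None]).sum(axis=1)
counts = mask.sum(axis=1, keepdims=True)
pooled = summed / counts

# L2 normalization: unit-length vectors, so dot product == cosine similarity.
emb = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
print(np.linalg.norm(emb, axis=1))  # ≈ [1. 1.]
```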

## Training (Curriculum)
| Phase | Dataset | Loss |
|---|---|---|
| 1 | all-nli | MNRLoss |
| 2 | squad | MNRLoss |
| 3 | msmarco-bm25 | HardNegativeLoss |
| 4 | natural-questions | MNRLoss |
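
Phases 1, 2 and 4 use an in-batch Multiple Negatives Ranking (MNR) loss: each (anchor, positive) pair at batch index *i* treats every other positive in the batch as a negative. A minimal NumPy sketch of the idea (the scale factor and dimensions here are illustrative, not the training values):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch Multiple Negatives Ranking loss (sketch).

    anchors, positives: (B, d) L2-normalized embeddings; the positive for
    anchor i sits at row i, so the (B, B) similarity matrix should be
    largest on its diagonal.
    """
    scores = scale * anchors @ positives.T            # (B, B) cosine scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    idx = np.arange(len(anchors))
    return -log_probs[idx, idx].mean()                # cross-entropy on diagonal

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))
a /= np.linalg.norm(a, axis=1, keepdims=True)
# Positives close to their anchors give a low loss.
p = a + 0.05 * rng.normal(size=(4, 8))
p /= np.linalg.norm(p, axis=1, keepdims=True)
print(mnr_loss(a, p))
```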

## Files
- `tokenizer/` — HuggingFace tokenizer (bert-base-uncased)
- `pytorch/checkpoint_phase4_nq.pt` — PyTorch weights
- `onnx/biencoder_rope.onnx` — ONNX FP32
- `onnx/biencoder_rope_int8.onnx` — ONNX INT8 (recommended for CPU)

## Performance
- FP32 ONNX size: 134.3 MB
- INT8 ONNX size: 34.6 MB
- Throughput: ~247 sentences/sec on CPU