OmniEM-EN v1
Recursive kernel-native sentence embedder: GOAT-V (no Q/K) + per-head b/eps, Yat-MLP,
geodesic-momentum skip (exp-map + parallel transport on the unit hypersphere), PonderNet halting,
anytime contrastive (MNRL at every recursion step). Single tied recursive block (K=1, LMAX=6),
trained from scratch (no teacher) on English all-NLI (557k pairs); embeddings warm-started from
intfloat/multilingual-e5-small (multilingual XLM-R tokenizer). d=384.
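For readers unfamiliar with the geometry named above, here is a minimal NumPy sketch of the exponential map and parallel transport on the unit hypersphere. These are the textbook closed-form formulas only; how OmniEM combines them into the geodesic-momentum skip is defined in omniem_model.py and is not reproduced here. The function names (`exp_map`, `project_to_tangent`, `parallel_transport`) are illustrative, not the repository's API.

```python
import numpy as np

def project_to_tangent(x, u):
    """Project an arbitrary vector u onto the tangent space at unit vector x."""
    return u - np.dot(x, u) * x

def exp_map(x, v, eps=1e-12):
    """Exponential map: start at unit vector x, follow tangent vector v along a great circle."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return x  # zero step: stay where we are
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

def parallel_transport(x, y, w):
    """Transport tangent vector w from x to y along the shortest geodesic (requires y != -x)."""
    return w - (np.dot(y, w) / (1.0 + np.dot(x, y))) * (x + y)

# Illustrative usage with random data (d=384 as in the model card).
rng = np.random.default_rng(0)
x = rng.normal(size=384)
x /= np.linalg.norm(x)
step = 0.3 * project_to_tangent(x, rng.normal(size=384)) / np.sqrt(384)
momentum = project_to_tangent(x, rng.normal(size=384))

y = exp_map(x, step)                             # new point, still on the sphere
momentum_at_y = parallel_transport(x, y, momentum)  # same vector, expressed at y
```

Transport keeps the vector's length and makes it tangent at the new point, which is what lets a momentum-style term be carried from one recursion step to the next without leaving the sphere.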
Key property: adaptive compute is ~free. The learned halting exits at E[depth]≈1.05, and depth-1 is the strongest exit; deeper recursion does not help (K=1 optimal).
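To make E[depth] concrete, the sketch below computes an expected exit depth from per-step halting probabilities in the usual PonderNet way: the probability of exiting at step n is λ_n times the probability of not having halted earlier, with leftover mass assigned to the last step. The λ values are made up for illustration and are not the model's learned values; `halting_distribution` and `expected_depth` are hypothetical helper names.

```python
import numpy as np

def halting_distribution(lambdas):
    """Turn conditional halting probabilities lambda_n into an exit distribution p_n."""
    lambdas = np.asarray(lambdas, dtype=float)
    # Probability of still running when step n starts: prod_{j<n} (1 - lambda_j).
    survive = np.concatenate(([1.0], np.cumprod(1.0 - lambdas[:-1])))
    p = lambdas * survive
    # Force an exit at the final step so the distribution sums to 1.
    p[-1] = 1.0 - p[:-1].sum()
    return p

def expected_depth(lambdas):
    """Expected exit depth, with depths counted from 1."""
    p = halting_distribution(lambdas)
    depths = np.arange(1, len(p) + 1)
    return float((depths * p).sum())

# Hypothetical halting probabilities for LMAX=6 steps (not taken from the checkpoint).
example_lambdas = [0.95, 0.9, 0.9, 0.9, 0.9, 0.9]
print(expected_depth(example_lambdas))
```

A first-step halting probability close to 1 is what pushes the expected depth toward 1, which is why adaptive compute costs almost nothing on top of a single pass.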
Benchmarks (Spearman for STSB, SICK-R, BIOSSES; nDCG@10 for SciFact; accuracy for Banking77)
| model | STSB | SICK-R | BIOSSES | SciFact nDCG@10 | Banking77 acc |
|---|---|---|---|---|---|
| OmniEM(depth1) | 0.7253 | 0.655 | 0.6865 | 0.3468 | 0.8521 |
| OmniEM(hard-exit) | 0.7165 | 0.6422 | 0.6865 | 0.3468 | 0.8521 |
| multilingual-e5-small | 0.8359 | 0.7863 | 0.8438 | 0.6694 | 0.8339 |
| all-MiniLM-L6-v2 | 0.8203 | 0.7758 | 0.8164 | 0.6451 | 0.9083 |
| bge-small-en-v1.5 | 0.8586 | 0.7941 | 0.8375 | 0.72 | 0.9064 |
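As a reminder of what the STS columns measure, here is a small NumPy sketch of Spearman correlation between cosine similarities of sentence-pair embeddings and gold scores. It is a generic illustration, not the evaluation script that produced the numbers above; `cosine_pairs`, `average_ranks`, and `spearman` are hypothetical helper names.

```python
import numpy as np

def cosine_pairs(a, b):
    """Row-wise cosine similarity between two (n, d) embedding matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Usage (assuming emb_a and emb_b are (n, d) arrays and gold is a length-n score vector):
# score = spearman(cosine_pairs(emb_a, emb_b), gold)
```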
Usage
See omniem_model.py (Student/GOATV/YatMLP). Load omniem_en_best.pt ({"model": state_dict, "config": ...}),
warm-start the tokenizer/embeddings from intfloat/multilingual-e5-small, mean-pool the depth-1 (or hard-exit) output, and L2-normalize.
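The final two steps (mean-pool, then L2-normalize) can be sketched in NumPy as follows. This assumes the model output is a (batch, seq, d) array of token states and that an attention mask marks padding with 0. Model construction and checkpoint loading are left out because the `Student` constructor signature is not documented here; `mean_pool_l2` is a hypothetical helper, not a function from omniem_model.py.

```python
import numpy as np

def mean_pool_l2(token_states, attention_mask, eps=1e-12):
    """Masked mean over tokens followed by L2 normalization.

    token_states:   (batch, seq, d) token-level outputs (e.g. the depth-1 exit)
    attention_mask: (batch, seq) with 1 for real tokens and 0 for padding
    """
    token_states = np.asarray(token_states, dtype=float)
    mask = np.asarray(attention_mask, dtype=float)[..., None]  # (batch, seq, 1)
    summed = (token_states * mask).sum(axis=1)                 # ignore padded positions
    counts = np.clip(mask.sum(axis=1), eps, None)              # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, eps, None)

# Illustrative call with random data standing in for model outputs (d=384).
rng = np.random.default_rng(0)
states = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
embeddings = mean_pool_l2(states, mask)  # shape (2, 384), unit-length rows
```

With unit-length rows, cosine similarity between two sentences reduces to a plain dot product of their embeddings.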
Trained on Kaggle 2xT4. Full benchmark JSON: omniem_benchmarks.json.