OmniEM-EN v1

Recursive, kernel-native sentence embedder with d=384. The architecture combines GOAT-V (no Q/K) with per-head b/eps, a Yat-MLP, a geodesic-momentum skip (exp-map plus parallel transport on the unit hypersphere), PonderNet halting, and an anytime contrastive objective (MNRL applied at every recursion step). It uses a single tied recursive block (K=1, LMAX=6) and is trained without a teacher on English all-NLI (557k pairs). The block is trained from scratch; only the embeddings are warm-started from intfloat/multilingual-e5-small, which also supplies the multilingual XLM-R tokenizer.
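The exp-map and parallel transport named above are standard operations on the unit hypersphere. The following is a minimal, dependency-free sketch of that geometry only. It is not taken from omniem_model.py, the function names are hypothetical, and how the model combines these operations into its momentum skip is not specified here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_to_tangent(x, v):
    """Drop the component of v along unit vector x, leaving a tangent vector at x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

def exp_map(x, v, eps=1e-12):
    """Move from unit vector x along tangent vector v by geodesic distance ||v||."""
    theta = math.sqrt(dot(v, v))
    if theta < eps:
        return list(x)
    return [math.cos(theta) * xi + math.sin(theta) * vi / theta
            for xi, vi in zip(x, v)]

def parallel_transport(x, v, w, eps=1e-12):
    """Carry tangent vector w from x to exp_map(x, v) along the geodesic."""
    theta = math.sqrt(dot(v, v))
    if theta < eps:
        return list(w)
    u = [vi / theta for vi in v]          # unit direction of travel
    a = dot(u, w)                         # part of w along the direction of travel
    return [wi + a * ((math.cos(theta) - 1.0) * ui - math.sin(theta) * xi)
            for wi, ui, xi in zip(w, u, x)]

# Illustrative values: start at the first axis, step a quarter turn toward the second.
x = [1.0, 0.0, 0.0]
v = project_to_tangent(x, [0.3, math.pi / 2, 0.0])
y = exp_map(x, v)                  # new point, still on the unit sphere
m = parallel_transport(x, v, v)    # the step direction, re-expressed at y
```

In this sketch the transported vector stays tangent to the sphere at the new point and keeps its length, which is what lets a momentum-like term be carried from one step to the next without leaving the manifold.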

Key property: adaptive compute is nearly free. The learned halting policy exits at an expected depth of E[depth]≈1.05, and the depth-1 exit is the strongest; deeper recursion does not help (K=1 optimal).
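To make the E[depth] figure concrete, here is a small sketch of how an expected exit depth follows from per-step halting probabilities in a PonderNet-style scheme. The probabilities below are illustrative placeholders, not values read from this model, and the function names are hypothetical.

```python
def halting_distribution(lambdas):
    """Convert per-step conditional halt probabilities into P(exit at depth n)."""
    probs, still_running = [], 1.0
    for lam in lambdas[:-1]:
        probs.append(still_running * lam)   # halt now, given no earlier halt
        still_running *= 1.0 - lam
    probs.append(still_running)             # leftover mass exits at the last step
    return probs

def expected_depth(probs):
    """Mean exit depth, counting depths from 1."""
    return sum(depth * p for depth, p in enumerate(probs, start=1))

# Illustrative only: a halt probability of 0.95 at each of LMAX = 6 steps.
lambdas = [0.95] * 6
probs = halting_distribution(lambdas)
depth = expected_depth(probs)
```

With these placeholder values almost all probability mass sits on depth 1, so the expected depth lands just above 1, which is the regime the figure above describes.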

Benchmarks (Spearman correlation for STSB, SICK-R, and BIOSSES; nDCG@10 for SciFact; accuracy for Banking77)

model	STSB	SICK-R	BIOSSES	SciFact nDCG@10	Banking77 acc
OmniEM(depth1)	0.7253	0.655	0.6865	0.3468	0.8521
OmniEM(hard-exit)	0.7165	0.6422	0.6865	0.3468	0.8521
multilingual-e5-small	0.8359	0.7863	0.8438	0.6694	0.8339
all-MiniLM-L6-v2	0.8203	0.7758	0.8164	0.6451	0.9083
bge-small-en-v1.5	0.8586	0.7941	0.8375	0.72	0.9064

Usage

The model classes (Student, GOATV, YatMLP) are defined in omniem_model.py. To embed text: load the checkpoint omniem_en_best.pt, a dict of the form {"model": state_dict, "config": ...}; take the tokenizer and warm-start embeddings from intfloat/multilingual-e5-small; mean-pool the depth-1 (or hard-exit) output; then L2-normalize.
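The pooling and normalization steps can be sketched without any dependencies. This is an illustration of the arithmetic only: in practice these would run on the model's output tensors, and checkpoint loading is left out because the constructor arguments of Student are not documented here. The function names and the use of an attention mask to skip padding are assumptions.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)                  # guard against an all-padding input
    return [t / count for t in total]

def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy per-token outputs for one sentence; the last position is padding.
tokens = [[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]]
mask = [1, 1, 0]
embedding = l2_normalize(mean_pool(tokens, mask))
```

Once embeddings are unit length, the similarity between two sentences is just the dot product of their vectors.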

Trained on Kaggle with 2x T4 GPUs. Full benchmark results are in omniem_benchmarks.json.
