Sentence Similarity
sentence-transformers
Safetensors
English
qwen3
mathematics
mathlib
lean4
retrieval
contrastive-learning
feature-extraction
loss:CachedMultipleNegativesRankingLoss
text-embeddings-inference
How to use uw-math-ai/MathLeap-Octen-8B with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

# Load the embedding model from the Hugging Face Hub
model = SentenceTransformer("uw-math-ai/MathLeap-Octen-8B")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]

# One embedding per sentence
embeddings = model.encode(sentences)

# Pairwise similarity matrix between all sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
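The README diff on this page reports retrieval quality as R@k, MRR, and mean rank. As a minimal sketch of how such numbers can be derived from a query-by-candidate similarity matrix (for example, the output of `model.similarity`), the helper below is illustrative only: the function name `retrieval_metrics`, the toy scores, and the tie-handling rule (rank = 1 + number of strictly higher-scoring candidates) are assumptions, not the authors' evaluation code.

```python
def retrieval_metrics(sim, gold, ks=(1, 5)):
    """Compute recall@k, MRR, and mean rank (hypothetical helper).

    sim  : list of rows, sim[i][j] = similarity of query i to candidate j
    gold : gold[i] = index of the correct candidate for query i
    """
    ranks = []
    for scores, g in zip(sim, gold):
        # Assumed tie rule: only strictly higher scores push the gold item down.
        ranks.append(1 + sum(s > scores[g] for s in scores))
    n = len(ranks)
    out = {f"R@{k}": sum(r <= k for r in ranks) / n for k in ks}
    out["MRR"] = sum(1.0 / r for r in ranks) / n
    out["mean_rank"] = sum(ranks) / n
    return out


# Toy example: 3 queries, 3 candidates, correct candidate on the diagonal.
sim = [
    [0.9, 0.1, 0.3],  # gold ranked 1st
    [0.2, 0.4, 0.8],  # gold ranked 2nd
    [0.5, 0.6, 0.7],  # gold ranked 1st
]
gold = [0, 1, 2]
metrics = retrieval_metrics(sim, gold, ks=(1, 2))
print(metrics)
```

The helper returns recall as a fraction in [0, 1]; the tables in the diff report R@k as percentages, so multiply by 100 to compare.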
Update README.md
README.md
CHANGED
```diff
@@ -25,20 +25,6 @@ concepts via multi-view contrastive learning. The goal is to embed
 mathematically equivalent statements close together regardless of the surface
 language they are written in.
 
-On **AMP**, the Adversarial Theorem Pairs benchmark of 270 statement pairs
-written in different mathematical sub-languages (e.g., set-theoretic vs.
-category-theoretic phrasings of the same theorem), MathLeap-Octen-8B
-improves Recall@1 from 25.0 to 28.9 (+3.9) relative to its base. The gain
-on MRR is smaller (0.42 → 0.43) since Octen-Embedding-8B is already a
-strong starting point for this task.
-
-| Model | AMP R@1 | AMP R@5 | AMP MRR ↑ |
-|---|---|---|---|
-| Qwen3-Embedding-8B | 17.0 | 47.8 | 0.32 |
-| MathLeap-Qwen-8B | 27.2 | 63.1 | 0.43 |
-| Octen-Embedding-8B (base) | 25.0 | 62.8 | 0.42 |
-| **MathLeap-Octen-8B (this)** | **28.9** | **63.7** | **0.43** |
-
 A companion model fine-tuned from the original Qwen3-Embedding-8B under the
 same recipe, [MathLeap-Qwen-8B](https://anonymous-hf.up.railway.app/a/pv25ongyl2qb/ )
 is also released.
```
```diff
@@ -140,27 +126,6 @@ direction supervision.
 
 ## Evaluation
 
-### AMP (Adversarial Theorem Pairs)
-
-AMP pairs theorem statements across nine pairs of mathematical sub-languages
-(e.g., vector spaces ↔ module theory, set theory ↔ category theory). Models
-must retrieve the mathematically equivalent partner from a corpus where
-surface lexical similarity is intentionally low.
-
-**Specialized prompt:**
-
-| Model | R@1 | R@3 | R@5 | R@10 | R@20 | MRR ↑ | Mean rank ↓ |
-|---|---|---|---|---|---|---|---|
-| Qwen3-Embedding-4B | 13.7 | 34.3 | 45.0 | 58.5 | 70.2 | 0.28 | 21.4 |
-| Qwen3-Embedding-8B | 17.0 | 38.0 | 47.8 | 63.0 | 78.1 | 0.32 | 16.1 |
-| harrier-oss-v1-27b | 19.4 | 37.0 | 48.3 | 63.9 | 78.5 | 0.33 | 15.2 |
-| KaLM-Embedding-Gemma3-12B | 10.4 | 27.4 | 35.2 | 48.9 | 64.6 | 0.23 | 25.6 |
-| llama-embed-nemotron-8b | 2.8 | 7.2 | 11.1 | 19.3 | 31.7 | 0.08 | 94.7 |
-| Octen-Embedding-8B (base) | 25.0 | 50.6 | 62.8 | 77.4 | 88.9 | 0.42 | 10.3 |
-| F2LLM-v2-14B | n/a | n/a | n/a | n/a | n/a | 0.37 | 10.2 |
-| MathLeap-Qwen-8B | 27.2 | 52.2 | 63.1 | 78.0 | 89.8 | 0.43 | 9.7 |
-| **MathLeap-Octen-8B (this)** | **28.9** | **55.0** | **63.7** | **77.2** | **89.8** | **0.43** | n/a |
-
 ### In-domain held-out FrenzyMath retrieval (six directions)
 
 R@1 / MRR on the held-out test set (15,287 concepts):
```