uw-math-ai/MathLeap-Octen-8B



Commit f830056 (verified) · 1 parent: cceab05
CeciliaYe committed 5 days ago

Update README.md

Files changed (1)

README.md +0 -35

@@ -25,20 +25,6 @@ concepts via multi-view contrastive learning. The goal is to embed
 mathematically equivalent statements close together regardless of the surface
 language they are written in.
-On **AMP** — the Adversarial Theorem Pairs benchmark of 270 statement pairs
-written in different mathematical sub-languages (e.g., set-theoretic vs.
-category-theoretic phrasings of the same theorem) — MathLeap-Octen-8B
-improves Recall@1 from 25.0 to 28.9 (+3.9) relative to its base. The gain
-on MMR is smaller (0.42 → 0.43) since Octen-Embedding-8B is already a
-strong starting point for this task.
-| Model | AMP R@1 | AMP R@5 | AMP MMR ↑ |
-|---|---|---|---|
-| Qwen3-Embedding-8B | 17.0 | 47.8 | 0.32 |
-| MathLeap-Qwen-8B | 27.2 | 63.1 | 0.43 |
-| Octen-Embedding-8B (base) | 25.0 | 62.8 | 0.42 |
-| **MathLeap-Octen-8B (this)** | **28.9** | **63.7** | **0.43** |
 A companion model fine-tuned from the original Qwen3-Embedding-8B under the
 same recipe, [MathLeap-Qwen-8B](https://anonymous-hf.up.railway.app/a/pv25ongyl2qb/ )
 is also released.
@@ -140,27 +126,6 @@ direction supervision.
 ## Evaluation
-### AMP (Adversarial Theorem Pairs)
-AMP pairs theorem statements across nine pairs of mathematical sub-languages
-(e.g., vector spaces ↔ module theory, set theory ↔ category theory). Models
-must retrieve the mathematically equivalent partner from a corpus where
-surface lexical similarity is intentionally low.
-**Specialized prompt:**
-| Model | R@1 | R@3 | R@5 | R@10 | R@20 | MMR ↑ | Mean rank ↓ |
-|---|---|---|---|---|---|---|---|
-| Qwen3-Embedding-4B | 13.7 | 34.3 | 45.0 | 58.5 | 70.2 | 0.28 | 21.4 |
-| Qwen3-Embedding-8B | 17.0 | 38.0 | 47.8 | 63.0 | 78.1 | 0.32 | 16.1 |
-| harrier-oss-v1-27b | 19.4 | 37.0 | 48.3 | 63.9 | 78.5 | 0.33 | 15.2 |
-| KaLM-Embedding-Gemma3-12B | 10.4 | 27.4 | 35.2 | 48.9 | 64.6 | 0.23 | 25.6 |
-| llama-embed-nemotron-8b | 2.8 | 7.2 | 11.1 | 19.3 | 31.7 | 0.08 | 94.7 |
-| Octen-Embedding-8B (base) | 25.0 | 50.6 | 62.8 | 77.4 | 88.9 | 0.42 | 10.3 |
-| F2LLM-v2-14B | — | — | — | — | — | 0.37 | 10.2 |
-| MathLeap-Qwen-8B | 27.2 | 52.2 | 63.1 | 78.0 | 89.8 | 0.43 | 9.7 |
-| **MathLeap-Octen-8B (this)** | **28.9** | **55.0** | **63.7** | **77.2** | **89.8** | **0.43** | — |
 ### In-domain held-out FrenzyMath retrieval (six directions)
 R@1 / MRR on the held-out test set (15,287 concepts):
