CeciliaYe committed on
Commit f830056 · verified · 1 Parent(s): cceab05

Update README.md

  1. README.md +0 -35
README.md CHANGED
@@ -25,20 +25,6 @@ concepts via multi-view contrastive learning. The goal is to embed
  mathematically equivalent statements close together regardless of the surface
  language they are written in.
 
- On **AMP** – the Adversarial Theorem Pairs benchmark of 270 statement pairs
- written in different mathematical sub-languages (e.g., set-theoretic vs.
- category-theoretic phrasings of the same theorem) – MathLeap-Octen-8B
- improves Recall@1 from 25.0 to 28.9 (+3.9) relative to its base. The gain
- on MMR is smaller (0.42 → 0.43) since Octen-Embedding-8B is already a
- strong starting point for this task.
-
- | Model | AMP R@1 | AMP R@5 | AMP MMR ↑ |
- |---|---|---|---|
- | Qwen3-Embedding-8B | 17.0 | 47.8 | 0.32 |
- | MathLeap-Qwen-8B | 27.2 | 63.1 | 0.43 |
- | Octen-Embedding-8B (base) | 25.0 | 62.8 | 0.42 |
- | **MathLeap-Octen-8B (this)** | **28.9** | **63.7** | **0.43** |
-
  A companion model fine-tuned from the original Qwen3-Embedding-8B under the
  same recipe, [MathLeap-Qwen-8B](https://anonymous-hf.up.railway.app/a/pv25ongyl2qb/ )
  is also released.
@@ -140,27 +126,6 @@ direction supervision.
 
  ## Evaluation
 
- ### AMP (Adversarial Theorem Pairs)
-
- AMP pairs theorem statements across nine pairs of mathematical sub-languages
- (e.g., vector spaces ↔ module theory, set theory ↔ category theory). Models
- must retrieve the mathematically equivalent partner from a corpus where
- surface lexical similarity is intentionally low.
-
- **Specialized prompt:**
-
- | Model | R@1 | R@3 | R@5 | R@10 | R@20 | MMR ↑ | Mean rank ↓ |
- |---|---|---|---|---|---|---|---|
- | Qwen3-Embedding-4B | 13.7 | 34.3 | 45.0 | 58.5 | 70.2 | 0.28 | 21.4 |
- | Qwen3-Embedding-8B | 17.0 | 38.0 | 47.8 | 63.0 | 78.1 | 0.32 | 16.1 |
- | harrier-oss-v1-27b | 19.4 | 37.0 | 48.3 | 63.9 | 78.5 | 0.33 | 15.2 |
- | KaLM-Embedding-Gemma3-12B | 10.4 | 27.4 | 35.2 | 48.9 | 64.6 | 0.23 | 25.6 |
- | llama-embed-nemotron-8b | 2.8 | 7.2 | 11.1 | 19.3 | 31.7 | 0.08 | 94.7 |
- | Octen-Embedding-8B (base) | 25.0 | 50.6 | 62.8 | 77.4 | 88.9 | 0.42 | 10.3 |
- | F2LLM-v2-14B | – | – | – | – | – | 0.37 | 10.2 |
- | MathLeap-Qwen-8B | 27.2 | 52.2 | 63.1 | 78.0 | 89.8 | 0.43 | 9.7 |
- | **MathLeap-Octen-8B (this)** | **28.9** | **55.0** | **63.7** | **77.2** | **89.8** | **0.43** | – |
-
  ### In-domain held-out FrenzyMath retrieval (six directions)
 
  R@1 / MRR on the held-out test set (15,287 concepts):
 
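
For readers of this diff, the retrieval metrics named in the README (R@k, the reciprocal-rank score, and mean rank) can be illustrated with a minimal sketch. It assumes each query has exactly one correct partner and that its 1-based rank in the retrieved list is already known. The function names and the toy ranks are hypothetical and are not taken from the README or its evaluation code.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of queries whose correct partner is ranked in the top k.

    Assumes 1-based ranks and exactly one correct partner per query.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (higher is better)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the correct partner (lower is better)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(ranks) / len(ranks)


# Hypothetical toy data: rank of the correct partner for four queries.
toy_ranks = [1, 3, 2, 10]
r_at_1 = recall_at_k(toy_ranks, 1)
r_at_3 = recall_at_k(toy_ranks, 3)
mrr = mean_reciprocal_rank(toy_ranks)
avg_rank = mean_rank(toy_ranks)
```

Under these assumptions R@k is reported as a percentage, which matches the scale of the R@1 to R@20 columns in the removed tables; whether the README's own evaluation script handles ties or multiple correct partners differently is not stated in the diff.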