ForwardAILabs/MRE-T1

@@ -75,32 +75,32 @@ MRE-T1 achieves state-of-the-art single-model performance on the [BRIGHT benchma
 | Task | MRE-T1 |
 |------|--------|
-| Biology | 81.2 |
-| Earth Science | 73.2 |
-| Economics | 64.8 |
-| Psychology | 72.5 |
-| Robotics | 57.3 |
-| StackOverflow | 62.8 |
-| Sustainable Living | 60.7 |
-| LeetCode | 40.6 |
-| Pony | 70.2 |
-| AOPS | 39.5 |
-| TheoremQA (Questions) | 56.2 |
-| TheoremQA (Theorems) | 66.3 |
+| Biology | 55.3 |
+| Earth Science | 56.5 |
+| Economics | 32.9 |
+| Psychology | 48.2 |
+| Robotics | 33.1 |
+| StackOverflow | 34.2 |
+| Sustainable Living | 37.3 |
+| LeetCode | 35.0 |
+| Pony | 35.5 |
+| AOPS | 16.7 |
+| TheoremQA (Questions) | 43.3 |
+| TheoremQA (Theorems) | 46.9 |
 | **Average** | **39.6** |
 ### Long Document Retrieval (nDCG@10)
 | Task | MRE-T1 |
 |------|--------|
-| Biology | 91.5 |
-| Earth Science | 84.7 |
-| Economics | 82.0 |
-| Psychology | 84.9 |
-| Robotics | 70.1 |
-| StackOverflow | 65.9 |
-| Sustainable Living | 84.7 |
-| Pony | 51.3 |
+| Biology | 74.2 |
+| Earth Science | 72.2 |
+| Economics | 57.3 |
+| Psychology | 71.3 |
+| Robotics | 51.6 |
+| StackOverflow | 51.4 |
+| Sustainable Living | 66.2 |
+| Pony | 33.9 |
 | **Average** | **35.1** |
 ### Comparison with Other Models (Short, Single Model Only)
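
The **Average** rows in this hunk are unchanged context, so it may help to check them against the added per-task rows. The sketch below assumes the average is an unweighted mean over tasks, rounded to one decimal. That definition is an assumption, not something the README states. The helper name `macro_average` is hypothetical, and the scores are copied from the `+` rows above.

```python
# Per-task nDCG@10 scores copied from the "+" rows of this hunk.
short_scores = {
    "Biology": 55.3,
    "Earth Science": 56.5,
    "Economics": 32.9,
    "Psychology": 48.2,
    "Robotics": 33.1,
    "StackOverflow": 34.2,
    "Sustainable Living": 37.3,
    "LeetCode": 35.0,
    "Pony": 35.5,
    "AOPS": 16.7,
    "TheoremQA (Questions)": 43.3,
    "TheoremQA (Theorems)": 46.9,
}

long_scores = {
    "Biology": 74.2,
    "Earth Science": 72.2,
    "Economics": 57.3,
    "Psychology": 71.3,
    "Robotics": 51.6,
    "StackOverflow": 51.4,
    "Sustainable Living": 66.2,
    "Pony": 33.9,
}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over tasks, rounded to one decimal like the tables.

    Assumption: the README's Average row is a plain mean over the listed tasks.
    """
    return round(sum(scores.values()) / len(scores), 1)


print("short:", macro_average(short_scores))  # short: 39.6
print("long:", macro_average(long_scores))    # long: 59.8
```

Under that assumption the short-document rows reproduce the listed 39.6. The long-document rows give 59.8 rather than the listed 35.1, so that Average row may be worth re-checking unless it is computed over a different task set.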