Mosaic-glasses committed on
Commit 6c0a937 · verified · 1 Parent(s): 6d7e3da

Update README.md

Files changed (1)
  1. README.md +43 -1
README.md CHANGED
@@ -104,6 +104,48 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
104
  print(response)
105
  ```
106
 
107
  ---
108
 
109
  ## 🖊️ Citation
@@ -124,5 +166,5 @@ If you find this model useful, please consider citing our work:
124
 
125
  ## 📬 Contact
126
 
127
- Yichen Yao ([eason.yyc@inftech.ai](mailto:eason.yyc@inftech.ai))
128
 
 
104
  print(response)
105
  ```
106
 
107
+ ---
108
+
109
+ ## Performance
110
+
111
+ **INF-X-Retriever** achieves state-of-the-art results on the [BRIGHT Benchmark](https://brightbenchmark.github.io/) (as of Dec 20, 2025).
112
+
113
+ **BRIGHT** is a rigorous text retrieval benchmark designed to evaluate how well retrieval models handle questions that require intensive reasoning and cross-document synthesis. Its queries are collected from real-world sources such as StackExchange, competitive programming platforms, and mathematical competitions, and span diverse domains including mathematics, coding, biology, economics, and robotics.
114
+
115
+ ### Short document
116
+
117
+ #### Overall & Category Performance
118
+
119
+ | Model | **Avg ALL** | **StackExchange** | **Coding** | **Theorem-based** |
120
+ |:---|:---:|:---:|:---:|:---:|
121
+ | **INF-X-Retriever** | **63.4** | **68.3** | **55.3** | **57.7** |
122
+ | DIVER (v3) | 46.8 | 51.8 | 39.9 | 39.7 |
123
+ | BGE-Reasoner-0928 | 46.4 | 52.0 | 35.3 | 40.7 |
124
+ | LATTICE | 42.1 | 51.6 | 26.9 | 30.0 |
125
+ | ReasonRank | 40.8 | 46.9 | 27.6 | 35.5 |
126
+ | XDR2 | 40.3 | 47.1 | 28.5 | 32.1 |
127
+
128
+ #### Detailed Results Across 12 Datasets
129
+
130
+ | Model | Avg | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
131
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
132
+ | **INF-X-Retriever** | **63.4** | **79.8** | **70.9** | **69.9** | **73.3** | **57.7** | **64.3** | **61.9** | **56.1** | **54.5** | **51.9** | **53.1** | **67.9** |
133
+ | DIVER (v3) | 46.8 | 66.0 | 63.7 | 42.4 | 55.0 | 40.6 | 44.7 | 50.4 | 32.5 | 47.3 | 17.2 | 46.4 | 55.6 |
134
+ | BGE-Reasoner-0928 | 46.4 | 68.5 | 66.4 | 40.6 | 53.1 | 43.2 | 44.1 | 47.8 | 29.0 | 41.6 | 17.2 | 46.5 | 58.4 |
135
+ | LATTICE | 42.1 | 64.4 | 62.4 | 45.4 | 57.4 | 47.6 | 37.6 | 46.4 | 19.9 | 34.0 | 12.0 | 30.1 | 47.8 |
136
+ | ReasonRank | 40.8 | 62.7 | 55.5 | 36.7 | 54.6 | 35.7 | 38.0 | 44.8 | 29.5 | 25.6 | 14.4 | 42.0 | 50.1 |
137
+ | XDR2 | 40.3 | 63.1 | 55.4 | 38.5 | 52.9 | 37.1 | 38.2 | 44.6 | 21.9 | 35.0 | 15.7 | 34.4 | 46.2 |
138
+
139
+ ### Long document
140
+
141
+ #### Detailed Results Across 8 Datasets
142
+
143
+ | Model | Avg | Bio. | Earth. | Econ. | Pony | Psy. | Rob. | Stack. | Sus. |
144
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
145
+ | **INF-X-Retriever** | **54.6** | **73.2** | **59.6** | **69.3** | **12.1** | **74.3** | **55.9** | **27.8** | **64.8** |
146
+ | inf-retriever-v1-pro | 30.5 | 44.1 | 42.2 | 31.4 | 0.4 | 43.1 | 20.8 | 21.4 | 41.0 |
147
+
148
+
149
  ---
150
 
151
  ## 🖊️ Citation
 
166
 
167
  ## 📬 Contact
168
 
169
+ Email: [eason.yyc@inftech.ai](mailto:eason.yyc@inftech.ai)
170
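
Reviewer note: the category columns in the short-document summary table (StackExchange, Coding, Theorem-based) look like unweighted means of the per-dataset scores in the detailed table. The sketch below recomputes them for INF-X-Retriever under that assumption. The grouping of datasets into categories is an assumption inferred from the column names and is not stated in the README; `SCORES`, `CATEGORIES`, and `category_means` are illustrative names only.

```python
from statistics import mean

# INF-X-Retriever short-document scores, copied from the detailed table
# added in this commit.
SCORES = {
    "Bio.": 79.8, "Earth.": 70.9, "Econ.": 69.9, "Psy.": 73.3,
    "Rob.": 57.7, "Stack.": 64.3, "Sus.": 61.9,
    "Leet.": 56.1, "Pony": 54.5,
    "AoPS": 51.9, "TheoQ.": 53.1, "TheoT.": 67.9,
}

# Assumed grouping of datasets into the three reported categories
# (not stated in the README).
CATEGORIES = {
    "StackExchange": ["Bio.", "Earth.", "Econ.", "Psy.", "Rob.", "Stack.", "Sus."],
    "Coding": ["Leet.", "Pony"],
    "Theorem-based": ["AoPS", "TheoQ.", "TheoT."],
}


def category_means(scores, categories):
    """Return the unweighted mean score of each category."""
    return {
        name: mean(scores[dataset] for dataset in members)
        for name, members in categories.items()
    }


# Overall average across all 12 datasets, then per-category averages.
overall = mean(SCORES.values())
by_category = category_means(SCORES, CATEGORIES)

print(f"Avg ALL: {overall:.2f}")
for name, value in by_category.items():
    print(f"{name}: {value:.2f}")
```

Recomputed from the rounded table values, each mean lands within 0.1 of the reported figure. Small gaps of that size would be expected if the reported averages were computed from unrounded scores.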