Update README.md
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
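
Dense retrievers such as this one are typically consumed downstream by embedding queries and documents and ranking documents by cosine similarity. As a general illustration only (the vectors below are toy values and the helper names are not part of the model's API; in practice the embeddings would come from the model), a minimal ranking sketch:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank(query_vec, doc_vecs):
    # Indices of documents, best match first.
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine(query_vec, doc_vecs[i]),
                  reverse=True)

# Toy 2-D "embeddings": doc 2 matches the query exactly.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]
print(rank(query, docs))  # [2, 1, 0]
```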

---

## Performance

**INF-X-Retriever** achieves state-of-the-art results on the [BRIGHT Benchmark](https://brightbenchmark.github.io/) (as of Dec 20, 2025).

**BRIGHT** (a Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval) is a rigorous text retrieval benchmark designed to evaluate how well retrieval models handle questions that require intensive reasoning and cross-document synthesis. Its queries, collected from real-world sources such as StackExchange, competitive programming platforms, and mathematical competitions, span diverse domains including mathematics, coding, biology, economics, and robotics.
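
BRIGHT-style leaderboards conventionally report nDCG@10; assuming that is the metric behind the scores in the tables below, it can be sketched in a few lines (an illustration, not the official evaluation code):

```python
import math

def ndcg_at_k(ranked_gains, all_gold_gains, k=10):
    # ranked_gains: relevance gains of the retrieved docs, in ranked order.
    # all_gold_gains: gains of every relevant doc (any order).
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = dcg(sorted(all_gold_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; demoting a relevant doc lowers the score.
print(ndcg_at_k([1, 1, 0], [1, 1]))  # 1.0
```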

### Short document

#### Overall & Category Performance

| Model | **Avg ALL** | **StackExchange** | **Coding** | **Theorem-based** |
|:---|:---:|:---:|:---:|:---:|
| **INF-X-Retriever** | **63.4** | **68.3** | **55.3** | **57.7** |
| DIVER (v3) | 46.8 | 51.8 | 39.9 | 39.7 |
| BGE-Reasoner-0928 | 46.4 | 52.0 | 35.3 | 40.7 |
| LATTICE | 42.1 | 51.6 | 26.9 | 30.0 |
| ReasonRank | 40.8 | 46.9 | 27.6 | 35.5 |
| XDR2 | 40.3 | 47.1 | 28.5 | 32.1 |

#### Detailed Results Across 12 Datasets

| Model | Avg | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **INF-X-Retriever** | **63.4** | **79.8** | **70.9** | **69.9** | **73.3** | **57.7** | **64.3** | **61.9** | **56.1** | **54.5** | **51.9** | **53.1** | **67.9** |
| DIVER (v3) | 46.8 | 66.0 | 63.7 | 42.4 | 55.0 | 40.6 | 44.7 | 50.4 | 32.5 | 47.3 | 17.2 | 46.4 | 55.6 |
| BGE-Reasoner-0928 | 46.4 | 68.5 | 66.4 | 40.6 | 53.1 | 43.2 | 44.1 | 47.8 | 29.0 | 41.6 | 17.2 | 46.5 | 58.4 |
| LATTICE | 42.1 | 64.4 | 62.4 | 45.4 | 57.4 | 47.6 | 37.6 | 46.4 | 19.9 | 34.0 | 12.0 | 30.1 | 47.8 |
| ReasonRank | 40.8 | 62.7 | 55.5 | 36.7 | 54.6 | 35.7 | 38.0 | 44.8 | 29.5 | 25.6 | 14.4 | 42.0 | 50.1 |
| XDR2 | 40.3 | 63.1 | 55.4 | 38.5 | 52.9 | 37.1 | 38.2 | 44.6 | 21.9 | 35.0 | 15.7 | 34.4 | 46.2 |

### Long document

#### Detailed Results Across 8 Datasets

| Model | Avg | Bio. | Earth. | Econ. | Pony | Psy. | Rob. | Stack. | Sus. |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **INF-X-Retriever** | **54.6** | **73.2** | **59.6** | **69.3** | **12.1** | **74.3** | **55.9** | **27.8** | **64.8** |
| inf-retriever-v1-pro | 30.5 | 44.1 | 42.2 | 31.4 | 0.4 | 43.1 | 20.8 | 21.4 | 41.0 |

---

## 🖊️ Citation

If you find this model useful, please consider citing our work:

## 📬 Contact

Email: [eason.yyc@inftech.ai](mailto:eason.yyc@inftech.ai)