| | --- |
| | license: apache-2.0 |
| | tags: |
| | - medical |
| | - code |
| | - math |
| | - reasoning |
| | - general |
| | datasets: |
| | - Raderspace/MATH_qCoT_LLMquery_questionasquery_lexicalquery |
| | - reasonir/reasonir-data |
| | - truehealth/medqa |
| | - AQ-MedAI/PRGB-ZH |
| | metrics: |
| | - accuracy |
| | - recall |
| | base_model: |
| | - Qwen/Qwen3-Embedding-4B |
| | pipeline_tag: text-ranking |
| | language: |
| | - zh |
| | - en |
| | library_name: transformers |
| | --- |
| | # Diver-Retriever-4B |
| |
|
| | ## Highlights |
| | Diver-Retriever-4B is an embedding model designed for the reasoning-intensive retrieval challenges addressed by ReasonIR and RaDeR. |
| | Its training data combines samples from mathematics, coding, and healthcare. |
| | We matched the samples by difficulty level and constructed dedicated negative samples for each field. |
| | As a result, the model performs strongly on the BRIGHT leaderboard |
| | as well as on the MTEB-Medical benchmark. |
| |
|
| |
|
| |
|
| | ### Model Description |
| |
|
| |
|
| |
|
| | - **Model type:** Text Embedding |
| | - **Language(s) (NLP):** Bilingual (Chinese & English) |
| | - **Context Length:** 40k |
| | - **Number of Parameters:** 4B |
| |
|
| | For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our GitHub (https://github.com/AQ-MedAI/Diver). |
| |
|
| |
|
| |
|
| | | **Model** | **#Total Params** | **Context Length** | **Download** | **BRIGHT** | |
| | | :------------------: | :---------------: | :----------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------: | |
| | | DIVER-Retriever-4B | 4B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-4B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-4B) | **28.9** | |
| | | DIVER-Retriever-1.7B | 1.7B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-1.7B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-1.7B) | **27.3** | |
| | | DIVER-Retriever-0.6B | 0.6B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-0.6B) <br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-0.6B) | **25.2** | |
| |
|
| |
|
| | ## Evaluation |
| |
|
| | <table> |
| | <thead> |
| | <tr> |
| | <th>Method</th> |
| | <th style="text-align:right">Avg.</th> |
| | <th style="text-align:right">Bio.</th> |
| | <th style="text-align:right">Earth.</th> |
| | <th style="text-align:right">Econ.</th> |
| | <th style="text-align:right">Psy.</th> |
| | <th style="text-align:right">Rob.</th> |
| | <th style="text-align:right">Stack.</th> |
| | <th style="text-align:right">Sus.</th> |
| | <th style="text-align:right">Leet.</th> |
| | <th style="text-align:right">Pony</th> |
| | <th style="text-align:right">AoPS</th> |
| | <th style="text-align:right">TheoQ.</th> |
| | <th style="text-align:right">TheoT.</th> |
| | </tr> |
| | </thead> |
| | <tbody> |
| | <tr> |
| | <td colspan="14" style="text-align:center"><strong>Evaluate Retriever with Original Query</strong></td> |
| | </tr> |
| | <tr> |
| | <td>BM25</td> |
| | <td style="text-align:right">14.5</td> |
| | <td style="text-align:right">18.9</td> |
| | <td style="text-align:right">27.2</td> |
| | <td style="text-align:right">14.9</td> |
| | <td style="text-align:right">12.5</td> |
| | <td style="text-align:right">13.6</td> |
| | <td style="text-align:right">18.4</td> |
| | <td style="text-align:right">15.0</td> |
| | <td style="text-align:right">24.4</td> |
| | <td style="text-align:right">7.9</td> |
| | <td style="text-align:right">6.2</td> |
| | <td style="text-align:right">10.4</td> |
| | <td style="text-align:right">4.9</td> |
| | </tr> |
| | <tr> |
| | <td>SBERT</td> |
| | <td style="text-align:right">14.9</td> |
| | <td style="text-align:right">15.1</td> |
| | <td style="text-align:right">20.4</td> |
| | <td style="text-align:right">16.6</td> |
| | <td style="text-align:right">22.7</td> |
| | <td style="text-align:right">8.2</td> |
| | <td style="text-align:right">11.0</td> |
| | <td style="text-align:right">15.3</td> |
| | <td style="text-align:right">26.4</td> |
| | <td style="text-align:right">7.0</td> |
| | <td style="text-align:right">5.3</td> |
| | <td style="text-align:right">20.0</td> |
| | <td style="text-align:right">10.8</td> |
| | </tr> |
| | <tr> |
| | <td>gte-Qwen1.5-7B</td> |
| | <td style="text-align:right">22.5</td> |
| | <td style="text-align:right">30.6</td> |
| | <td style="text-align:right">36.4</td> |
| | <td style="text-align:right">17.8</td> |
| | <td style="text-align:right">24.6</td> |
| | <td style="text-align:right">13.2</td> |
| | <td style="text-align:right">22.2</td> |
| | <td style="text-align:right">14.8</td> |
| | <td style="text-align:right">25.5</td> |
| | <td style="text-align:right">9.9</td> |
| | <td style="text-align:right">14.4</td> |
| | <td style="text-align:right">27.8</td> |
| | <td style="text-align:right">32.9</td> |
| | </tr> |
| | <tr> |
| | <td>Qwen3-4B</td> |
| | <td style="text-align:right">5.6</td> |
| | <td style="text-align:right">3.5</td> |
| | <td style="text-align:right">8.0</td> |
| | <td style="text-align:right">2.3</td> |
| | <td style="text-align:right">2.0</td> |
| | <td style="text-align:right">1.6</td> |
| | <td style="text-align:right">1.0</td> |
| | <td style="text-align:right">4.4</td> |
| | <td style="text-align:right">2.1</td> |
| | <td style="text-align:right">0.1</td> |
| | <td style="text-align:right">4.9</td> |
| | <td style="text-align:right">18.0</td> |
| | <td style="text-align:right">19.2</td> |
| | </tr> |
| | <tr> |
| | <td>OpenAI</td> |
| | <td style="text-align:right">17.9</td> |
| | <td style="text-align:right">23.3</td> |
| | <td style="text-align:right">26.7</td> |
| | <td style="text-align:right">19.5</td> |
| | <td style="text-align:right">27.6</td> |
| | <td style="text-align:right">12.8</td> |
| | <td style="text-align:right">14.3</td> |
| | <td style="text-align:right">20.5</td> |
| | <td style="text-align:right">23.6</td> |
| | <td style="text-align:right">2.4</td> |
| | <td style="text-align:right">8.5</td> |
| | <td style="text-align:right">23.5</td> |
| | <td style="text-align:right">11.7</td> |
| | </tr> |
| | <tr> |
| | <td>Google</td> |
| | <td style="text-align:right">20.0</td> |
| | <td style="text-align:right">22.7</td> |
| | <td style="text-align:right">34.8</td> |
| | <td style="text-align:right">19.6</td> |
| | <td style="text-align:right">27.8</td> |
| | <td style="text-align:right">15.7</td> |
| | <td style="text-align:right">20.1</td> |
| | <td style="text-align:right">17.1</td> |
| | <td style="text-align:right">29.6</td> |
| | <td style="text-align:right">3.6</td> |
| | <td style="text-align:right">9.3</td> |
| | <td style="text-align:right">23.8</td> |
| | <td style="text-align:right">15.9</td> |
| | </tr> |
| | <tr> |
| | <td>ReasonIR-8B</td> |
| | <td style="text-align:right">24.4</td> |
| | <td style="text-align:right">26.2</td> |
| | <td style="text-align:right">31.4</td> |
| | <td style="text-align:right">23.3</td> |
| | <td style="text-align:right">30.0</td> |
| | <td style="text-align:right">18.0</td> |
| | <td style="text-align:right"><strong>23.9</strong></td> |
| | <td style="text-align:right">20.5</td> |
| | <td style="text-align:right">35.0</td> |
| | <td style="text-align:right">10.5</td> |
| | <td style="text-align:right"><strong>14.7</strong></td> |
| | <td style="text-align:right">31.9</td> |
| | <td style="text-align:right">27.2</td> |
| | </tr> |
| | <tr> |
| | <td>RaDeR-7B</td> |
| | <td style="text-align:right">25.5</td> |
| | <td style="text-align:right">34.6</td> |
| | <td style="text-align:right">38.9</td> |
| | <td style="text-align:right">22.1</td> |
| | <td style="text-align:right">33.0</td> |
| | <td style="text-align:right">14.8</td> |
| | <td style="text-align:right">22.5</td> |
| | <td style="text-align:right">23.7</td> |
| | <td style="text-align:right">37.3</td> |
| | <td style="text-align:right">5.0</td> |
| | <td style="text-align:right">10.2</td> |
| | <td style="text-align:right">28.4</td> |
| | <td style="text-align:right">35.1</td> |
| | </tr> |
| | <tr> |
| | <td>Seed1.5-Embedding</td> |
| | <td style="text-align:right">27.2</td> |
| | <td style="text-align:right">34.8</td> |
| | <td style="text-align:right"><strong>46.9</strong></td> |
| | <td style="text-align:right"><strong>23.4</strong></td> |
| | <td style="text-align:right">31.6</td> |
| | <td style="text-align:right">19.1</td> |
| | <td style="text-align:right">25.4</td> |
| | <td style="text-align:right">21.0</td> |
| | <td style="text-align:right"><strong>43.2</strong></td> |
| | <td style="text-align:right">4.9</td> |
| | <td style="text-align:right">12.2</td> |
| | <td style="text-align:right">33.3</td> |
| | <td style="text-align:right">30.5</td> |
| | </tr> |
| | <tr> |
| | <td>DIVER-Retriever</td> |
| | <td style="text-align:right"><strong>28.9</strong></td> |
| | <td style="text-align:right"><strong>41.8</strong></td> |
| | <td style="text-align:right">43.7</td> |
| | <td style="text-align:right">21.7</td> |
| | <td style="text-align:right"><strong>35.3</strong></td> |
| | <td style="text-align:right"><strong>21.0</strong></td> |
| | <td style="text-align:right">21.2</td> |
| | <td style="text-align:right"><strong>25.1</strong></td> |
| | <td style="text-align:right">37.6</td> |
| | <td style="text-align:right"><strong>13.2</strong></td> |
| | <td style="text-align:right">10.7</td> |
| | <td style="text-align:right"><strong>38.4</strong></td> |
| | <td style="text-align:right"><strong>37.3</strong></td> |
| | </tr> |
| | <tr> |
| | <td colspan="14" style="text-align:center"><strong>Evaluate Retriever with GPT-4 REASON-query</strong></td> |
| | </tr> |
| | <tr> |
| | <td>BM25</td> |
| | <td style="text-align:right">27.0</td> |
| | <td style="text-align:right"><strong>53.6</strong></td> |
| | <td style="text-align:right"><strong>54.1</strong></td> |
| | <td style="text-align:right">24.3</td> |
| | <td style="text-align:right">38.7</td> |
| | <td style="text-align:right">18.9</td> |
| | <td style="text-align:right">27.7</td> |
| | <td style="text-align:right">26.3</td> |
| | <td style="text-align:right">19.3</td> |
| | <td style="text-align:right">17.6</td> |
| | <td style="text-align:right">3.9</td> |
| | <td style="text-align:right">19.2</td> |
| | <td style="text-align:right">20.8</td> |
| | </tr> |
| | <tr> |
| | <td>SBERT</td> |
| | <td style="text-align:right">17.8</td> |
| | <td style="text-align:right">18.5</td> |
| | <td style="text-align:right">26.3</td> |
| | <td style="text-align:right">17.5</td> |
| | <td style="text-align:right">27.2</td> |
| | <td style="text-align:right">8.8</td> |
| | <td style="text-align:right">11.8</td> |
| | <td style="text-align:right">17.5</td> |
| | <td style="text-align:right">24.3</td> |
| | <td style="text-align:right">10.3</td> |
| | <td style="text-align:right">5.0</td> |
| | <td style="text-align:right">22.3</td> |
| | <td style="text-align:right">23.5</td> |
| | </tr> |
| | <tr> |
| | <td>gte-Qwen1.5-7B</td> |
| | <td style="text-align:right">24.8</td> |
| | <td style="text-align:right">35.5</td> |
| | <td style="text-align:right">43.1</td> |
| | <td style="text-align:right">24.3</td> |
| | <td style="text-align:right">34.3</td> |
| | <td style="text-align:right">15.4</td> |
| | <td style="text-align:right">22.9</td> |
| | <td style="text-align:right">23.9</td> |
| | <td style="text-align:right">25.4</td> |
| | <td style="text-align:right">5.2</td> |
| | <td style="text-align:right">4.6</td> |
| | <td style="text-align:right">28.7</td> |
| | <td style="text-align:right">34.6</td> |
| | </tr> |
| | <tr> |
| | <td>Qwen3-4B</td> |
| | <td style="text-align:right">5.5</td> |
| | <td style="text-align:right">1.3</td> |
| | <td style="text-align:right">17.3</td> |
| | <td style="text-align:right">2.5</td> |
| | <td style="text-align:right">6.2</td> |
| | <td style="text-align:right">1.0</td> |
| | <td style="text-align:right">4.8</td> |
| | <td style="text-align:right">4.5</td> |
| | <td style="text-align:right">3.0</td> |
| | <td style="text-align:right">5.9</td> |
| | <td style="text-align:right">0.0</td> |
| | <td style="text-align:right">7.2</td> |
| | <td style="text-align:right">12.5</td> |
| | </tr> |
| | <tr> |
| | <td>OpenAI</td> |
| | <td style="text-align:right">23.3</td> |
| | <td style="text-align:right">35.2</td> |
| | <td style="text-align:right">40.1</td> |
| | <td style="text-align:right">25.1</td> |
| | <td style="text-align:right">38.0</td> |
| | <td style="text-align:right">13.6</td> |
| | <td style="text-align:right">18.2</td> |
| | <td style="text-align:right">24.2</td> |
| | <td style="text-align:right">24.5</td> |
| | <td style="text-align:right">6.5</td> |
| | <td style="text-align:right">7.7</td> |
| | <td style="text-align:right">22.9</td> |
| | <td style="text-align:right">23.8</td> |
| | </tr> |
| | <tr> |
| | <td>Google</td> |
| | <td style="text-align:right">26.2</td> |
| | <td style="text-align:right">36.4</td> |
| | <td style="text-align:right">45.6</td> |
| | <td style="text-align:right">25.6</td> |
| | <td style="text-align:right">38.2</td> |
| | <td style="text-align:right">18.7</td> |
| | <td style="text-align:right"><strong>29.5</strong></td> |
| | <td style="text-align:right">17.9</td> |
| | <td style="text-align:right">31.1</td> |
| | <td style="text-align:right">3.7</td> |
| | <td style="text-align:right">10.0</td> |
| | <td style="text-align:right">27.8</td> |
| | <td style="text-align:right">30.4</td> |
| | </tr> |
| | <tr> |
| | <td>ReasonIR-8B</td> |
| | <td style="text-align:right">29.9</td> |
| | <td style="text-align:right">43.6</td> |
| | <td style="text-align:right">42.9</td> |
| | <td style="text-align:right"><strong>32.7</strong></td> |
| | <td style="text-align:right">38.8</td> |
| | <td style="text-align:right">20.9</td> |
| | <td style="text-align:right">25.8</td> |
| | <td style="text-align:right"><strong>27.5</strong></td> |
| | <td style="text-align:right">31.5</td> |
| | <td style="text-align:right"><strong>19.6</strong></td> |
| | <td style="text-align:right">7.4</td> |
| | <td style="text-align:right">33.1</td> |
| | <td style="text-align:right">35.7</td> |
| | </tr> |
| | <tr> |
| | <td>RaDeR-7B</td> |
| | <td style="text-align:right">29.2</td> |
| | <td style="text-align:right">36.1</td> |
| | <td style="text-align:right">42.9</td> |
| | <td style="text-align:right">25.2</td> |
| | <td style="text-align:right">37.9</td> |
| | <td style="text-align:right">16.6</td> |
| | <td style="text-align:right">27.4</td> |
| | <td style="text-align:right">25.0</td> |
| | <td style="text-align:right"><strong>34.8</strong></td> |
| | <td style="text-align:right">11.9</td> |
| | <td style="text-align:right"><strong>12.0</strong></td> |
| | <td style="text-align:right">37.7</td> |
| | <td style="text-align:right"><strong>43.4</strong></td> |
| | </tr> |
| | <tr> |
| | <td>DIVER-Retriever</td> |
| | <td style="text-align:right"><strong>32.1</strong></td> |
| | <td style="text-align:right">51.9</td> |
| | <td style="text-align:right">53.5</td> |
| | <td style="text-align:right">29.5</td> |
| | <td style="text-align:right"><strong>41.2</strong></td> |
| | <td style="text-align:right"><strong>21.4</strong></td> |
| | <td style="text-align:right">27.5</td> |
| | <td style="text-align:right">26.1</td> |
| | <td style="text-align:right">33.5</td> |
| | <td style="text-align:right">11.7</td> |
| | <td style="text-align:right">9.5</td> |
| | <td style="text-align:right"><strong>39.3</strong></td> |
| | <td style="text-align:right">39.7</td> |
| | </tr> |
| | <tr> |
| | <td colspan="14" style="text-align:center"><strong>Evaluate Retriever with DIVER-QExpand Query</strong></td> |
| | </tr> |
| | <tr> |
| | <td>ReasonIR-8B</td> |
| | <td style="text-align:right">32.6</td> |
| | <td style="text-align:right">49.4</td> |
| | <td style="text-align:right">44.7</td> |
| | <td style="text-align:right">32.4</td> |
| | <td style="text-align:right">44.0</td> |
| | <td style="text-align:right">26.6</td> |
| | <td style="text-align:right">31.8</td> |
| | <td style="text-align:right">29.0</td> |
| | <td style="text-align:right">32.3</td> |
| | <td style="text-align:right">12.8</td> |
| | <td style="text-align:right">9.1</td> |
| | <td style="text-align:right"><strong>40.7</strong></td> |
| | <td style="text-align:right">38.4</td> |
| | </tr> |
| | <tr> |
| | <td>+BM25 (Hybrid)</td> |
| | <td style="text-align:right">35.7</td> |
| | <td style="text-align:right">56.8</td> |
| | <td style="text-align:right">53.5</td> |
| | <td style="text-align:right"><strong>33.0</strong></td> |
| | <td style="text-align:right"><strong>48.5</strong></td> |
| | <td style="text-align:right"><strong>29.4</strong></td> |
| | <td style="text-align:right"><strong>34.2</strong></td> |
| | <td style="text-align:right"><strong>32.0</strong></td> |
| | <td style="text-align:right"><strong>35.2</strong></td> |
| | <td style="text-align:right">16.8</td> |
| | <td style="text-align:right">12.9</td> |
| | <td style="text-align:right">39.3</td> |
| | <td style="text-align:right">36.8</td> |
| | </tr> |
| | <tr> |
| | <td>DIVER-Retriever</td> |
| | <td style="text-align:right"><strong>33.9</strong></td> |
| | <td style="text-align:right">54.5</td> |
| | <td style="text-align:right">52.7</td> |
| | <td style="text-align:right">28.8</td> |
| | <td style="text-align:right">44.9</td> |
| | <td style="text-align:right">25.1</td> |
| | <td style="text-align:right">27.4</td> |
| | <td style="text-align:right">29.5</td> |
| | <td style="text-align:right">34.5</td> |
| | <td style="text-align:right">10.0</td> |
| | <td style="text-align:right">14.5</td> |
| | <td style="text-align:right"><strong>40.7</strong></td> |
| | <td style="text-align:right">44.7</td> |
| | </tr> |
| | <tr> |
| | <td>+BM25 (Hybrid)</td> |
| | <td style="text-align:right"><strong>37.2</strong></td> |
| | <td style="text-align:right"><strong>60.0</strong></td> |
| | <td style="text-align:right"><strong>55.9</strong></td> |
| | <td style="text-align:right">31.8</td> |
| | <td style="text-align:right">47.9</td> |
| | <td style="text-align:right">27.1</td> |
| | <td style="text-align:right">33.9</td> |
| | <td style="text-align:right">31.9</td> |
| | <td style="text-align:right">35.1</td> |
| | <td style="text-align:right"><strong>23.1</strong></td> |
| | <td style="text-align:right"><strong>16.8</strong></td> |
| | <td style="text-align:right">36.9</td> |
| | <td style="text-align:right"><strong>46.6</strong></td> |
| | </tr> |
| | </tbody> |
| | </table> |
| | |
| |
|
| | ## Usage |
| |
|
| |
|
| | ### Inference |
| |
|
| | #### Sentence Transformers Usage |
| |
|
| | ```python |
| | # Requires transformers>=4.51.0 |
| | # Requires sentence-transformers>=2.7.0 |
| | |
| | |
| | from sentence_transformers import SentenceTransformer |
| | |
| | # Load the model |
| | model = SentenceTransformer("AQ-MedAI/Diver-Retriever-4B") |
| | |
| | |
| | # The queries and documents to embed |
| | queries = [ |
| |     "What is the capital of China?", |
| |     "Explain gravity", |
| | ] |
| | documents = [ |
| |     "The capital of China is Beijing.", |
| |     "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.", |
| | ] |
| | |
| | # Encode the queries and documents. Note that queries benefit from using a prompt |
| | # Here we use the prompt called "query" stored under `model.prompts`, but you can |
| | # also pass your own prompt via the `prompt` argument |
| | query_embeddings = model.encode(queries, prompt_name="query") |
| | document_embeddings = model.encode(documents) |
| | |
| | # Compute the (cosine) similarity between the query and document embeddings |
| | similarity = model.similarity(query_embeddings, document_embeddings) |
| | print(similarity) |
| | |
| | ``` |
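
The similarity matrix above has one row per query and one column per document. As a minimal sketch of turning such a matrix into a ranked result list, the helper below sorts each row by score. The function name `rank_documents` and the example scores are hypothetical and are not real model output.

```python
from typing import List, Sequence, Tuple


def rank_documents(
    similarity: Sequence[Sequence[float]], top_k: int = 1
) -> List[List[Tuple[int, float]]]:
    """For each query row, return the top_k (document_index, score) pairs, best first."""
    rankings = []
    for row in similarity:
        # Pair each score with its document index, then sort by score descending.
        scored = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)
        rankings.append([(idx, float(score)) for idx, score in scored[:top_k]])
    return rankings


# Hypothetical scores for 2 queries x 2 documents (illustrative only).
example_similarity = [
    [0.80, 0.10],
    [0.20, 0.70],
]
top_hits = rank_documents(example_similarity, top_k=1)
print(top_hits)  # [[(0, 0.8)], [(1, 0.7)]]
```

With the real tensor returned by `model.similarity`, the same helper can be called as `rank_documents(similarity.tolist(), top_k=1)`.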
| | #### Transformers Usage |
| |
|
| | ```python |
| | # Requires transformers>=4.51.0 |
| | import torch |
| | import torch.nn.functional as F |
| | |
| | from torch import Tensor |
| | from transformers import AutoTokenizer, AutoModel |
| | |
| | |
| | def last_token_pool(last_hidden_states: Tensor, |
| |                     attention_mask: Tensor) -> Tensor: |
| |     # With left padding, the last position of every sequence is a real token |
| |     left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0]) |
| |     if left_padding: |
| |         return last_hidden_states[:, -1] |
| |     else: |
| |         # With right padding, pick the last non-padding position of each sequence |
| |         sequence_lengths = attention_mask.sum(dim=1) - 1 |
| |         batch_size = last_hidden_states.shape[0] |
| |         return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths] |
| | |
| | |
| | def get_detailed_instruct(task_description: str, query: str) -> str: |
| |     return f'Instruct: {task_description}\nQuery:{query}' |
| | |
| | # Each query must come with a one-sentence instruction that describes the task |
| | task = 'Given a web search query, retrieve relevant passages that answer the query' |
| | |
| | queries = [ |
| |     get_detailed_instruct(task, 'What is the capital of China?'), |
| |     get_detailed_instruct(task, 'Explain gravity') |
| | ] |
| | # No need to add instructions for retrieval documents |
| | documents = [ |
| |     "The capital of China is Beijing.", |
| |     "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun." |
| | ] |
| | input_texts = queries + documents |
| | |
| | tokenizer = AutoTokenizer.from_pretrained('AQ-MedAI/Diver-Retriever-4B', padding_side='left') |
| | model = AutoModel.from_pretrained('AQ-MedAI/Diver-Retriever-4B') |
| | |
| | |
| | max_length = 8192 |
| | |
| | # Tokenize the input texts |
| | batch_dict = tokenizer( |
| |     input_texts, |
| |     padding=True, |
| |     truncation=True, |
| |     max_length=max_length, |
| |     return_tensors="pt", |
| | ) |
| | batch_dict.to(model.device) |
| | outputs = model(**batch_dict) |
| | embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask']) |
| | |
| | # normalize embeddings |
| | embeddings = F.normalize(embeddings, p=2, dim=1) |
| | scores = (embeddings[:2] @ embeddings[2:].T) |
| | print(scores.tolist()) |
| | # [[0.9319270849227905, 0.5878604054450989], [0.639923095703125, 0.7950234413146973]] |
| | |
| | ``` |
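
The `last_token_pool` function above takes the hidden state of the last real token of each sequence as its embedding. Which position that is depends on the padding side. The sketch below reproduces only the index selection in plain Python so the two cases can be inspected without loading the model. The helper name `last_token_indices` and the example masks are illustrative assumptions.

```python
from typing import List


def last_token_indices(attention_mask: List[List[int]]) -> List[int]:
    """Return, per sequence, the position of the last non-padding token."""
    seq_len = len(attention_mask[0])
    # Left padding: every row ends with a real token, so the last column is all ones.
    left_padded = all(row[-1] == 1 for row in attention_mask)
    if left_padded:
        return [seq_len - 1 for _ in attention_mask]
    # Right padding: the last real token sits at (number of real tokens - 1).
    return [sum(row) - 1 for row in attention_mask]


# Hypothetical masks for a batch of two sequences, padded to length 4.
left = [[0, 0, 1, 1], [1, 1, 1, 1]]
right = [[1, 1, 0, 0], [1, 1, 1, 1]]
print(last_token_indices(left))   # [3, 3]
print(last_token_indices(right))  # [1, 3]
```

Loading the tokenizer with `padding_side='left'`, as in the example above, means the simple `[:, -1]` branch is taken.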
| |
|
| |
|
| |
|
| | ### Finetuning |
| | We recommend using [ms-swift](https://github.com/modelscope/ms-swift) to fine-tune DIVER-Retriever-4B with the InfoNCE loss. |
| |
|
| | Before starting training, please ensure your environment is properly configured. |
| |
|
| | ```bash |
| | pip install ms-swift -U |
| | # Or install from source |
| | pip install git+https://github.com/modelscope/ms-swift.git |
| | |
| | pip install transformers -U |
| | |
| | # Optional packages |
| | pip install deepspeed # multi-GPU training |
| | pip install liger-kernel # save GPU memory resources |
| | pip install flash-attn --no-build-isolation |
| | ``` |
| |
|
| | #### Training Command |
| |
|
| | Using the InfoNCE loss as an example, the complete training command is as follows: |
| |
|
| | ```bash |
| | nproc_per_node=8 |
| | NPROC_PER_NODE=$nproc_per_node \ |
| | swift sft \ |
| | --model AQ-MedAI/Diver-Retriever-4B \ |
| | --task_type embedding \ |
| | --model_type qwen3_emb \ |
| | --train_type full \ |
| | --dataset your_dataset \ |
| | --split_dataset_ratio 0.05 \ |
| | --eval_strategy steps \ |
| | --output_dir output \ |
| | --eval_steps 20 \ |
| | --num_train_epochs 5 \ |
| | --save_steps 20 \ |
| | --per_device_train_batch_size 4 \ |
| | --per_device_eval_batch_size 4 \ |
| | --gradient_accumulation_steps 4 \ |
| | --learning_rate 6e-6 \ |
| | --loss_type infonce \ |
| | --label_names labels \ |
| | --dataloader_drop_last true \ |
| | --deepspeed zero3 |
| | ``` |
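
For readers unfamiliar with the objective selected by `--loss_type infonce`, the following is a minimal sketch of the standard InfoNCE formulation for a single query: the negative log-softmax probability of the positive document among the positive and the negatives. It is not taken from the ms-swift implementation. The function name, the example scores, and the temperature value of 0.05 are assumptions for illustration.

```python
import math
from typing import Sequence


def info_nce_loss(
    positive_score: float,
    negative_scores: Sequence[float],
    temperature: float = 0.05,  # assumed value, not from the model card
) -> float:
    """InfoNCE for one query: -log softmax probability of the positive candidate."""
    logits = [positive_score / temperature]
    logits += [score / temperature for score in negative_scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]


# Hypothetical cosine similarities between one query and its candidates.
well_separated = info_nce_loss(0.9, [0.1, 0.2])
poorly_separated = info_nce_loss(0.3, [0.1, 0.2])
print(well_separated < poorly_separated)  # True
```

The loss shrinks as the positive score pulls away from the negative scores, which is why hard negatives built per field, as described in the highlights, matter for this objective.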
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| | ## Citation |
| |
|
| | If you find our work helpful, feel free to cite it. |
| |
|
| | ``` |
| | @misc{long2025divermultistageapproachreasoningintensive, |
| | title={DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval}, |
| | author={Meixiu Long and Duolin Sun and Dan Yang and Junjie Wang and Yue Shen and Jian Wang and Peng Wei and Jinjie Gu and Jiahai Wang}, |
| | year={2025}, |
| | eprint={2508.07995}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.IR}, |
| | url={https://arxiv.org/abs/2508.07995}, |
| | } |
| | ``` |