---
license: apache-2.0
tags:
- medical
- code
- math
- reasoning
- general
datasets:
- Raderspace/MATH_qCoT_LLMquery_questionasquery_lexicalquery
- reasonir/reasonir-data
- truehealth/medqa
- AQ-MedAI/PRGB-ZH
metrics:
- accuracy
pipeline_tag: text-ranking
language:
- zh
- en
library_name: transformers
base_model:
- Qwen/Qwen3-Embedding-0.6B
---

# Diver-Retriever-0.6B

## Highlights

Diver-Retriever-0.6B is a retrieval model for reasoning-intensive queries, designed to tackle the challenges of ReasonIR and RaDeR. We combined training data from mathematics, coding, and healthcare, matched samples precisely by difficulty level, and constructed negative samples specific to each field. As a result, the model performs well on the BRIGHT leaderboard as well as the MTEB-Medical benchmark.

Its quantized version has been downloaded **1.4k+** times at https://huggingface.co/mradermacher/Diver-Retriever-0.6B-GGUF.

| **Model** | **#Total Params** | **Context Length** | **Download** | **BRIGHT** |
| :------------------: | :---------------: | :----------------: | :----------: | :--------: |
| DIVER-Retriever-4B | 4B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-4B) <br> [🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-4B) | **28.9** |
| DIVER-Retriever-1.7B | 1.7B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-1.7B) <br> [🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-1.7B) | **27.3** |
| DIVER-Retriever-0.6B | 0.6B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-0.6B) <br> [🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-0.6B) | **25.2** |

### Model Description

- **Model type:** Text Embedding
- **Language(s) (NLP):** Bilingual (Chinese & English)
- **Context Length:** 32k
- **Number of Parameters:** 0.6B

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/AQ-MedAI/Diver).

## Evaluation

### Evaluation on the BRIGHT Benchmark

| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Evaluate Retriever with Original Query** | | | | | | | | | | | | | |
| BM25 | 14.5 | 18.9 | 27.2 | 14.9 | 12.5 | 13.6 | 18.4 | 15.0 | 24.4 | 7.9 | 6.2 | 10.4 | 4.9 |
| SBERT | 14.9 | 15.1 | 20.4 | 16.6 | 22.7 | 8.2 | 11.0 | 15.3 | 26.4 | 7.0 | 5.3 | 20.0 | 10.8 |
| gte-Qwen1.5-7B | 22.5 | 30.6 | 36.4 | 17.8 | 24.6 | 13.2 | 22.2 | 14.8 | 25.5 | 9.9 | 14.4 | 27.8 | 32.9 |
| Qwen3-4B | 5.6 | 3.5 | 8.0 | 2.3 | 2.0 | 1.6 | 1.0 | 4.4 | 2.1 | 0.1 | 4.9 | 18.0 | 19.2 |
| OpenAI | 17.9 | 23.3 | 26.7 | 19.5 | 27.6 | 12.8 | 14.3 | 20.5 | 23.6 | 2.4 | 8.5 | 23.5 | 11.7 |
| Google | 20.0 | 22.7 | 34.8 | 19.6 | 27.8 | 15.7 | 20.1 | 17.1 | 29.6 | 3.6 | 9.3 | 23.8 | 15.9 |
| ReasonIR-8B | 24.4 | 26.2 | 31.4 | 23.3 | 30.0 | 18.0 | 23.9 | 20.5 | 35.0 | 10.5 | 14.7 | 31.9 | 27.2 |
| RaDeR-7B | 25.5 | 34.6 | 38.9 | 22.1 | 33.0 | 14.8 | 22.5 | 23.7 | 37.3 | 5.0 | 10.2 | 28.4 | 35.1 |
| Seed1.5-Embedding | 27.2 | 34.8 | 46.9 | 23.4 | 31.6 | 19.1 | 25.4 | 21.0 | 43.2 | 4.9 | 12.2 | 33.3 | 30.5 |
| DIVER-Retriever-0.6B | 25.2 | 36.4 | 41.9 | 29.0 | 31.0 | 21.2 | 24.6 | 23.2 | 15.6 | 6.8 | 8.4 | 33.2 | 31.7 |
| DIVER-Retriever-4B | 28.9 | 41.8 | 43.7 | 21.7 | 35.3 | 21.0 | 21.2 | 25.1 | 37.6 | 13.2 | 10.7 | 38.4 | 37.3 |
| **Evaluate Retriever with GPT-4 REASON-query** | | | | | | | | | | | | | |
| BM25 | 27.0 | 53.6 | 54.1 | 24.3 | 38.7 | 18.9 | 27.7 | 26.3 | 19.3 | 17.6 | 3.9 | 19.2 | 20.8 |
| SBERT | 17.8 | 18.5 | 26.3 | 17.5 | 27.2 | 8.8 | 11.8 | 17.5 | 24.3 | 10.3 | 5.0 | 22.3 | 23.5 |
| gte-Qwen1.5-7B | 24.8 | 35.5 | 43.1 | 24.3 | 34.3 | 15.4 | 22.9 | 23.9 | 25.4 | 5.2 | 4.6 | 28.7 | 34.6 |
| Qwen3-4B | 5.5 | 1.3 | 17.3 | 2.5 | 6.2 | 1.0 | 4.8 | 4.5 | 3.0 | 5.9 | 0.0 | 7.2 | 12.5 |
| OpenAI | 23.3 | 35.2 | 40.1 | 25.1 | 38.0 | 13.6 | 18.2 | 24.2 | 24.5 | 6.5 | 7.7 | 22.9 | 23.8 |
| Google | 26.2 | 36.4 | 45.6 | 25.6 | 38.2 | 18.7 | 29.5 | 17.9 | 31.1 | 3.7 | 10.0 | 27.8 | 30.4 |
| ReasonIR-8B | 29.9 | 43.6 | 42.9 | 32.7 | 38.8 | 20.9 | 25.8 | 27.5 | 31.5 | 19.6 | 7.4 | 33.1 | 35.7 |
| RaDeR-7B | 29.2 | 36.1 | 42.9 | 25.2 | 37.9 | 16.6 | 27.4 | 25.0 | 34.8 | 11.9 | 12.0 | 37.7 | 43.4 |
| DIVER-Retriever-4B | 32.1 | 51.9 | 53.5 | 29.5 | 41.2 | 21.4 | 27.5 | 26.1 | 33.5 | 11.7 | 9.5 | 39.3 | 39.7 |
| **Evaluate Retriever with DIVER-QExpand Query** | | | | | | | | | | | | | |
| ReasonIR-8B | 32.6 | 49.4 | 44.7 | 32.4 | 44.0 | 26.6 | 31.8 | 29.0 | 32.3 | 12.8 | 9.1 | 40.7 | 38.4 |
| +BM25 (Hybrid) | 35.7 | 56.8 | 53.5 | 33.0 | 48.5 | 29.4 | 34.2 | 32.0 | 35.2 | 16.8 | 12.9 | 39.3 | 36.8 |
| DIVER-Retriever-4B | 33.9 | 54.5 | 52.7 | 28.8 | 44.9 | 25.1 | 27.4 | 29.5 | 34.5 | 10.0 | 14.5 | 40.7 | 44.7 |
| +BM25 (Hybrid) | 37.2 | 60.0 | 55.9 | 31.8 | 47.9 | 27.1 | 33.9 | 31.9 | 35.1 | 23.1 | 16.8 | 36.9 | 46.6 |
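
The "+BM25 (Hybrid)" rows combine dense retriever scores with BM25 scores. This card does not state how the two are fused, so the following is only a minimal sketch of one common approach: min-max normalization followed by linear interpolation. The function names `min_max` and `hybrid_scores`, the weight `alpha`, and the toy scores are all hypothetical and are not taken from the DIVER code.

```python
from typing import List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(dense: List[float], bm25: List[float], alpha: float = 0.7) -> List[float]:
    """Interpolate normalized dense and BM25 scores.

    `alpha` is an assumed weight on the dense score, not a value from the paper.
    """
    d, b = min_max(dense), min_max(bm25)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(d, b)]


# Made-up scores for one query and three candidate documents.
dense = [0.75, 0.11, 0.40]  # e.g. cosine similarities from the retriever
bm25 = [2.0, 9.0, 8.0]      # e.g. raw BM25 scores

fused = hybrid_scores(dense, bm25, alpha=0.7)
# Document indices ordered from most to least relevant under the fused score.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(ranking)
```

Normalizing first matters because cosine similarities and BM25 scores live on different scales; without it, the unbounded BM25 values would dominate the sum.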

## Usage

### Inference

#### Sentence Transformers Usage

```python
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("AQ-MedAI/Diver-Retriever-0.6B")

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
```

#### Transformers Usage

```python
# Requires transformers>=4.51.0

import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # With left padding, the last position of every sequence is a real token
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # With right padding, pick the last non-padded token of each sequence
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'


# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add an instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('AQ-MedAI/Diver-Retriever-0.6B', padding_side='left')
model = AutoModel.from_pretrained('AQ-MedAI/Diver-Retriever-0.6B')
model.eval()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)

# Inference only, so gradients are not needed
with torch.no_grad():
    outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# Normalize embeddings so that the dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7534257769584656, 0.1146894246339798], [0.03198453038930893, 0.6258305311203003]]
```

### Finetuning

We recommend using [swift](https://github.com/modelscope/ms-swift) to finetune DIVER-Retriever-0.6B with the InfoNCE loss.

Before starting training, please ensure your environment is properly configured.

```bash
pip install ms-swift -U
# Install from source
pip install git+https://github.com/modelscope/ms-swift.git

pip install transformers -U

# Optional packages
pip install deepspeed # multi-GPU training
pip install liger-kernel # save GPU memory resources
pip install flash-attn --no-build-isolation
```

#### Training Command

Using the InfoNCE loss as an example, the complete training command is as follows:

```bash
nproc_per_node=8

NPROC_PER_NODE=$nproc_per_node \
swift sft \
    --model AQ-MedAI/Diver-Retriever-0.6B \
    --task_type embedding \
    --model_type qwen3_emb \
    --train_type full \
    --dataset your_dataset \
    --split_dataset_ratio 0.05 \
    --eval_strategy steps \
    --output_dir output \
    --eval_steps 20 \
    --num_train_epochs 5 \
    --save_steps 20 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 6e-6 \
    --loss_type infonce \
    --label_names labels \
    --dataloader_drop_last true \
    --deepspeed zero3
```

## Citation

If you find our work helpful, feel free to cite it.

```
@misc{long2025divermultistageapproachreasoningintensive,
      title={DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval},
      author={Meixiu Long and Duolin Sun and Dan Yang and Junjie Wang and Yue Shen and Jian Wang and Peng Wei and Jinjie Gu and Jiahai Wang},
      year={2025},
      eprint={2508.07995},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2508.07995},
}
```