# LRAT-Qwen3-Embedding-0.6B
LRAT-Qwen3-Embedding-0.6B is a dense retriever obtained by fine-tuning Qwen/Qwen3-Embedding-0.6B with LRAT (Learning to Retrieve from Agent Trajectories), a trajectory-driven training framework for agentic search.
Instead of relying on human click logs, LRAT learns retrieval supervision from multi-step search agent interactions. The model is trained on query-document pairs mined from deep research trajectories, where positive and negative signals come from agent search/browse behavior and post-browse reasoning traces.
## What This Model Is For
This checkpoint is intended for:
- dense retrieval in agentic search systems
- retrieval-augmented deep research pipelines
- multi-step information seeking and evidence discovery
- replacing a general-purpose embedding retriever with a retriever aligned to search agents
It is not primarily optimized for:
- generic semantic similarity benchmarks
- classification
- clustering without retrieval-oriented tuning
## Model Details

- Base model: Qwen/Qwen3-Embedding-0.6B
- Training framework: LRAT
- Training objective: weighted contrastive learning with trajectory-derived positives and negatives
- Retrieval setting: dense bi-encoder retrieval
- Intended domain: long-horizon information seeking for search agents
## How LRAT Builds Supervision
LRAT mines supervision directly from agent trajectories:
- Browsed documents after a search step are treated as initial positive candidates.
- Unbrowsed documents from the same retrieved set are treated as reliable negatives.
- A reasoning-aware LLM judge filters noisy positives using the agent's post-browse reasoning trace.
- The length of the post-browse reasoning is converted into a relevance intensity weight.
- The retriever is optimized with a weighted InfoNCE-style contrastive loss.
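To make the objective concrete, here is a minimal pure-Python sketch of a weighted InfoNCE-style loss for one mined group (one trajectory-derived positive, in-group negatives, and a relevance intensity weight). This is an illustration only, not the official LRAT implementation; the function name and signature are assumptions.

```python
import math

def weighted_infonce(q, pos, negs, weight, tau=0.02):
    """Weighted InfoNCE-style loss for one (query, positive, negatives) group.

    q, pos, and each vector in negs are pre-normalized embeddings (lists of
    floats); `weight` is the trajectory-derived relevance intensity weight.
    Illustrative sketch only, not the official LRAT code.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Temperature-scaled similarities: positive first, then negatives.
    logits = [dot(q, pos) / tau] + [dot(q, n) / tau for n in negs]

    # Numerically stable log-partition over the group.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))

    # Negative log-likelihood of the positive, scaled by the weight.
    return -weight * (logits[0] - log_z)
```

With `tau = 0.02` as in the paper recipe, even small similarity gaps between the positive and the in-group negatives translate into sharply peaked softmax targets.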
This design is motivated by the observation that search agents behave differently from human users: they exhibit weak position bias, explicitly browse useful documents, and produce reasoning traces that reveal document utility.
## Training Data
This model is trained from LRAT trajectory-derived supervision constructed from:
- 10K seed queries from InfoSeekQA
- Tongyi-DeepResearch-30B-A3B as the trajectory collection agent
- Wiki-25-Dump as the retrieval corpus
- four retrievers used during trajectory collection: BM25, Qwen3-Embedding-0.6B, Qwen3-Embedding-4B, and Qwen3-Embedding-8B
In the paper instantiation, this process yields:
- 26,482 valid agent trajectories
- 91,713 training pairs
For more details, please refer to the released training dataset ([LRAT-Train](https://huggingface.co/datasets/Yuqi-Zhou/LRAT-Train)).
## Training Recipe
The retriever is fine-tuned with the following setup reported in the paper:
- epochs: 2
- batch size: 32
- learning rate: 1e-6
- maximum input length: 512
- group size for contrastive training: 10
- temperature: 0.02
- training framework: FlagEmbedding-based dense retriever fine-tuning
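The recipe above might be collected into a single configuration object as below. The key names are illustrative assumptions (loosely modeled on FlagEmbedding-style trainer arguments), not verified flags of the LRAT codebase; map them to your trainer's actual options.

```python
# Paper-reported hyperparameters bundled as a dict.
# Key names are hypothetical; only the values come from the paper recipe.
recipe = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 32,
    "learning_rate": 1e-6,
    "max_input_length": 512,
    "train_group_size": 10,   # typically 1 positive + (group_size - 1) negatives
    "temperature": 0.02,
}
```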
## Evaluation Summary
The paper evaluates LRAT-trained retrievers inside six search agents on:
- InfoSeek-Eval for in-domain evaluation
- BrowseComp-Plus for out-of-domain evaluation
For the Qwen3-Embedding-0.6B backbone, LRAT consistently improves success rate, evidence recall, and execution efficiency.
### Representative Results for This Backbone
| Agent Backbone | InfoSeek-Eval SR | BrowseComp-Plus SR | BrowseComp-Plus Recall |
|---|---|---|---|
| AgentCPM-Explore (base) | 40.3 | 13.5 | 23.2 |
| AgentCPM-Explore (+ LRAT) | 55.7 | 15.8 | 32.0 |
| WebExplore (base) | 52.0 | 21.0 | 47.7 |
| WebExplore (+ LRAT) | 68.7 | 27.2 | 55.9 |
| Tongyi-DeepResearch (base) | 52.7 | 17.8 | 49.2 |
| Tongyi-DeepResearch (+ LRAT) | 68.0 | 23.7 | 60.7 |
| GPT-OSS (120B, base) | 40.0 | 9.0 | 43.7 |
| GPT-OSS (120B, + LRAT) | 47.0 | 12.1 | 56.4 |
| MiniMax-M2.1 (base) | 58.7 | 38.2 | 57.2 |
| MiniMax-M2.1 (+ LRAT) | 78.3 | 48.3 | 69.2 |
| GLM-4.7 (base) | 67.7 | 43.9 | 66.6 |
| GLM-4.7 (+ LRAT) | 82.0 | 54.6 | 77.8 |
Across these agents, LRAT improves:
- InfoSeek-Eval success rate by roughly +17.5% to +38.2% relative
- BrowseComp-Plus success rate by roughly +17.0% to +34.4% relative
- BrowseComp-Plus evidence recall by roughly +16.8% to +37.9% relative
## Usage
This checkpoint should be used with the same tokenizer, pooling strategy, and normalization settings as the upstream base model and the LRAT retrieval codebase.
```python
from transformers import AutoTokenizer, AutoModel

model_id = "Yuqi-Zhou/LRAT-Qwen3-Embedding-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Use the same pooling / normalization recipe as the original
# Qwen3-Embedding-0.6B inference setup in your retrieval pipeline.
```
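The upstream Qwen3-Embedding models use last-token pooling followed by L2 normalization. A framework-agnostic sketch of that recipe is shown below (verify against the base model card before relying on it); the helper names are our own.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last non-padding token per sequence.

    hidden_states: [batch][seq][dim] nested lists; attention_mask: [batch][seq]
    with 1 for real tokens and 0 for (right-side) padding.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last = sum(mask) - 1  # index of the final non-padding token
        pooled.append(states[last])
    return pooled

def l2_normalize(vec):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

Normalized embeddings let you score query-document pairs with a plain dot product, which is what most dense-retrieval indexes (e.g. FAISS inner-product indexes) expect.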
If you are using the official LRAT codebase, this model is intended to be plugged into the dense retrieval workflow used for indexing and agent-time retrieval.
## License and Release Notes
Please verify license compatibility with:
- the upstream base model
- the released training data
- the source corpora and benchmarks used in your downstream setup
If desired, this section can be updated later with the final project-specific license statement.
## Citation
If you use this checkpoint, please cite the LRAT paper.
```bibtex
@inproceedings{zhou2026lrat,
  title={Learning to Retrieve from Agent Trajectories},
  author={Zhou, Yuqi and Dai, Sunhao and Qu, Changle and Pang, Liang and Xu, Jun and Wen, Ji-Rong},
  booktitle={Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2026}
}
```
## Links

- Paper: https://arxiv.org/abs/2604.04949
- Project page: https://yuqi-zhou.github.io/LRAT-homepage/
- Code: https://github.com/Yuqi-Zhou/LRAT
- Model: https://huggingface.co/Yuqi-Zhou/LRAT-Qwen3-Embedding-0.6B
- Dataset: https://huggingface.co/datasets/Yuqi-Zhou/LRAT-Train
- Companion E5 checkpoint: https://huggingface.co/Yuqi-Zhou/LRAT-multilingual-e5-large