# LRAT-Qwen3-Embedding-0.6B
LRAT-Qwen3-Embedding-0.6B is a dense retriever obtained by fine-tuning Qwen/Qwen3-Embedding-0.6B with LRAT (Learning to Retrieve from Agent Trajectories), a trajectory-driven training framework for agentic search.
Instead of relying on human click logs, LRAT learns retrieval supervision from multi-step search agent interactions. The model is trained on query-document pairs mined from deep research trajectories, where positive and negative signals come from agent search/browse behavior and post-browse reasoning traces.
## What This Model Is For
This checkpoint is intended for:
- dense retrieval in agentic search systems
- retrieval-augmented deep research pipelines
- multi-step information seeking and evidence discovery
- replacing a general-purpose embedding retriever with a retriever aligned to search agents
It is not primarily optimized for:
- generic semantic similarity benchmarks
- classification
- clustering without retrieval-oriented tuning
## Model Details

- Base model: Qwen/Qwen3-Embedding-0.6B
- Training framework: LRAT
- Training objective: weighted contrastive learning with trajectory-derived positives and negatives
- Retrieval setting: dense bi-encoder retrieval
- Intended domain: long-horizon information seeking for search agents
## How LRAT Builds Supervision
LRAT mines supervision directly from agent trajectories:
- Browsed documents after a search step are treated as initial positive candidates.
- Unbrowsed documents from the same retrieved set are treated as reliable negatives.
- A reasoning-aware LLM judge filters noisy positives using the agent's post-browse reasoning trace.
- The length of the post-browse reasoning is converted into a relevance intensity weight.
- The retriever is optimized with a weighted InfoNCE-style contrastive loss.
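To make the objective concrete, here is a minimal pure-Python sketch of a weighted InfoNCE-style loss for one mined group (one trajectory-derived positive, in-group negatives, and a relevance intensity weight). This is an illustration only, not the official LRAT implementation; the function name and signature are assumptions.

```python
import math

def weighted_infonce(q, pos, negs, weight, tau=0.02):
    """Weighted InfoNCE-style loss for one (query, positive, negatives) group.

    q, pos, and each vector in negs are pre-normalized embeddings (lists of
    floats); `weight` is the trajectory-derived relevance intensity weight.
    Illustrative sketch only, not the official LRAT code.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Temperature-scaled similarities: positive first, then negatives.
    logits = [dot(q, pos) / tau] + [dot(q, n) / tau for n in negs]

    # Numerically stable log-partition over the group.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))

    # Negative log-likelihood of the positive, scaled by the weight.
    return -weight * (logits[0] - log_z)
```

With `tau = 0.02` as in the paper recipe, even small similarity gaps between the positive and the in-group negatives translate into sharply peaked softmax targets.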
This design is motivated by the observation that search agents behave differently from human users: they exhibit weak position bias, explicitly browse useful documents, and produce reasoning traces that reveal document utility.
## Training Data
This model is trained from LRAT trajectory-derived supervision constructed from:
- 10K seed queries from InfoSeekQA
- Tongyi-DeepResearch-30B-A3B as the trajectory collection agent
- Wiki-25-Dump as the retrieval corpus
- four retrievers used during trajectory collection: BM25, Qwen3-Embedding-0.6B, Qwen3-Embedding-4B, and Qwen3-Embedding-8B
In the paper instantiation, this process yields:
- 26,482 valid agent trajectories
- 91,713 training pairs
For more details, please refer to the released training dataset ([LRAT-Train](https://huggingface.co/datasets/Yuqi-Zhou/LRAT-Train)).
## Training Recipe
The retriever is fine-tuned with the following setup reported in the paper:
- epochs: 2
- batch size: 32
- learning rate: 1e-6
- maximum input length: 512
- group size for contrastive training: 10
- temperature: 0.02
- training framework: FlagEmbedding-based dense retriever fine-tuning
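The recipe above might be collected into a single configuration object as below. The key names are illustrative assumptions (loosely modeled on FlagEmbedding-style trainer arguments), not verified flags of the LRAT codebase; map them to your trainer's actual options.

```python
# Paper-reported hyperparameters bundled as a dict.
# Key names are hypothetical; only the values come from the paper recipe.
recipe = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 32,
    "learning_rate": 1e-6,
    "max_input_length": 512,
    "train_group_size": 10,   # typically 1 positive + (group_size - 1) negatives
    "temperature": 0.02,
}
```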
## Evaluation Summary
The paper evaluates LRAT-trained retrievers inside six search agents on:
- InfoSeek-Eval for in-domain evaluation
- BrowseComp-Plus for out-of-domain evaluation
For the Qwen3-Embedding-0.6B backbone, LRAT consistently improves success rate, evidence recall, and execution efficiency.
### Representative Results for This Backbone
| Agent Backbone | InfoSeek-Eval SR | BrowseComp-Plus SR | BrowseComp-Plus Recall |
|---|---|---|---|
| AgentCPM-Explore (base) | 40.3 | 13.5 | 23.2 |
| AgentCPM-Explore (+ LRAT) | 55.7 | 15.8 | 32.0 |
| WebExplore (base) | 52.0 | 21.0 | 47.7 |
| WebExplore (+ LRAT) | 68.7 | 27.2 | 55.9 |
| Tongyi-DeepResearch (base) | 52.7 | 17.8 | 49.2 |
| Tongyi-DeepResearch (+ LRAT) | 68.0 | 23.7 | 60.7 |
| GPT-OSS (120B, base) | 40.0 | 9.0 | 43.7 |
| GPT-OSS (120B, + LRAT) | 47.0 | 12.1 | 56.4 |
| MiniMax-M2.1 (base) | 58.7 | 38.2 | 57.2 |
| MiniMax-M2.1 (+ LRAT) | 78.3 | 48.3 | 69.2 |
| GLM-4.7 (base) | 67.7 | 43.9 | 66.6 |
| GLM-4.7 (+ LRAT) | 82.0 | 54.6 | 77.8 |
Across these agents, LRAT improves:
- InfoSeek-Eval success rate by roughly +17.5% to +38.2% relative
- BrowseComp-Plus success rate by roughly +17.0% to +34.4% relative
- BrowseComp-Plus evidence recall by roughly +16.8% to +37.9% relative
## Usage
This checkpoint should be used with the same tokenizer, pooling strategy, and normalization settings as the upstream base model and the LRAT retrieval codebase.
```python
from transformers import AutoTokenizer, AutoModel

model_id = "Yuqi-Zhou/LRAT-Qwen3-Embedding-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Use the same pooling / normalization recipe as the original
# Qwen3-Embedding-0.6B inference setup in your retrieval pipeline.
```
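The upstream Qwen3-Embedding models use last-token pooling followed by L2 normalization. A framework-agnostic sketch of that recipe is shown below (verify against the base model card before relying on it); the helper names are our own.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last non-padding token per sequence.

    hidden_states: [batch][seq][dim] nested lists; attention_mask: [batch][seq]
    with 1 for real tokens and 0 for (right-side) padding.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last = sum(mask) - 1  # index of the final non-padding token
        pooled.append(states[last])
    return pooled

def l2_normalize(vec):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

Normalized embeddings let you score query-document pairs with a plain dot product, which is what most dense-retrieval indexes (e.g. FAISS inner-product indexes) expect.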
If you are using the official LRAT codebase, this model is intended to be plugged into the dense retrieval workflow used for indexing and agent-time retrieval.
## License and Release Notes
Please verify license compatibility with:
- the upstream base model
- the released training data
- the source corpora and benchmarks used in your downstream setup
If desired, this section can be updated later with the final project-specific license statement.
## Citation
If you use this checkpoint, please cite the LRAT paper.
```bibtex
@inproceedings{zhou2026lrat,
  title={Learning to Retrieve from Agent Trajectories},
  author={Zhou, Yuqi and Dai, Sunhao and Qu, Changle and Pang, Liang and Xu, Jun and Wen, Ji-Rong},
  booktitle={Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2026}
}
```
## Links

- Paper: https://arxiv.org/abs/2604.04949
- Project page: https://yuqi-zhou.github.io/LRAT-homepage/
- Code: https://github.com/Yuqi-Zhou/LRAT
- Model: https://huggingface.co/Yuqi-Zhou/LRAT-Qwen3-Embedding-0.6B
- Dataset: https://huggingface.co/datasets/Yuqi-Zhou/LRAT-Train
- Companion E5 checkpoint: https://huggingface.co/Yuqi-Zhou/LRAT-multilingual-e5-large