LRAT-Qwen3-Embedding-0.6B

LRAT-Qwen3-Embedding-0.6B is a dense retriever obtained by fine-tuning Qwen/Qwen3-Embedding-0.6B with LRAT (Learning to Retrieve from Agent Trajectories), a trajectory-driven training framework for agentic search.

Instead of relying on human click logs, LRAT learns retrieval supervision from multi-step search agent interactions. The model is trained on query-document pairs mined from deep research trajectories, where positive and negative signals come from agent search/browse behavior and post-browse reasoning traces.

What This Model Is For

This checkpoint is intended for:

  • dense retrieval in agentic search systems
  • retrieval-augmented deep research pipelines
  • multi-step information seeking and evidence discovery
  • replacing a general-purpose embedding retriever with a retriever aligned to search agents

It is not primarily optimized for:

  • generic semantic similarity benchmarks
  • classification
  • clustering without retrieval-oriented tuning

Model Details

  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Training framework: LRAT
  • Training objective: weighted contrastive learning with trajectory-derived positives and negatives
  • Retrieval setting: dense bi-encoder retrieval
  • Intended domain: long-horizon information seeking for search agents

How LRAT Builds Supervision

LRAT mines supervision directly from agent trajectories:

  1. Browsed documents after a search step are treated as initial positive candidates.
  2. Unbrowsed documents from the same retrieved set are treated as reliable negatives.
  3. A reasoning-aware LLM judge filters noisy positives using the agent's post-browse reasoning trace.
  4. The length of the post-browse reasoning is converted into a relevance intensity weight.
  5. The retriever is optimized with a weighted InfoNCE-style contrastive loss.

This design is motivated by the observation that search agents behave differently from human users: they exhibit weak position bias, explicitly browse useful documents, and produce reasoning traces that reveal document utility.
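The weighted contrastive loss in step 5 can be sketched as follows. This is an illustrative, dependency-free version that assumes a scalar relevance weight per pair and pre-normalized embeddings; it is not the paper's exact implementation:

```python
import math

def weighted_infonce(q, pos, negs, weight, temperature=0.02):
    """Weighted InfoNCE loss for one query.

    q, pos, negs: L2-normalized embedding vectors (lists of floats).
    weight: relevance-intensity weight derived from the length of the
            agent's post-browse reasoning (assumed to be a scalar here).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Temperature-scaled similarities: positive first, then negatives.
    logits = [dot(q, pos) / temperature] + [dot(q, n) / temperature for n in negs]
    # Numerically stable log-softmax of the positive logit,
    # scaled by the trajectory-derived weight.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -weight * (logits[0] - log_z)

# Toy example: the positive embedding is closer to the query than the
# negatives, so the loss is near zero.
q = [1.0, 0.0]
loss = weighted_infonce(q, pos=[0.9, 0.1],
                        negs=[[0.1, 0.9], [-1.0, 0.0]], weight=1.0)
```

Halving the weight halves the loss, so pairs backed by longer post-browse reasoning pull the retriever more strongly.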

Training Data

This model is trained on trajectory-derived supervision that LRAT constructs from:

  • 10K seed queries from InfoSeekQA
  • Tongyi-DeepResearch-30B-A3B as the trajectory collection agent
  • Wiki-25-Dump as the retrieval corpus
  • four retrievers used during trajectory collection: BM25, Qwen3-Embedding-0.6B, Qwen3-Embedding-4B, and Qwen3-Embedding-8B

In the paper instantiation, this process yields:

  • 26,482 valid agent trajectories
  • 91,713 training pairs

For more details, please refer to the released LRAT-Train dataset.

Training Recipe

The retriever is fine-tuned with the following setup reported in the paper:

  • epochs: 2
  • batch size: 32
  • learning rate: 1e-6
  • maximum input length: 512
  • group size for contrastive training: 10
  • temperature: 0.02
  • training framework: FlagEmbedding-based dense retriever fine-tuning
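A group size of 10 means each query is contrasted against one positive and nine negatives. Below is a minimal sketch of assembling one such training group in the query/pos/neg JSONL layout commonly used for FlagEmbedding-style finetuning; the field names follow that convention as an assumption, and the released LRAT-Train schema is authoritative:

```python
import json

def make_training_record(query, browsed_positive, unbrowsed_docs, group_size=10):
    """Build one contrastive training group.

    With group_size=10, the group pairs one trajectory-derived positive
    (a browsed document) with 9 unbrowsed negatives drawn from the same
    retrieved set.
    """
    negatives = unbrowsed_docs[: group_size - 1]
    return {"query": query, "pos": [browsed_positive], "neg": negatives}

# Hypothetical example documents, for illustration only.
record = make_training_record(
    "Who proposed the transformer architecture?",
    "Attention Is All You Need introduced the transformer ...",
    [f"unbrowsed doc {i}" for i in range(12)],
)
line = json.dumps(record)  # one JSONL line per training group
```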

Evaluation Summary

The paper evaluates LRAT-trained retrievers inside six search agents on:

  • InfoSeek-Eval for in-domain evaluation
  • BrowseComp-Plus for out-of-domain evaluation

For the Qwen3-Embedding-0.6B backbone, LRAT consistently improves success rate, evidence recall, and execution efficiency.

Representative Results for This Backbone

| Agent Backbone | InfoSeek-Eval SR | BrowseComp-Plus SR | BrowseComp-Plus Recall |
| --- | --- | --- | --- |
| AgentCPM-Explore (base) | 40.3 | 13.5 | 23.2 |
| AgentCPM-Explore (+ LRAT) | 55.7 | 15.8 | 32.0 |
| WebExplore (base) | 52.0 | 21.0 | 47.7 |
| WebExplore (+ LRAT) | 68.7 | 27.2 | 55.9 |
| Tongyi-DeepResearch (base) | 52.7 | 17.8 | 49.2 |
| Tongyi-DeepResearch (+ LRAT) | 68.0 | 23.7 | 60.7 |
| GPT-OSS (120B, base) | 40.0 | 9.0 | 43.7 |
| GPT-OSS (120B, + LRAT) | 47.0 | 12.1 | 56.4 |
| MiniMax-M2.1 (base) | 58.7 | 38.2 | 57.2 |
| MiniMax-M2.1 (+ LRAT) | 78.3 | 48.3 | 69.2 |
| GLM-4.7 (base) | 67.7 | 43.9 | 66.6 |
| GLM-4.7 (+ LRAT) | 82.0 | 54.6 | 77.8 |

Across these agents, LRAT improves:

  • InfoSeek-Eval success rate by roughly +17.5% to +38.2% relative
  • BrowseComp-Plus success rate by roughly +17.0% to +34.4% relative
  • BrowseComp-Plus evidence recall by roughly +16.8% to +37.9% relative
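These relative figures follow directly from the table above, e.g. for InfoSeek-Eval success rate:

```python
def relative_gain(base, improved):
    """Relative improvement in percent."""
    return 100.0 * (improved - base) / base

# AgentCPM-Explore InfoSeek-Eval SR: 40.3 -> 55.7
high = round(relative_gain(40.3, 55.7), 1)  # 38.2, the upper end of the range
# GPT-OSS (120B) InfoSeek-Eval SR: 40.0 -> 47.0
low = round(relative_gain(40.0, 47.0), 1)   # 17.5, the lower end of the range
```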

Usage

This checkpoint should be used with the same tokenizer, pooling strategy, and normalization settings as the upstream base model and the LRAT retrieval codebase.

```python
from transformers import AutoTokenizer, AutoModel

model_id = "Yuqi-Zhou/LRAT-Qwen3-Embedding-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Use the same pooling / normalization recipe as the original
# Qwen3-Embedding-0.6B inference setup in your retrieval pipeline.
```
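For reference, the upstream Qwen3-Embedding recipe uses last-token pooling over the final hidden states, followed by L2 normalization and cosine (dot-product) scoring. A dependency-free sketch of that recipe on toy values (a real pipeline applies the same steps to model tensors):

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Take the hidden state of the last non-padded token per sequence,
    the pooling used by Qwen3-Embedding-style models.
    hidden_states: [batch][seq_len][dim]; attention_mask: [batch][seq_len].
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

# Toy batch of two 2-token sequences; the second is right-padded by one token.
hidden = [[[0.0, 1.0], [2.0, 0.0]],
          [[3.0, 4.0], [9.0, 9.0]]]
mask = [[1, 1], [1, 0]]
q, d = last_token_pool(hidden, mask)
score = cosine(q, d)  # dense retrieval score in [-1, 1]
```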

If you are using the official LRAT codebase, this model is intended to be plugged into the dense retrieval workflow used for indexing and agent-time retrieval.

License and Release Notes

Please verify license compatibility with:

  • the upstream base model
  • the released training data
  • the source corpora and benchmarks used in your downstream setup

If desired, this section can be updated later with the final project-specific license statement.

Citation

If you use this checkpoint, please cite the LRAT paper.

@inproceedings{zhou2026lrat,
  title={Learning to Retrieve from Agent Trajectories},
  author={Zhou, Yuqi and Dai, Sunhao and Qu, Changle and Pang, Liang and Xu, Jun and Wen, Ji-Rong},
  booktitle={Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2026}
}

Links

  • Paper: https://arxiv.org/abs/2604.04949
  • Project page: https://yuqi-zhou.github.io/LRAT-homepage/
  • Code: https://github.com/Yuqi-Zhou/LRAT
  • Model: https://huggingface.co/Yuqi-Zhou/LRAT-Qwen3-Embedding-0.6B
  • Dataset: https://huggingface.co/datasets/Yuqi-Zhou/LRAT-Train
  • Companion E5 checkpoint: https://huggingface.co/Yuqi-Zhou/LRAT-multilingual-e5-large