	---
	language:
	- en
	license: apache-2.0
	tags:
	- retrieval
	- reasoning
	- embedding
	- BRIGHT
	- information-retrieval
	library_name: transformers
	pipeline_tag: feature-extraction
	base_model: Qwen/Qwen3-4B-Instruct-2507
	datasets:
	- xlangai/BRIGHT
	model-index:
	- name: MRE-T1
	  results:
	  - task:
	      type: Retrieval
	    dataset:
	      type: xlangai/BRIGHT
	      name: BRIGHT (Short)
	    metrics:
	    - type: ndcg_at_10
	      value: 39.6
	      name: nDCG@10
	  - task:
	      type: Retrieval
	    dataset:
	      type: xlangai/BRIGHT
	      name: BRIGHT (Long)
	    metrics:
	    - type: ndcg_at_10
	      value: 35.1
	      name: nDCG@10
	---

	<div align="center">
	<h3>Built by <a href="https://huggingface.co/ForwardAILabs">Forward AI Labs</a></h3>
	<p>We are an AI company that provides recruitment agents. | <a href="https://www.mira.day/">mira.day</a></p>
	</div>

	---

	# MRE-T1: Mira Reasoning Embedding — Thought v1

	MRE-T1 (Mira Reasoning Embedding, Thought v1) is the first generation of our reasoning-intensive retrieval model series. The "Thought" in T1 reflects the model's core capability — it thinks before it retrieves, generating explicit reasoning chains to deeply understand query intent before producing embeddings.

	MRE-T1 achieves the best reported single-model nDCG@10 on both the short and long document tracks of the [BRIGHT benchmark](https://brightbenchmark.github.io/), which evaluates retrieval models on tasks that require complex reasoning.

	## Highlights

	- BRIGHT Short nDCG@10: 39.6, the best single-model result on the short document retrieval leaderboard
	- BRIGHT Long nDCG@10: 35.1, the best single-model result on the long document retrieval leaderboard
	- Efficient: built on the Qwen3-4B architecture (~4B parameters), about half the size of many competing 7B to 8B models
	- Reasoning-aware: uses task-specific reasoning prompts and reads the query embedding from a special `<emb_token>`

	## Model Details

	| Property | Value |
	|----------|-------|
	| Architecture | Qwen3ForCausalLM |
	| Parameters | ~4B |
	| Hidden Size | 2560 |
	| Layers | 36 |
	| Attention Heads | 32 (KV heads: 8) |
	| Max Position | 262,144 |
	| Precision | bfloat16 |
	| Vocabulary | 151,670 |
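
As a rough sizing aid, the figures in the table can be turned into memory estimates. The sketch below assumes that the embedding dimension equals the hidden size (2560) and that weights and embeddings are both stored as bfloat16; neither is stated explicitly in this card. The one-million-document corpus is a hypothetical example, and the parameter count is approximate, so treat the results as order-of-magnitude estimates.

```python
# Back-of-the-envelope sizing from the Model Details table.
# Assumptions (not stated in the model card): the embedding dimension equals
# the hidden size, and both weights and embeddings are stored as bfloat16.

HIDDEN_SIZE = 2560          # "Hidden Size" row
BYTES_PER_BF16 = 2          # bfloat16 uses 16 bits per value
APPROX_PARAMS = 4e9         # "~4B" in the table, approximate
QUERY_HEADS, KV_HEADS = 32, 8


def embedding_bytes(num_vectors: int, dim: int = HIDDEN_SIZE) -> int:
    """Bytes needed to store `num_vectors` bfloat16 embeddings of size `dim`."""
    return num_vectors * dim * BYTES_PER_BF16


bytes_per_embedding = embedding_bytes(1)
corpus_gb = embedding_bytes(1_000_000) / 1e9    # hypothetical 1M-document corpus
weights_gb = APPROX_PARAMS * BYTES_PER_BF16 / 1e9
queries_per_kv_head = QUERY_HEADS // KV_HEADS   # grouped-query attention ratio

print(f"one embedding: {bytes_per_embedding} bytes")
print(f"1M embeddings: {corpus_gb:.2f} GB")
print(f"weights (approx.): {weights_gb:.0f} GB")
print(f"query heads per KV head: {queries_per_kv_head}")
```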

	## BRIGHT Benchmark Results

	### Short Document Retrieval (nDCG@10)

	| Task | MRE-T1 |
	|------|--------|
	| Biology | 55.3 |
	| Earth Science | 56.5 |
	| Economics | 32.9 |
	| Psychology | 48.2 |
	| Robotics | 33.1 |
	| StackOverflow | 34.2 |
	| Sustainable Living | 37.3 |
	| LeetCode | 35.0 |
	| Pony | 35.5 |
	| AOPS | 16.7 |
	| TheoremQA (Questions) | 43.3 |
	| TheoremQA (Theorems) | 46.9 |
	| Average | 39.6 |

	### Long Document Retrieval (nDCG@10)

	| Task | MRE-T1 |
	|------|--------|
	| Biology | 46.5 |
	| Earth Science | 46 |
	| Economics | 34.5 |
	| Psychology | 52.7 |
	| Robotics | 27.7 |
	| StackOverflow | 22.2 |
	| Sustainable Living | 45.2 |
	| Pony | 6.3 |
	| Average | 35.1 |
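
The Average rows are consistent with an unweighted mean over tasks, which can be checked from the per-task scores. A minimal sketch with the values copied from the two tables above; the helper name `mean_score` is illustrative.

```python
# Recompute the "Average" rows as unweighted means of the per-task nDCG@10 scores.

short_scores = {
    "Biology": 55.3, "Earth Science": 56.5, "Economics": 32.9,
    "Psychology": 48.2, "Robotics": 33.1, "StackOverflow": 34.2,
    "Sustainable Living": 37.3, "LeetCode": 35.0, "Pony": 35.5,
    "AOPS": 16.7, "TheoremQA (Questions)": 43.3, "TheoremQA (Theorems)": 46.9,
}
long_scores = {
    "Biology": 46.5, "Earth Science": 46.0, "Economics": 34.5,
    "Psychology": 52.7, "Robotics": 27.7, "StackOverflow": 22.2,
    "Sustainable Living": 45.2, "Pony": 6.3,
}


def mean_score(scores: dict) -> float:
    """Unweighted mean over tasks."""
    return sum(scores.values()) / len(scores)


short_avg = round(mean_score(short_scores), 1)
long_avg = round(mean_score(long_scores), 1)

print(f"short: {short_avg} over {len(short_scores)} tasks")
print(f"long:  {long_avg} over {len(long_scores)} tasks")
```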

	### Comparison with Other Models (Short, Single Model Only)

	| Model | Size | BRIGHT Short nDCG@10 |
	|-------|------|---------------------|
	| MRE-T1 | ~4B | 39.6 |
	| BGE-Reasoner-Embed-0928 | 8B | 38.1 |
	| Seed1.5-Embedding | MoE | 27.2 |
	| gte-Qwen1.5-7B-instruct | 7B | 22.5 |
	| GritLM-7B | 7B | 21.0 |
	| instructor-xl | 1.5B | 18.9 |
	| SFR-Embedding-Mistral | 7B | 18.3 |
	| e5-mistral-7b-instruct | 7B | 17.9 |

	### Comparison with Other Models (Long, Single Model Only)

	| Model | Size | BRIGHT Long nDCG@10 |
	|-------|------|---------------------|
	| MRE-T1 | ~4B | 35.1 |
	| Google-Gecko-Text-Embedding-004 | n/a | 33.2 |
	| gte-Qwen1.5-7B-instruct | 7B | 27.8 |
	| SFR-Embedding-Mistral | 7B | 26.0 |
	| e5-mistral-7b-instruct | 7B | 25.5 |
	| voyage-large-2-instruct | n/a | 24.6 |
	| Cohere-embed-english-v3.0 | n/a | 18.4 |
	| bge-large-en-v1.5 | 335M | 14.8 |

	## Usage

	MRE-T1 uses task-specific system prompts for reasoning-enhanced retrieval. Each query is processed with a domain-specific instruction, and the model generates a reasoning chain followed by a special `<emb_token>` whose representation is used as the query embedding.

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "ForwardAILabs/MRE-T1"
	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

	# Task-specific system prompts
	TASK_PROMPTS = {
	    "biology": "Given a Biology post, extract and briefly describe the core underlying principle or mechanism of this biology question. You MUST end every response with <emb_token>.",
	    "earth_science": "Given an Earth Science post, identify the type of Earth science question and briefly describe the core principle for solving it. You MUST end every response with <emb_token>.",
	    "economics": "Given an Economics post, analyze the user's core blind spot and the applicable economic analysis framework. You MUST end every response with <emb_token>.",
	    "psychology": "Given a Psychology post, extract the user's blind spot and key psychological concepts. You MUST end every response with <emb_token>.",
	    "robotics": "Given a Robotics post, diagnose the core issue within the robotics environment and error logs, and point out the applicable technical principles. You MUST end every response with <emb_token>.",
	    "stackoverflow": "Given a Stack Overflow post, extract the core underlying technical principle for solving the code error. You MUST end every response with <emb_token>.",
	    "sustainable_living": "Given a Sustainable Living post, identify the key scientific concepts and background knowledge required for a closed-loop solution to the life phenomenon or practice. You MUST end every response with <emb_token>.",
	    "leetcode": "Given a Coding problem, extract the core algorithm principle (or data structure) and general problem-solving approach. You MUST end every response with <emb_token>.",
	    "pony": "Given a Pony question, locate the core knowledge points from the Pony language official documentation needed to solve the code completion problem. You MUST end every response with <emb_token>.",
	    "aops": "Given a Math problem, analyze the problem type characteristics and core examination principles of this math competition problem. You MUST end every response with <emb_token>.",
	    "theoremqa_questions": "Given a Math problem, analyze the problem type characteristics and core examination principles of this math competition problem. You MUST end every response with <emb_token>.",
	    "theoremqa_theorems": "Given a Math problem, distill the core mathematical principles and problem-solving techniques required for the real-world scenario. You MUST end every response with <emb_token>.",
	}

	# Example: Generate reasoning-enhanced query embedding
	task = "stackoverflow"
	query = "How to fix a segmentation fault when using shared_ptr in a multithreaded C++ application?"

	messages = [
	    {"role": "system", "content": TASK_PROMPTS[task]},
	    {"role": "user", "content": query},
	]

	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)

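	with torch.no_grad():
	    outputs = model(**inputs, output_hidden_states=True)
	    # Final-layer hidden state at the last input position. No reasoning chain is
	    # generated in this snippet, so this position is the end of the generation
	    # prompt rather than a generated <emb_token>.
	    embedding = outputs.hidden_states[-1][0, -1, :]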

	print(f"Embedding shape: {embedding.shape}")
	```
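
The description above says the embedding is read at a generated `<emb_token>` that follows the reasoning chain. One way this might be done is sketched below: generate first, locate `<emb_token>` in the generated tokens, then read the final-layer hidden state at that position. This is a sketch under stated assumptions, not the authors' confirmed pipeline. It assumes `<emb_token>` is a single entry in the tokenizer vocabulary; the helper names `last_emb_position` and `embed_query` and the `max_new_tokens=256` budget are hypothetical.

```python
from typing import Sequence

EMB_TOKEN = "<emb_token>"


def last_emb_position(token_ids: Sequence[int], emb_token_id: int, start: int = 0) -> int:
    """Index of the last `emb_token_id` at or after `start`; raises ValueError if absent."""
    for index in range(len(token_ids) - 1, start - 1, -1):
        if token_ids[index] == emb_token_id:
            return index
    raise ValueError("no <emb_token> found in the generated tokens")


def embed_query(model, tokenizer, system_prompt: str, query: str, max_new_tokens: int = 256):
    """Generate the reasoning chain, then return the hidden state at <emb_token>."""
    import torch  # imported lazily so the helper above has no dependencies

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    prompt_length = inputs["input_ids"].shape[1]

    # Assumption: <emb_token> maps to a single id in the tokenizer vocabulary.
    emb_token_id = tokenizer.convert_tokens_to_ids(EMB_TOKEN)

    with torch.no_grad():
        # Greedy decoding; `sequences` holds the prompt followed by the generated tokens.
        sequences = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        # Search only the generated part, because the system prompt also mentions <emb_token>.
        position = last_emb_position(sequences[0].tolist(), emb_token_id, start=prompt_length)
        outputs = model(input_ids=sequences[:, : position + 1], output_hidden_states=True)

    # Final-layer hidden state at the <emb_token> position.
    return outputs.hidden_states[-1][0, position, :]
```

With the model and tokenizer loaded as in the snippet above, `embed_query(model, tokenizer, TASK_PROMPTS["stackoverflow"], query)` would return one vector per query, provided the model emits `<emb_token>` within the token budget; otherwise the helper raises `ValueError`.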

	## Training

	MRE-T1 is trained in two stages, starting from Qwen/Qwen3-4B-Instruct-2507 (the `base_model` listed in this card's metadata):
	1. Stage 1: Supervised fine-tuning with task-specific reasoning prompts
	2. Stage 2: Reinforcement learning to optimize retrieval quality

	Training data is curated from diverse reasoning-intensive domains including mathematics, science, programming, and social sciences.

	## Evaluation

	MRE-T1 is evaluated on [BRIGHT](https://brightbenchmark.github.io/), a benchmark designed to test retrieval models on tasks that require complex reasoning. All scores in this card are nDCG@10.
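
For readers unfamiliar with the metric, nDCG@10 rewards rankings that place relevant documents near the top of the first ten results. The sketch below uses the common linear-gain formulation with a log2 rank discount. The official BRIGHT evaluation scripts may differ in details, and the document ids and relevance labels here are toy values, not benchmark data.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k gains (rank 1 has discount log2(2) = 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """nDCG@k of a ranking, given graded relevance labels for the relevant documents."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example: two relevant documents, retrieved at ranks 1 and 3.
relevance = {"d1": 1.0, "d4": 1.0}
ranking = ["d1", "d2", "d4", "d3"]
score = ndcg_at_k(ranking, relevance)

# The tables in this card report the metric on a 0-100 scale.
print(f"nDCG@10 = {100 * score:.1f}")
```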

	## Citation

	If you use MRE-T1 in your research, please cite:

	```bibtex
	@misc{mre-t1-2026,
	  title={MRE-T1: Reasoning-Enhanced Retrieval Model},
	  author={Forward AI},
	  year={2026},
	  url={https://huggingface.co/ForwardAILabs/MRE-T1}
	}
	```

	## License

	Apache 2.0

	---

	Built by [Forward AI Labs](https://huggingface.co/ForwardAILabs) | [mira.day](https://www.mira.day/)