Update README.md
Browse files
README.md
CHANGED
|
@@ -21,6 +21,8 @@ tags:
|
|
| 21 |
|
| 22 |
**INF-Query-Aligner** is a specialized component of the **INF-X-Retriever** framework, designed to distill the core retrieval intent from complex, verbose, or reasoning-intensive queries. Built upon the **Qwen2.5-7B-instruct** foundation and fine-tuned via Reinforcement Learning, it transforms raw user queries into concise, search-optimized queries for dense retrieval systems.
|
| 23 |
|
|
|
|
|
|
|
| 24 |
This model is a key enabler for **INF-X-Retriever**'s state-of-the-art performance, currently holding the **No. 1 position** on the [BRIGHT Benchmark](https://brightbenchmark.github.io/) (as of Dec 17, 2025).
|
| 25 |
|
| 26 |
For more details on the full framework, please visit the [INF-X-Retriever Repository](https://github.com/yaoyichen/INF-X-Retriever).
|
|
@@ -48,7 +50,7 @@ model = AutoModelForCausalLM.from_pretrained(
|
|
| 48 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 49 |
|
| 50 |
# Define input query
|
| 51 |
-
|
| 52 |
|
| 53 |
QUERY_WRITER_PROMPT = (
|
| 54 |
"For the input query, formulating a concise search query for dense retrieval by distilling the core intent from a complex user prompt and ignoring LLM instructions."
|
|
@@ -63,7 +65,7 @@ messages = [
|
|
| 63 |
"role": "user",
|
| 64 |
"content": (
|
| 65 |
f"{QUERY_WRITER_PROMPT}\n\n"
|
| 66 |
-
f"**Input Query:**\n{
|
| 67 |
f"**Your Output:**\n"
|
| 68 |
),
|
| 69 |
},
|
|
@@ -116,5 +118,5 @@ If you find this model useful, please consider citing our work:
|
|
| 116 |
|
| 117 |
## 📬 Contact
|
| 118 |
|
| 119 |
-
|
| 120 |
|
|
|
|
| 21 |
|
| 22 |
**INF-Query-Aligner** is a specialized component of the **INF-X-Retriever** framework, designed to distill the core retrieval intent from complex, verbose, or reasoning-intensive queries. Built upon the **Qwen2.5-7B-instruct** foundation and fine-tuned via Reinforcement Learning, it transforms raw user queries into concise, search-optimized queries for dense retrieval systems.
|
| 23 |
|
| 24 |
+
In our experiments, a single canonical query-writing prompt was applied across all datasets to ensure consistency and reproducibility.
|
| 25 |
+
|
| 26 |
This model is a key enabler for **INF-X-Retriever**'s state-of-the-art performance, currently holding the **No. 1 position** on the [BRIGHT Benchmark](https://brightbenchmark.github.io/) (as of Dec 17, 2025).
|
| 27 |
|
| 28 |
For more details on the full framework, please visit the [INF-X-Retriever Repository](https://github.com/yaoyichen/INF-X-Retriever).
|
|
|
|
| 50 |
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 51 |
|
| 52 |
# Define input query
|
| 53 |
+
query = "Claim in article about why insects are attracted to light\nIn this article they are addressing the reason insects are attracted to light when they say\nHeat radiation as an attractive component is refuted by the effect of LED lighting, which supplies negligible infrared radiation yet still entraps vast numbers of insects.\nI don't see why attraction to LEDs shows they're not seeking heat. Could they for example be evolutionarily programmed to associate light with heat? So that even though they don't encounter heat near/on the LEDs they still \"expect\" to?"
|
| 54 |
|
| 55 |
QUERY_WRITER_PROMPT = (
|
| 56 |
"For the input query, formulating a concise search query for dense retrieval by distilling the core intent from a complex user prompt and ignoring LLM instructions."
|
|
|
|
| 65 |
"role": "user",
|
| 66 |
"content": (
|
| 67 |
f"{QUERY_WRITER_PROMPT}\n\n"
|
| 68 |
+
f"**Input Query:**\n{query}\n"
|
| 69 |
f"**Your Output:**\n"
|
| 70 |
),
|
| 71 |
},
|
|
|
|
| 118 |
|
| 119 |
## 📬 Contact
|
| 120 |
|
| 121 |
+
Yichen Yao ([eason.yyc@inftech.ai](mailto:eason.yyc@inftech.ai))
|
| 122 |
|