Improve model card: Add pipeline tag, library name, paper/GitHub links, and abstract
This PR enhances the model card by:
- Adding `pipeline_tag: text-generation` to ensure better discoverability for users searching for models related to text generation tasks (e.g., at https://huggingface.co/models?pipeline_tag=text-generation).
- Adding `library_name: transformers` as the model's `config.json` and `tokenizer_config.json` indicate compatibility with the Hugging Face Transformers library (`Qwen2ForCausalLM`, `Qwen2Tokenizer`), enabling the "Use in Transformers" widget and code snippets.
- Including the paper abstract for a more comprehensive overview of the model's capabilities directly on the Hub.
- Adding the paper link ([AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play](https://huggingface.co/papers/2509.24193)) and the GitHub repository link ([https://github.com/ritaranx/AceSearcher/](https://github.com/ritaranx/AceSearcher/)) for quick access to the research and code.
- Including a "Training" section based on the GitHub README.
- Ensuring the existing "Model Usage" code snippets are preserved exactly as found in the original README, without introducing new example variables or imports, as per guidelines.
Please review and merge this PR if everything looks good.
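
For context while reviewing: the preserved usage snippets rely on plain string templating (`str.replace` on `{passage}` / `{question}` placeholders), and the decomposer's sub-questions are delimited by `###`. A minimal sketch of that flow — the response string below is illustrative filler, not real model output:

```python
# Decomposition template from the model card, filled by plain string replacement.
decompose_prompt = """You have the following passages and table:
Passages:
{passage}
Please break down the question '{question}' into multiple specific sub-questions that address individual components of the original question, with the table and passages as the reference. Use ### to mark the start of each sub-question."""

prompt = decompose_prompt.replace("{passage}", "<table and passages>")
prompt = prompt.replace("{question}", "What was the change in furniture and fixtures?")

# Illustrative decomposer output (not real model output); the sub-questions
# are recovered by splitting on the ### marker.
response = "### What were furniture and fixtures in 2019? ### What were they in 2018?"
sub_questions = [s.strip() for s in response.split("###") if s.strip()]
print(sub_questions)
```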
````diff
@@ -1,15 +1,25 @@
 ---
-
+base_model:
+- Qwen/Qwen2.5-32B-Instruct
 datasets:
 - AceSearcher/Search-SFT
 - AceSearcher/Search-RFT-Prompts
 language:
 - en
-
-
+license: mit
+pipeline_tag: text-generation
+library_name: transformers
 ---
+
+# AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
+
+## Abstract
+Search-augmented LLMs often struggle with complex reasoning tasks due to ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final answer accuracy, eliminating the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%. Remarkably, on document-level finance reasoning tasks, AceSearcher-32B matches the performance of the DeepSeek-V3 model using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9x more parameters, highlighting its exceptional efficiency and effectiveness in tackling complex reasoning tasks. Our code will be published at this https URL and this https URL.
+
 ## Introduction
-Here is the checkpoint used in the paper **AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play**. It uses `Qwen-2.5-Instruct-32B` as the backbone.
+Here is the checkpoint used in the paper **[AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play](https://huggingface.co/papers/2509.24193)**. It uses `Qwen-2.5-Instruct-32B` as the backbone.
+
+Code Repository: [https://github.com/ritaranx/AceSearcher/](https://github.com/ritaranx/AceSearcher/)
 
 ## Model Usage
 For question decomposition on QA tasks:
@@ -109,14 +119,35 @@ Wrap your answer with <answer> and </answer> tags."""
 
 For Decomposition for document-level financial reasoning tasks:
 ```
-decompose_prompt = """You have the following passages and table:
+decompose_prompt = """You have the following passages and table:
+Passages:
+{passage}
+Please break down the question '{question}' into multiple specific sub-questions that address individual components of the original question, with the table and passages as the reference. Use ### to mark the start of each sub-question."""
 
-qa_prompt = """You have the following passages and table:
+qa_prompt = """You have the following passages and table:
+Passages:
+{passage}
+For the question '{question}', here is a referenced breakdown:
+{decompose}.
+
+Write a Python program to solve the question. Store the final result in the variable ans."""
 
 
 question = "What would the change in furniture and fixtures between 2018 and 2019 be if furniture and fixtures were $5,000 thousand in 2018 instead? (in thousand)"
 
-context_text = "
+context_text = "
+|||December 31,||
+||Useful Life|2019|2018|
+|Computer equipment and software|3 \u2013 5 years|$57,474|$52,055|
+|Furniture and fixtures|7 years|6,096|4,367|
+|Leasehold improvements|2 \u2013 6 years|22,800|9,987|
+|Renovation in progress|n/a|8|1,984|
+|Build-to-suit property|25 years|\u2014|51,058|
+|Total property and equipment, gross||86,378|119,451|
+|Less: accumulated depreciation and amortization||(49,852)|(42,197)|
+|Total property and equipment, net||$36,526|$77,254|
+7. OTHER BALANCE SHEET AMOUNTS The components of property and equipment, net is as follows (in thousands): Depreciation expense for the years ended December 31, 2019, 2018, and 2017 was $11.8 million, $10.2 million, and $10.3 million, respectively.
+"
 
 decompose_prompt = decompose_prompt.replace("{passage}" , context_text)
 decompose_prompt = decompose_prompt.replace("{question}", question)
@@ -132,10 +163,13 @@ prompt = tokenizer.apply_chat_template(message, tokenize=False, add_generation_p
 output = llm.generate(prompt, sampling_params)[0].outputs[0].text
 ```
 
+## Training
+We use [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory/) codebase for both SFT and RFT (mDPO) finetuning. Please see `config` folder for the example configs used.
+
 ## Citation
 If you find our paper or models helpful, please consider cite as follows. Thank you!
 
-```
+```bibtex
 @inproceedings{
 xu2025acesearcher,
 title={AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play},
````