---
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- boolean-queries
- systematic-review
- information-retrieval
- pubmed
- reinforcement-learning
- grpo
library_name: transformers
---
# AutoBool-Qwen4b-No-reasoning
This model is part of the **AutoBool** framework, a reinforcement learning approach for training large language models to generate high-quality Boolean queries for systematic literature reviews.
## Model Description
This variant uses **direct generation** without explicit reasoning steps. The model is instructed to output only the final Boolean query inside `<answer></answer>` tags without any explanation or reasoning process.
- **Base Model:** Qwen/Qwen3-4B
- **Training Method:** GRPO (Group Relative Policy Optimization) with LoRA fine-tuning
- **Prompt Strategy:** Direct generation (no reasoning)
  - System instruction: "Do not include any explanation or reasoning"
  - Output format: `<answer>[Boolean query]</answer>`
  - No intermediate thinking or explanation steps
- **Domain:** Biomedical literature search (PubMed)
- **Task:** Boolean query generation for high-recall retrieval
## 🚀 Interactive Demo
Try out our query generation models directly in your browser! The demo lets you test our different reasoning strategies (Standard, Conceptual, Objective, and No-Reasoning) in real time.
* **Live Demo:** [AutoBool on Hugging Face Spaces](https://huggingface.co/spaces/wshuai190/AutoBool-Demo)
## Training Details
The model was trained using:
- **Optimization:** GRPO (Group Relative Policy Optimization)
- **Fine-tuning:** LoRA (Low-Rank Adaptation)
- **Dataset:** wshuai190/pubmed-pmc-sr-filtered
- **Reward Function:** Combines syntactic validity, format correctness, and retrieval effectiveness
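
As a rough illustration of how these pieces fit together, the sketch below combines a format check, a crude syntax check, and recall into one scalar reward, then normalizes rewards within a group of sampled completions as GRPO does. The function names, the parenthesis-balance check, and the weights (0.1, 0.2, 0.8) are assumptions made for this example; they are not the reward used to train this model.

```python
import re
from statistics import mean, pstdev

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def toy_reward(completion, retrieved_ids, relevant_ids):
    """Hypothetical composite reward; the weights are illustrative only."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # format check failed: no <answer> block
    query = match.group(1).strip()

    # Crude syntax check: non-empty query with balanced parentheses.
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth < 0:
            break
    if not query or depth != 0:
        return 0.1  # correct format, invalid syntax

    # Retrieval effectiveness approximated here by recall.
    hits = len(set(retrieved_ids) & set(relevant_ids))
    recall = hits / max(len(relevant_ids), 1)
    return 0.2 + 0.8 * recall


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Completions that score above the group mean receive a positive advantage and are reinforced; those below the mean are penalized.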
## Intended Use
This model is designed for:
- Generating Boolean queries for systematic literature reviews
- High-recall biomedical information retrieval
- Supporting evidence synthesis in healthcare and biomedical research
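
A generated query is only useful once it is run against PubMed. The sketch below builds a request URL for the NCBI E-utilities ESearch endpoint. Using E-utilities is an assumption of this example (the model card does not say how queries are executed), and `build_esearch_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, retmax=100):
    """Return an ESearch URL that runs `query` against the PubMed database."""
    params = {
        "db": "pubmed",      # search PubMed
        "term": query,       # the Boolean query, URL-encoded by urlencode
        "retmode": "json",   # ask for a JSON response
        "retmax": retmax,    # maximum number of PMIDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_esearch_url("Pain[mh] AND Child[mh]"))
```

Fetching the URL (for example with `urllib.request.urlopen`) returns the matching PMIDs. NCBI rate-limits unauthenticated requests, so batch evaluation would likely need an API key.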
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "ielabgroup/Autobool-Qwen4b-No-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Define your systematic review topic
topic = "Thromboelastography (TEG) and rotational thromboelastometry (ROTEM) for trauma-induced coagulopathy"
# Construct the prompt with system and user messages
messages = [
    # Triple-quoted strings are required because the prompts span several lines
    {"role": "system", "content": """You are an expert systematic review information specialist.
You are tasked to formulate a systematic review Boolean query in response to a research topic. The final Boolean query must be enclosed within <answer> </answer> tags. Do not include any explanation or reasoning."""},
    {"role": "user", "content": f"""You are given a systematic review research topic, with the topic title "{topic}".
Your task is to formulate a highly effective Boolean query in MEDLINE format for PubMed.
The query should balance **high recall** (capturing all relevant studies) with **reasonable precision** (avoiding irrelevant results):
- Use both free-text terms and MeSH terms (e.g., chronic pain[tiab], Pain[mh]).
- **Do not wrap terms or phrases in double quotes**, as this disables automatic term mapping (ATM).
- Combine synonyms or related terms within a concept using OR.
- Combine different concepts using AND.
- Use wildcards (*) to capture word variants (e.g., vaccin* → vaccine, vaccination):
- Terms must have ≥4 characters before the * (e.g., colo*)
- Wildcards work with field tags (e.g., breastfeed*[tiab]).
- Field tags limit the search to specific fields and disable ATM.
- Do not include date limits.
- Tag term using term field (e.g., covid-19[ti] vaccine[ti] children[ti]) when needed.
**Only use the following allowed field tags:**
Title: [ti], Abstract: [ab], Title/Abstract: [tiab]
MeSH: [mh], Major MeSH: [majr], Supplementary Concept: [nm]
Text Words: [tw], All Fields: [all]
Publication Type: [pt], Language: [la]
Output and only output the formulated Boolean query inside <answer></answer> tags. Do not include any explanation or content outside or inside the <answer> tags."""}
]
# Generate the query
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
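# max_new_tokens limits only the generated text (max_length would also count the prompt)
outputs = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens: the prompt itself contains <answer> tags,
# which the regex below would otherwise match first
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)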
# Extract the query from <answer> tags
import re
match = re.search(r'<answer>(.*?)</answer>', response, re.DOTALL)
if match:
    query = match.group(1).strip()
    print(query)
```
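
The prompt above imposes several mechanical rules on the output: no double quotes, only the listed field tags, and at least four characters before a wildcard. A small checker can flag violations before a query is sent to PubMed. The sketch below is a hypothetical helper, not part of AutoBool, and its regular expressions only approximate PubMed syntax.

```python
import re

# Field tags permitted by the prompt above.
ALLOWED_TAGS = {"ti", "ab", "tiab", "mh", "majr", "nm", "tw", "all", "pt", "la"}


def lint_query(query):
    """Return a list of rule violations found in a generated Boolean query."""
    problems = []
    if '"' in query:
        problems.append("double quotes present")
    # Every [tag] must be one of the allowed field tags.
    for tag in re.findall(r"\[([^\]]+)\]", query):
        if tag.lower() not in ALLOWED_TAGS:
            problems.append(f"field tag not allowed: [{tag}]")
    # A wildcard needs at least four characters before the *.
    for stem in re.findall(r"([A-Za-z0-9-]+)\*", query):
        if len(stem) < 4:
            problems.append(f"wildcard stem too short: {stem}*")
    if query.count("(") != query.count(")"):
        problems.append("unbalanced parentheses")
    return problems


print(lint_query("(vaccin*[tiab] OR Vaccines[mh]) AND child*[tiab]"))
print(lint_query('"chronic pain"[tiab] AND col*[tiab] AND x[dp]'))
```

An empty list means no rule was violated; it does not guarantee that PubMed will interpret the query as intended.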
## Limitations
- Optimized specifically for PubMed Boolean query syntax
- Performance may vary on non-biomedical domains
- Requires domain knowledge for effective prompt engineering
## Citation
If you use this model, please cite:
```bibtex
@inproceedings{autobool2026,
title={AutoBool: Reinforcement Learning for Boolean Query Generation in Systematic Reviews},
author={Shuai Wang and Harrisen Scells and Bevan Koopman and Guido Zuccon},
booktitle={Proceedings of the 2026 Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
year={2026}
}
```
## More Information
- **GitHub Repository:** [https://github.com/ielab/AutoBool](https://github.com/ielab/AutoBool)
- **Paper:** Accepted at EACL 2026
## License
Apache 2.0