---
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- boolean-queries
- systematic-review
- information-retrieval
- pubmed
- reinforcement-learning
- grpo
library_name: transformers
---

# AutoBool-Qwen4b-No-reasoning

This model is part of the **AutoBool** framework, a reinforcement learning approach for training large language models to generate high-quality Boolean queries for systematic literature reviews.

## Model Description

This variant uses **direct generation** without explicit reasoning steps. The model is instructed to output only the final Boolean query inside `<answer></answer>` tags without any explanation or reasoning process.

- **Base Model:** Qwen/Qwen3-4B
- **Training Method:** GRPO (Group Relative Policy Optimization) with LoRA fine-tuning
- **Prompt Strategy:** Direct generation (no reasoning)
  - System instruction: "Do not include any explanation or reasoning"
  - Output format: `<answer>[Boolean query]</answer>`
  - No intermediate thinking or explanation steps
- **Domain:** Biomedical literature search (PubMed)
- **Task:** Boolean query generation for high-recall retrieval

## 🚀 Interactive Demo

Try out our query generation models directly in your browser. The demo lets you test each prompting strategy (Standard, Conceptual, Objective, and No-Reasoning) in real time.

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/wshuai190/AutoBool-Demo) 
- **Live Demo:** [AutoBool on Hugging Face Spaces](https://huggingface.co/spaces/wshuai190/AutoBool-Demo)


## Training Details

The model was trained using:
- **Optimization:** GRPO (Group Relative Policy Optimization)
- **Fine-tuning:** LoRA (Low-Rank Adaptation)
- **Dataset:** wshuai190/pubmed-pmc-sr-filtered
- **Reward Function:** Combines syntactic validity, format correctness, and retrieval effectiveness
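
To make the three reward components more concrete, here is a minimal sketch of how such a signal could be combined. It is not the reward used to train this model: the weights, the function names (`extract_query`, `balanced`, `toy_reward`), the parenthesis check as a stand-in for syntactic validity, and recall as the effectiveness measure are all assumptions chosen for illustration.

```python
# Hypothetical weights; the actual AutoBool reward is not specified here.
W_FORMAT, W_SYNTAX, W_RECALL = 0.2, 0.2, 0.6


def extract_query(response: str):
    """Return the query if the response is exactly one <answer>...</answer> block, else None."""
    text = response.strip()
    if text.count("<answer>") != 1 or text.count("</answer>") != 1:
        return None
    if not (text.startswith("<answer>") and text.endswith("</answer>")):
        return None
    return text[len("<answer>"):-len("</answer>")].strip()


def balanced(query: str) -> bool:
    """Very rough syntax check: parentheses must be balanced."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def toy_reward(response: str, retrieved: set, relevant: set) -> float:
    """Combine format, syntax, and recall into a single score in [0, 1]."""
    query = extract_query(response)
    if query is None:
        return 0.0  # wrong output format: no reward at all
    if not query or not balanced(query):
        return W_FORMAT  # right format, but the query itself is malformed
    recall = len(retrieved & relevant) / len(relevant) if relevant else 0.0
    return W_FORMAT + W_SYNTAX + W_RECALL * recall
```

In this sketch `retrieved` would be the set of PMIDs returned by running the query and `relevant` the set of studies included in the review; both are passed in directly so the example stays self-contained.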

## Intended Use

This model is designed for:
- Generating Boolean queries for systematic literature reviews
- High-recall biomedical information retrieval
- Supporting evidence synthesis in healthcare and biomedical research

## How to Use

```python
import re

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ielabgroup/Autobool-Qwen4b-No-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Define your systematic review topic
topic = "Thromboelastography (TEG) and rotational thromboelastometry (ROTEM) for trauma-induced coagulopathy"

# System prompt (triple-quoted so the multi-line text is valid Python)
system_prompt = """You are an expert systematic review information specialist.
You are tasked to formulate a systematic review Boolean query in response to a research topic. The final Boolean query must be enclosed within <answer> </answer> tags. Do not include any explanation or reasoning."""

# User prompt; the topic title is interpolated via the f-string
user_prompt = f"""You are given a systematic review research topic, with the topic title "{topic}".
Your task is to formulate a highly effective Boolean query in MEDLINE format for PubMed.
The query should balance **high recall** (capturing all relevant studies) with **reasonable precision** (avoiding irrelevant results):
- Use both free-text terms and MeSH terms (e.g., chronic pain[tiab], Pain[mh]).
- **Do not wrap terms or phrases in double quotes**, as this disables automatic term mapping (ATM).
- Combine synonyms or related terms within a concept using OR.
- Combine different concepts using AND.
- Use wildcards (*) to capture word variants (e.g., vaccin* → vaccine, vaccination):
  - Terms must have ≥4 characters before the * (e.g., colo*)
  - Wildcards work with field tags (e.g., breastfeed*[tiab]).
- Field tags limit the search to specific fields and disable ATM.
- Do not include date limits.
- Tag term using term field (e.g., covid-19[ti] vaccine[ti] children[ti]) when needed.
**Only use the following allowed field tags:**
Title: [ti], Abstract: [ab], Title/Abstract: [tiab]
MeSH: [mh], Major MeSH: [majr], Supplementary Concept: [nm]
Text Words: [tw], All Fields: [all]
Publication Type: [pt], Language: [la]

Output and only output the formulated Boolean query inside <answer></answer> tags. Do not include any explanation or content outside or inside the <answer> tags."""

# Construct the prompt with system and user messages
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Generate the query
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=2048)

# Decode only the newly generated tokens. The prompt itself contains
# <answer> </answer> tags, which the regex below would otherwise match first.
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated, skip_special_tokens=True)

# Extract the query from <answer> tags
match = re.search(r'<answer>(.*?)</answer>', response, re.DOTALL)
if match:
    query = match.group(1).strip()
    print(query)
else:
    print("No <answer> block found in the model output.")
```
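
Once a query has been extracted, a natural next step is to run it against PubMed. The sketch below only builds a request URL for the NCBI E-utilities `esearch` endpoint and does not perform any network call. The helper name `build_esearch_url`, the `retmax` default, and the sample query are illustrative assumptions and not part of AutoBool.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, retmax: int = 20) -> str:
    """Build an esearch URL that asks PubMed for PMIDs matching a Boolean query."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": query,      # the Boolean query, URL-encoded by urlencode
        "retmax": retmax,   # maximum number of PMIDs to return
        "retmode": "json",  # request a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Placeholder query for illustration; in practice pass the extracted `query`.
sample_query = "thrombelastography[mh] AND trauma*[tiab]"
print(build_esearch_url(sample_query))
```

Fetching the resulting URL with any HTTP client should return a JSON document listing matching PMIDs. NCBI rate-limits unauthenticated requests, so batch evaluation would likely need an API key and throttling.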

## Limitations

- Optimized specifically for PubMed Boolean query syntax
- Performance may vary on non-biomedical domains
- Requires domain knowledge for effective prompt engineering
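
Because generated queries are not guaranteed to follow every prompt constraint, a lightweight check before submitting them may be useful. The following is a rough sketch, not a PubMed syntax validator: the function name `lint_query` and its heuristics are assumptions, and the rules are taken only from the prompt shown in the usage example (allowed field tags, at least four characters before a wildcard, no double quotes).

```python
import re

# Field tags permitted by the prompt in "How to Use"
ALLOWED_TAGS = {"ti", "ab", "tiab", "mh", "majr", "nm", "tw", "all", "pt", "la"}


def lint_query(query: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []

    # Every [tag] must be one of the allowed field tags
    for tag in re.findall(r"\[([^\]]*)\]", query):
        if tag.strip().lower() not in ALLOWED_TAGS:
            problems.append(f"field tag not in allowed list: [{tag}]")

    # Wildcards need at least 4 characters before the *
    for stem in re.findall(r"([A-Za-z0-9-]*)\*", query):
        if len(stem) < 4:
            problems.append(f"wildcard stem too short: {stem}*")

    # Double quotes disable automatic term mapping
    if '"' in query:
        problems.append("double quotes found in query")

    return problems


print(lint_query("vaccin*[tiab] AND Pain[mh]"))
print(lint_query('col*[xyz] OR "chronic pain"'))
```

A query that passes these checks can still be rejected or reinterpreted by PubMed, so the result is best treated as a first filter.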

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{autobool2026,
  title={AutoBool: Reinforcement Learning for Boolean Query Generation in Systematic Reviews},
  author={Shuai Wang and Harrisen Scells and Bevan Koopman and Guido Zuccon},
  booktitle={Proceedings of the 2026 Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026}
}
```

## More Information

- **GitHub Repository:** [https://github.com/ielab/AutoBool](https://github.com/ielab/AutoBool)
- **Paper:** Accepted at EACL 2026

## License

Apache 2.0