---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking
tags:
- deep-research
- react-agent
- reinforcement-learning
- search-agent
- agentic-rl
pipeline_tag: text-generation
library_name: transformers
---
# LiteResearcher-4B
<p align="center"> <img src="assets/logo.png" alt="LiteResearcher Logo" width="400">
</p>
<p align="center"> <a href="https://simplex-ai-inc.github.io/LiteResearcher/">Project Page</a> •
<a href="https://github.com/simplex-ai-inc/LiteResearcher">Code</a> •
<a href="https://arxiv.org/abs/2604.17931">Paper</a>
</p>
**LiteResearcher-4B** is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches **Claude-4.5-Sonnet** on GAIA and outperforms open-source models up to **8× larger**.
## Key Results
| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| **GAIA-Text** | **71.3%** | ≈ Claude-4.5-Sonnet (71.2%) |
| **Xbench-DS** | **78.0%** | > Tongyi DeepSearch 30B (75.0%) |
| **Frames** | **83.1%** | > Claude-4-Sonnet (80.7%) |
| **WebWalkerQA** | **72.7%** | > Tongyi DeepSearch 30B (72.2%) |
All with only **4B parameters**, 8–32× smaller than comparable models.
## Model Details
- **Architecture**: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- **Parameters**: 4B
- **Max Context**: 262,144 tokens
- **Training**: Two-stage difficulty-aware curriculum RL with virtual world environment
- **Agent Mode**: ReAct-style with `search` and `visit` tools
## How It Works
LiteResearcher operates as a ReAct agent that iteratively:
1. **Thinks** about what information is needed
2. **Searches** the web via Google
3. **Visits** webpages to extract evidence
4. **Answers** when sufficient information is gathered
The model uses `<think>`, `<tool_call>`, and `<answer>` tags to structure its reasoning.
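The loop above can be sketched as a small driver that parses each model turn and dispatches tool calls. This is a minimal illustration, not the project's inference framework: the tag names come from this card, but the JSON shape inside `<tool_call>` (`name` plus `arguments`), the `<tool_response>` wrapper, and the `generate` and `tools` callables are all assumptions.

```python
import json
import re

# Tag names come from the model card; the JSON payload shape inside
# <tool_call> ({"name": ..., "arguments": {...}}) is an assumption.
TAG_PATTERN = {
    tag: re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    for tag in ("think", "tool_call", "answer")
}


def parse_step(text: str) -> dict:
    """Split one model turn into its thought, tool call, and answer parts."""
    step = {"think": None, "tool_call": None, "answer": None}
    for tag, pattern in TAG_PATTERN.items():
        match = pattern.search(text)
        if match:
            step[tag] = match.group(1).strip()
    if step["tool_call"] is not None:
        try:
            step["tool_call"] = json.loads(step["tool_call"])
        except json.JSONDecodeError:
            # Keep the raw string if the payload is not valid JSON.
            pass
    return step


def run_agent(generate, tools, question, max_turns=8):
    """Minimal ReAct loop: generate, run the requested tool, repeat.

    `generate` maps a transcript string to the model's next turn;
    `tools` maps tool names (e.g. "search", "visit") to callables.
    Both are hypothetical placeholders supplied by the caller.
    """
    transcript = question
    for _ in range(max_turns):
        turn = generate(transcript)
        step = parse_step(turn)
        if step["answer"] is not None:
            return step["answer"]
        transcript += "\n" + turn
        call = step["tool_call"]
        if isinstance(call, dict) and call.get("name") in tools:
            result = tools[call["name"]](**call.get("arguments", {}))
            # The <tool_response> wrapper is an assumed convention.
            transcript += f"\n<tool_response>{result}</tool_response>"
    return None  # No answer within the turn budget.
```

In practice `generate` would wrap a call to the served model and `tools` would hold real `search` and `visit` implementations; the turn limit keeps a run from looping indefinitely.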
## Quick Start
### With the Inference Framework
```bash
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt
# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY
# Start SGLang server (runs in the foreground; use a second terminal for the next step)
python -m sglang.launch_server \
--model-path simplex-ai-inc/LiteResearcher-4B \
--port 6001 --tp 2
# Run inference
bash scripts/run_all.sh \
--model simplex-ai-inc/LiteResearcher-4B \
--dataset data/example.jsonl
```
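Once the SGLang server is up, it can also be queried directly through its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the server is reachable on `localhost` at the port from the launch command (6001); `build_request` and `ask` are illustrative helper names, not part of the repository. A raw call like this returns the model's text only and does not execute `search` or `visit` tools.

```python
import json
import urllib.request

# Port 6001 matches the launch command above; the host is an assumption.
BASE_URL = "http://localhost:6001/v1/chat/completions"
MODEL = "simplex-ai-inc/LiteResearcher-4B"


def build_request(question: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the local server."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_request(question), timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Who won the Nobel Prize in Physics in 2024?"))` prints the model's raw response, including any reasoning tags it emits.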
### Direct Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplex-ai-inc/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
]

# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True makes the temperature and top_p settings take effect.
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
## Training
LiteResearcher is trained with a three-component framework:
1. **Co-constructed Training Data & Corpus**: 32M+ webpages, 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
2. **Stable Local Tool Environment**: Local search engine (BGE-M3 + Milvus) and local browse tool (PostgreSQL) enabling 73.2M tool calls during training at zero marginal cost
3. **Difficulty-Aware Curriculum RL**: Multi-stage training that progressively increases task difficulty and context length
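To illustrate the idea behind the local tool environment, here is a toy sketch in which `search` and `visit` are served from an in-memory page snapshot rather than live web APIs. It is not the authors' implementation: token-count cosine similarity stands in for BGE-M3 embeddings with Milvus, a plain dictionary stands in for the PostgreSQL page store, and the class and method names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (a stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalToolEnv:
    """In-memory search and visit tools over a fixed page snapshot."""

    def __init__(self, pages: dict[str, str]):
        self.pages = pages  # url -> page text
        self.index = {url: embed(text) for url, text in pages.items()}
        self.calls = 0  # tool-call counter, as a training loop might track

    def search(self, query: str, top_k: int = 3) -> list[str]:
        """Return the URLs of the top_k pages most similar to the query."""
        self.calls += 1
        q = embed(query)
        ranked = sorted(
            self.index, key=lambda url: cosine(q, self.index[url]), reverse=True
        )
        return ranked[:top_k]

    def visit(self, url: str) -> str:
        """Return the stored page text, or an empty string if unknown."""
        self.calls += 1
        return self.pages.get(url, "")
```

Because every call resolves against a fixed snapshot, repeated rollouts see identical results and incur no per-call API cost, which is the property the training setup relies on.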
## Benchmark Results
LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.
| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| | | | | **Commercial Models** | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| | | | | **Open-Source Models** | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | 43.4 | 46.7 | 32.9 | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| **LiteResearcher** | **4B** | **71.3** | 27.5\* | 32.5\* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |
Best open-source results in **bold**. Results with \* use a 64k context window with a memory mechanism.
## Citation
```bibtex
@article{li2026literesearcher,
title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
journal={arXiv preprint arXiv:2604.17931},
year={2026}
}
```
## License
This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).