Instructions for using simplex-ai-inc/LiteResearcher-4B with libraries and local apps.

Libraries

How to use simplex-ai-inc/LiteResearcher-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="simplex-ai-inc/LiteResearcher-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("simplex-ai-inc/LiteResearcher-4B")
model = AutoModelForCausalLM.from_pretrained("simplex-ai-inc/LiteResearcher-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use simplex-ai-inc/LiteResearcher-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "simplex-ai-inc/LiteResearcher-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "simplex-ai-inc/LiteResearcher-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/simplex-ai-inc/LiteResearcher-4B

SGLang

How to use simplex-ai-inc/LiteResearcher-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "simplex-ai-inc/LiteResearcher-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "simplex-ai-inc/LiteResearcher-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "simplex-ai-inc/LiteResearcher-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "simplex-ai-inc/LiteResearcher-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use simplex-ai-inc/LiteResearcher-4B with Docker Model Runner:
```
docker model run hf.co/simplex-ai-inc/LiteResearcher-4B
```



	---
	license: apache-2.0
	language:
	- en
	- zh
	base_model:
	- Qwen/Qwen3-4B-Thinking
	tags:
	- deep-research
	- react-agent
	- reinforcement-learning
	- search-agent
	- agentic-rl
	pipeline_tag: text-generation
	library_name: transformers
	---

	# LiteResearcher-4B

	<p align="center"> <img src="assets/logo.png" alt="LiteResearcher Logo" width="400">
	</p>

	<p align="center"> <a href="https://simplex-ai-inc.github.io/LiteResearcher/">🌐 Project Page</a> •
	<a href="https://github.com/simplex-ai-inc/LiteResearcher">💻 Code</a> •
	<a href="https://arxiv.org/abs/2604.17931">📄 Paper</a>
	</p>

	LiteResearcher-4B is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches Claude-4.5-Sonnet on GAIA and outperforms open-source models up to 8× larger.

	## Key Results

	| Benchmark | LiteResearcher-4B | Notable Comparison |
	|---|---|---|
	| GAIA-Text | 71.3% | ≈ Claude-4.5-Sonnet (71.2%) |
	| Xbench-DS | 78.0% | > Tongyi DeepSearch 30B (75.0%) |
	| Frames | 83.1% | > Claude-4-Sonnet (80.7%) |
	| WebWalkerQA | 72.7% | > Tongyi DeepSearch 30B (72.2%) |

	All with only 4B parameters — 8–32× smaller than comparable models.

	## Model Details

	- Architecture: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
	- Parameters: 4B
	- Max Context: 262,144 tokens
	- Training: Two-stage difficulty-aware curriculum RL with virtual world environment
	- Agent Mode: ReAct-style with `search` and `visit` tools

	## How It Works

	LiteResearcher operates as a ReAct agent that iteratively:
	1. Thinks about what information is needed
	2. Searches the web via Google
	3. Visits webpages to extract evidence
	4. Answers when sufficient information is gathered
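
The four steps above can be sketched as a small control loop. This is a minimal illustration, not the project's inference code: `run_react`, the action dictionary format, and the scripted stand-ins for the model and tools are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical action format: {"type": "search" | "visit" | "answer", "value": str}
Action = Dict[str, str]


def run_react(
    question: str,
    policy: Callable[[List[str]], Action],
    search: Callable[[str], str],
    visit: Callable[[str], str],
    max_steps: int = 8,
) -> Optional[str]:
    """Loop until the policy answers or the step budget runs out."""
    history = [f"question: {question}"]
    for _ in range(max_steps):
        action = policy(history)            # step 1: decide what is needed
        if action["type"] == "search":      # step 2: query the search tool
            observation = search(action["value"])
        elif action["type"] == "visit":     # step 3: read a page
            observation = visit(action["value"])
        elif action["type"] == "answer":    # step 4: stop with an answer
            return action["value"]
        else:
            raise ValueError(f"unknown action type: {action['type']!r}")
        history.append(f"{action['type']}({action['value']}) -> {observation}")
    return None  # budget exhausted without an answer


# Scripted stand-ins so the loop runs without a model or network access.
def scripted_policy(history: List[str]) -> Action:
    script = [
        {"type": "search", "value": "capital of France"},
        {"type": "visit", "value": "https://example.com/france"},
        {"type": "answer", "value": "Paris"},
    ]
    return script[len(history) - 1]


def fake_search(query: str) -> str:
    return "https://example.com/france"


def fake_visit(url: str) -> str:
    return "Paris is the capital of France."


result = run_react(
    "What is the capital of France?", scripted_policy, fake_search, fake_visit
)
print(result)
```

In a real setup the `policy` callable would wrap a call to the model and the two tool functions would call a search API and a page fetcher; the step budget guards against runs that never reach an answer.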

	The model uses `<think>`, `<tool_call>`, and `<answer>` tags to structure its reasoning.
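
A small parser makes the tag structure concrete. The sketch below assumes that each tag appears as a matched open/close pair and that the `<tool_call>` body is a JSON object with `name` and `arguments` keys; neither detail is specified in this README, so treat both as assumptions. `parse_turn` is a hypothetical helper name.

```python
import json
import re

# Matches <think>...</think>, <tool_call>...</tool_call>, <answer>...</answer>.
TAG_RE = re.compile(r"<(think|tool_call|answer)>(.*?)</\1>", re.DOTALL)


def parse_turn(text: str) -> dict:
    """Split one model turn into its thought, tool calls, and answer (if any)."""
    parsed = {"think": None, "tool_calls": [], "answer": None}
    for tag, body in TAG_RE.findall(text):
        body = body.strip()
        if tag == "tool_call":
            # Assumption: the payload is JSON; keep the raw text otherwise.
            try:
                parsed["tool_calls"].append(json.loads(body))
            except json.JSONDecodeError:
                parsed["tool_calls"].append({"raw": body})
        else:
            parsed[tag] = body
    return parsed


sample = (
    "<think>I need the laureates' names.</think>\n"
    '<tool_call>{"name": "search", "arguments": '
    '{"query": "Nobel Prize in Physics 2024"}}</tool_call>'
)
turn = parse_turn(sample)
print(turn["tool_calls"][0]["name"])
```

Some chat templates may place the opening `<think>` in the prompt, so that only the closing tag shows up in the generated text; this sketch does not handle that case.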

	## Quick Start

	### With the Inference Framework

	```bash
	git clone https://github.com/simplex-ai-inc/LiteResearcher.git
	cd LiteResearcher
	pip install -r requirements.txt

	# Configure API keys
	cp .env.example .env
	# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

	# Start SGLang server
	python -m sglang.launch_server \
	    --model-path simplex-ai-inc/LiteResearcher-4B \
	    --port 6001 --tp 2

	# Run inference
	bash scripts/run_all.sh \
	    --model simplex-ai-inc/LiteResearcher-4B \
	    --dataset data/example.jsonl
	```
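
Once the SGLang server is running, it can also be queried directly, without the repository scripts. The sketch below uses only the Python standard library and assumes the server listens on `localhost:6001` (the port from the launch command above) and serves SGLang's OpenAI-compatible `/v1/chat/completions` route. `build_request` and `ask` are hypothetical helper names, and this plain chat call does not wire up the `search` and `visit` tools.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6001/v1"  # port taken from the launch command
MODEL = "simplex-ai-inc/LiteResearcher-4B"


def build_request(question: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build a chat-completions POST request for a single user question."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 300.0) -> str:
    """Send the question and return the text of the first choice."""
    with urllib.request.urlopen(build_request(question), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    try:
        print(ask("What is the capital of France?"))
    except OSError as exc:  # covers connection errors and timeouts
        print(f"Server not reachable at {BASE_URL}: {exc}")
```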

	### Direct Usage with Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "simplex-ai-inc/LiteResearcher-4B"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

	messages = [
	    {"role": "system", "content": "You are a deep research assistant..."},
	    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
	]

	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer([text], return_tensors="pt").to(model.device)
	# do_sample=True makes sure temperature and top_p actually take effect
	outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
	print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
	```

	## Training

	LiteResearcher is trained with a three-component framework:

	1. Co-constructed Training Data & Corpus — 32M+ webpages, 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
	2. Stable Local Tool Environment — Local search engine (BGE-M3 + Milvus) and local browse tool (PostgreSQL) enabling 73.2M tool calls during training at zero marginal cost
	3. Difficulty-Aware Curriculum RL — Multi-stage training that progressively increases task difficulty and context length
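
The idea behind the local tool environment in item 2 can be illustrated with a toy version that needs no external services. This is not the project's implementation: bag-of-words cosine similarity stands in for BGE-M3 embeddings, an in-memory dictionary stands in for both the Milvus index and the PostgreSQL page store, and the corpus contents are made up for the example.

```python
import math
from collections import Counter

# Toy corpus standing in for the crawled webpages (contents are made up).
CORPUS = {
    "https://example.com/paris": "Paris is the capital of France.",
    "https://example.com/berlin": "Berlin is the capital of Germany.",
    "https://example.com/milvus": "Milvus is a vector database for similarity search.",
}


def embed(text: str) -> Counter:
    """Bag-of-words counts; a stand-in for a dense embedding model."""
    return Counter(text.lower().replace(".", " ").replace("?", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Pre-computed index, analogous to embedding the corpus once before training.
INDEX = {url: embed(text) for url, text in CORPUS.items()}


def search(query: str, top_k: int = 2) -> list:
    """Return the top_k URLs ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda url: cosine(q, INDEX[url]), reverse=True)
    return ranked[:top_k]


def visit(url: str) -> str:
    """Return the stored page text, or an empty string for unknown URLs."""
    return CORPUS.get(url, "")


hits = search("capital of France")
print(hits[0])
print(visit(hits[0]))
```

Because both tools read from local data, every call is deterministic and free of network latency or API fees, which is the property that makes very large numbers of tool calls practical during RL training.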

	## Benchmark Results

	Across eight benchmarks, LiteResearcher-4B posts the best open-source score on GAIA, WebWalkerQA, MAIA, and Xbench-DS, ahead of models up to 8× larger, and it matches or exceeds several proprietary systems on GAIA and Xbench-DS.

	| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
	|---|---|---|---|---|---|---|---|---|---|
	| | | | | Commercial Models | | | | | |
	| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
	| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
	| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
	| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
	| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
	| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
	| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
	| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
	| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
	| | | | | Open-Source Models | | | | | |
	| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
	| Tongyi DeepSearch | 30B | 70.9 | 43.4 | 46.7 | 32.9 | 90.6 | 72.2 | - | 75.0 |
	| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
	| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
	| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
	| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
	| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
	| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
	| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
	| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
	| LiteResearcher | 4B | 71.3 | 27.5\* | 32.5\* | 22.0 | 83.1 | 72.7 | 41.8 | 78.0 |

	Best open-source results in bold. Results with \* use a 64k context window with a memory mechanism.

	## Citation

	```bibtex
	@article{li2026literesearcher,
	title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
	author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
	journal={arXiv preprint arXiv:2604.17931},
	year={2026}
	}
	```

	## License

	This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).