---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking
tags:
- deep-research
- react-agent
- reinforcement-learning
- search-agent
- agentic-rl
pipeline_tag: text-generation
library_name: transformers
---

# LiteResearcher-4B

<p align="center">
  <img src="assets/logo.png" alt="LiteResearcher Logo" width="400">
</p>

<p align="center">
  <a href="https://simplex-ai-inc.github.io/LiteResearcher/">Project Page</a> •
  <a href="https://github.com/simplex-ai-inc/LiteResearcher">Code</a> •
  <a href="https://arxiv.org/abs/2604.17931">Paper</a>
</p>

**LiteResearcher-4B** is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches **Claude-4.5-Sonnet** on GAIA and outperforms open-source models up to **8× larger** on several benchmarks.

## Key Results

| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| **GAIA-Text** | **71.3%** | = Claude-4.5-Sonnet (71.2%) |
| **Xbench-DS** | **78.0%** | > Tongyi DeepSearch 30B (75.0%) |
| **Frames** | **83.1%** | > Claude-4-Sonnet (80.7%) |
| **WebWalkerQA** | **72.7%** | > Tongyi DeepSearch 30B (72.2%) |

All with only **4B parameters**, 8–32× smaller than comparable models.

## Model Details

- **Architecture**: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- **Parameters**: 4B
- **Max Context**: 262,144 tokens
- **Training**: Two-stage difficulty-aware curriculum RL with a virtual world environment
- **Agent Mode**: ReAct-style with `search` and `visit` tools

## How It Works

LiteResearcher operates as a ReAct agent that iteratively:

1. **Thinks** about what information is needed
2. **Searches** the web via Google
3. **Visits** webpages to extract evidence
4. **Answers** when sufficient information is gathered

The model uses `<think>`, `<tool_call>`, and `<answer>` tags to structure its reasoning.
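
The think/act/observe loop can be sketched in a few lines of Python. This is a minimal illustration, not the code in the LiteResearcher inference framework, and it rests on assumptions: tool calls are taken to be JSON objects `{"name": ..., "arguments": {...}}` inside `<tool_call>` tags (the Qwen3 convention), tool results are fed back as a user turn wrapped in `<tool_response>` tags, and the `query` argument of `search` is hypothetical. `generate` is any function mapping a message list to the model's next reply; below, a scripted stub stands in for the model and a fake `search` function stands in for Google.

```python
import json
import re
from typing import Callable, Optional

# One non-greedy pattern per structural tag; DOTALL lets content span lines.
TAG_RE = {tag: re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
          for tag in ("think", "tool_call", "answer")}


def parse_step(text: str) -> dict:
    """Split one model turn into its think / tool_call / answer parts (None if absent)."""
    # Qwen3 thinking templates may place the opening <think> in the prompt,
    # so a completion can start mid-thought; restore the tag if it is missing.
    if "</think>" in text and "<think>" not in text:
        text = "<think>" + text
    parts = {}
    for tag, pattern in TAG_RE.items():
        match = pattern.search(text)
        parts[tag] = match.group(1).strip() if match else None
    if parts["tool_call"] is not None:
        # Assumption: the tool call body is a JSON object with "name" and "arguments".
        parts["tool_call"] = json.loads(parts["tool_call"])
    return parts


def run_react(generate: Callable[[list], str], tools: dict, question: str,
              max_turns: int = 10) -> Optional[str]:
    """Think -> act -> observe until the model emits <answer> or the turn budget runs out."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        step = parse_step(reply)
        if step["answer"] is not None:
            return step["answer"]
        if step["tool_call"] is None:
            break  # neither a tool call nor an answer: stop instead of looping
        call = step["tool_call"]
        observation = tools[call["name"]](**call.get("arguments", {}))
        # Assumption: observations go back wrapped in <tool_response> in a user turn.
        messages.append({"role": "user",
                         "content": f"<tool_response>\n{observation}\n</tool_response>"})
    return None


# Usage with a scripted stand-in for the model and a fake (hypothetical) search tool.
scripted = iter([
    '<think>I should search.</think>\n<tool_call>{"name": "search", '
    '"arguments": {"query": "2024 Nobel Prize in Physics"}}</tool_call>',
    "<think>The result names the laureates.</think>\n<answer>John Hopfield and Geoffrey Hinton</answer>",
])
fake_tools = {"search": lambda query: f"Top result for {query!r}: Hopfield and Hinton"}
print(run_react(lambda messages: next(scripted), fake_tools,
                "Who won the Nobel Prize in Physics in 2024?"))
# prints: John Hopfield and Geoffrey Hinton
```

The `max_turns` cap and the early exit on a reply with neither tag keep a misbehaving policy from looping indefinitely.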

## Quick Start

### With the Inference Framework

```bash
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start SGLang server (--tp 2 shards the model across 2 GPUs).
# The server runs in the foreground, so run the next step in a separate terminal.
python -m sglang.launch_server \
    --model-path simplex-ai-inc/LiteResearcher-4B \
    --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
    --model simplex-ai-inc/LiteResearcher-4B \
    --dataset data/example.jsonl
```
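
Once the SGLang server is up, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so a single turn can also be requested from Python using only the standard library. A minimal sketch, assuming the server from the commands above is listening on `localhost:6001`; the system prompt is a placeholder (not the prompt `scripts/run_all.sh` uses), and the sampling values match the Transformers example in this card. It produces one model turn only and does not execute `search` or `visit` calls.

```python
import json
import urllib.request

# Assumption: the SGLang server started above is listening on localhost:6001.
BASE_URL = "http://localhost:6001/v1/chat/completions"
MODEL = "simplex-ai-inc/LiteResearcher-4B"


def build_request(question, system_prompt="You are a deep research assistant.",
                  temperature=0.6, top_p=0.95, max_tokens=4096):
    """Build (but do not send) an OpenAI-compatible chat completion request.

    The system prompt is a placeholder, not the framework's actual prompt.
    """
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question):
    """Send one chat turn and return the raw completion text.

    Any <tool_call> in the reply is returned as text, not executed.
    """
    with urllib.request.urlopen(build_request(question), timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("Who won the Nobel Prize in Physics in 2024?")` returns the raw reply, which may contain `<think>` and `<tool_call>` segments; building the request separately from sending it keeps the payload easy to inspect.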

### Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplex-ai-inc/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect.
# This is a single generation step: any <tool_call> the model emits is printed,
# not executed (the inference framework handles tool execution).
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

## Training

LiteResearcher is trained with a three-component framework:

1. **Co-constructed Training Data & Corpus**: 32M+ webpages from 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
2. **Stable Local Tool Environment**: a local search engine (BGE-M3 + Milvus) and a local browse tool (PostgreSQL), enabling 73.2M tool calls during training at zero marginal cost
3. **Difficulty-Aware Curriculum RL**: multi-stage training that progressively increases task difficulty and context length
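
To make the curriculum idea concrete, here is a hypothetical sketch of one way to stage RL tasks by difficulty. It is not the authors' implementation. It assumes difficulty is estimated as `1 - pass_rate` from earlier policy rollouts, and the stage names, difficulty bands, and context budgets are illustrative values only. It builds the stage schedule; the RL updates themselves are out of scope.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    question: str
    pass_rate: float  # hypothetical difficulty signal: fraction of earlier rollouts that solved it


@dataclass
class Stage:
    name: str
    min_difficulty: float    # inclusive lower bound on difficulty = 1 - pass_rate
    max_difficulty: float    # exclusive upper bound
    max_context_tokens: int  # illustrative per-stage context budget


def build_curriculum(tasks, stages, seed=0):
    """Bucket tasks into stages of increasing difficulty (and context budget).

    Tasks with pass_rate 0.0 or 1.0 are dropped: every rollout gets the same
    reward, so they carry no relative learning signal.
    """
    rng = random.Random(seed)
    schedule = []
    for stage in stages:
        pool = [
            t for t in tasks
            if 0.0 < t.pass_rate < 1.0
            and stage.min_difficulty <= 1.0 - t.pass_rate < stage.max_difficulty
        ]
        rng.shuffle(pool)  # shuffle within a stage; stage order stays fixed
        schedule.append((stage, pool))
    return schedule


tasks = [Task("q1", 0.9), Task("q2", 0.6), Task("q3", 0.2), Task("q4", 0.0), Task("q5", 1.0)]
stages = [
    Stage("stage-1", 0.0, 0.5, 32_768),
    Stage("stage-2", 0.5, 1.0, 65_536),
]
for stage, pool in build_curriculum(tasks, stages):
    print(stage.name, stage.max_context_tokens, sorted(t.question for t in pool))
# prints:
# stage-1 32768 ['q1', 'q2']
# stage-2 65536 ['q3']
```

Dropping tasks with a pass rate of exactly 0 or 1 is a common choice in group-relative RL methods such as GRPO, where every rollout on such a task receives the same reward and therefore zero advantage.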

## Benchmark Results

LiteResearcher-4B achieves the best open-source results on GAIA, WebWalkerQA, MAIA, and Xbench-DS, outperforming open-source models up to 8× larger on these benchmarks; on Xbench-DS it also exceeds every listed commercial model. Results across all eight benchmarks:

| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | HLE | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| **Commercial Models** | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| **Open-Source Models** | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | **43.4** | **46.7** | **32.9** | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| **LiteResearcher** | **4B** | **71.3** | 27.5\* | 32.5\* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |

Best open-source results in **bold**. Results marked with \* use a 64k context window with a memory mechanism. HLE = Humanity's Last Exam.

## Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).