[中文](./README.md) | English
# S1-DeepResearch Inference Framework
## Key Features
- **Multiple LLM clients**: Supports vLLM, Azure OpenAI, AIHubMix, and other LLM services
- **Rich toolset**: Nine tools covering search, web browsing, file parsing, code execution, multimodal Q&A, bash, and more
- **Batch inference**: Concurrent batch inference with resume-from-checkpoint and periodic result saving
- **Single-query inference**: Detailed debugging and testing for individual queries
- **Load balancing**: Multi-node LLM load balancing and consistent scheduling
- **Detailed logging**: Per-query log files for easier troubleshooting and analysis
## Project Layout (current)
```text
./
├── run_batch_inference_demo.sh          # Local / vLLM script template
├── run_batch_inference_online_demo.sh   # Online platform script template
├── inference/
│   ├── run_batch_inference.py
│   └── run_single_inference.py
├── server/
├── tool_kits/
├── utils/
│   └── config/
│       ├── config.example.json
│       └── README.md
├── models/tokenizer/
└── test_all_tools.py
```
## Quick Start
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
### 2. Configuration (JSON or environment variables recommended)
Precedence: **custom JSON > environment variables > defaults in `utils/config.py`**.
Typical workflow:
```bash
cp utils/config/config.example.json utils/config/config.local.json
```
Edit `config.local.json` as needed, for example:
- `TOOLS_SERVER_BASE_ENDPOINT_URL`
- `AIHUBMIX_KEY` / `AZURE_KEY` / `VOLCANO_KEY` / `ALIYUN_KEY`
- `CLIENT_TIMEOUT`
Environment variables work too; for example, `S1_DR_CONFIG_JSON` points the framework at your custom JSON file:
```bash
export S1_DR_CONFIG_JSON="utils/config/config.local.json"
```
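The precedence rule above can be sketched as a three-layer merge. This is a minimal illustration, not the code in `utils/config.py`: the `resolve_config` helper and the default values are hypothetical, and it assumes `S1_DR_CONFIG_JSON` names the JSON file to load.

```python
import json
import os
from typing import Any, Dict, Optional

# Hypothetical defaults standing in for the real ones in utils/config.py.
DEFAULTS: Dict[str, Any] = {
    "TOOLS_SERVER_BASE_ENDPOINT_URL": "http://localhost:8000",  # placeholder value
    "CLIENT_TIMEOUT": 600,  # placeholder value (seconds)
}


def resolve_config(json_path: Optional[str] = None,
                   env: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Merge three layers: custom JSON > environment variables > defaults."""
    env = dict(os.environ) if env is None else env
    config = dict(DEFAULTS)
    # Environment variables override defaults (cast to the default's type).
    for key, default in DEFAULTS.items():
        if key in env:
            config[key] = type(default)(env[key])
    # The custom JSON file wins over everything else.
    json_path = json_path or env.get("S1_DR_CONFIG_JSON")
    if json_path:
        with open(json_path, encoding="utf-8") as f:
            config.update(json.load(f))
    return config
```

Passing `env` explicitly keeps the helper testable; by default it reads `os.environ`.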
### 3. Prepare input JSONL
Each line is one JSON object. At minimum include `question`; usually also `id` and `file_path`.
#### 3.1 JSONL example (file inputs)
```json
{"id":"query_001","question":"When Alibaba was founded, what was the average age of the founders whose surnames are Ma, Cai, or Zhang among the 18 co-founders? Round to one decimal place.","file_path":[]}
{"id":"query_002","question":"According to the manual, for DJI's heaviest AIR-series drone by takeoff weight, how many mAh of battery energy remain after flying half a marathon? (Note 1: assume calm air; minimum energy use is flying at 60% of max speed. Note 2: power draw can be converted from max flight time.)","file_path":["/path/to/file.pdf"]}
```
#### 3.2 JSONL example (using Skills)
```json
{"id":"query_003","question":"Use pymatgen to build a simple TiO2 surface slab. Please generate a common low-index surface, report the Miller index, slab thickness, and vacuum size, and briefly describe the resulting surface structure.","skills":[{"name": "skill_name1", "description": "description1", "skill_path": "skill_path1"}, {"name": "skill_name2", "description": "description2", "skill_path": "skill_path2"}]}
```
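A quick pre-flight check for an input file might look like the following. `load_queries` is a hypothetical helper whose checks simply mirror the examples above (`question` required, `file_path` a list, each skill carrying `name`, `description`, and `skill_path`); the framework's own validation may be stricter or looser.

```python
import json
from typing import Dict, List


def load_queries(lines: List[str]) -> List[Dict]:
    """Parse JSONL lines and check the fields shown in the examples."""
    queries = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        if not isinstance(obj.get("question"), str) or not obj["question"]:
            raise ValueError(f"line {lineno}: missing non-empty 'question'")
        # file_path is optional; when present it is a list of paths.
        if not isinstance(obj.get("file_path", []), list):
            raise ValueError(f"line {lineno}: 'file_path' must be a list")
        # Each skill entry should carry name, description, and skill_path.
        for skill in obj.get("skills", []):
            missing = {"name", "description", "skill_path"} - set(skill)
            if missing:
                raise ValueError(f"line {lineno}: skill missing {sorted(missing)}")
        queries.append(obj)
    return queries
```

Usage: `with open(TEST_DATA_FILE, encoding="utf-8") as f: queries = load_queries(f.readlines())`.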
## Recommended workflow: copy a script, then run
### A. Local / vLLM (`run_batch_inference_demo.sh`)
```bash
cp run_batch_inference_demo.sh run_batch_local.sh
mkdir -p run_logs
# Edit parameters inside run_batch_local.sh
bash run_batch_local.sh
```
Notes:
- The script starts Python with `nohup ... &` and prints the background PID.
- Tail logs: `tail -f run_logs/run.log`
### B. Online platform (`run_batch_inference_online_demo.sh`)
```bash
cp run_batch_inference_online_demo.sh run_batch_online.sh
mkdir -p run_logs
# Edit parameters inside run_batch_online.sh
bash run_batch_online.sh
```
Notes:
- Key parameters to set: `LLM_CLIENT_URLS`, `LLM_CLIENT_MODELS`, `SYSTEM_FORMAT`
- Tail logs: `tail -f run_logs/run_batch_*.log`
## Script parameters
### Basic
- `LLM_CLIENT_URLS`: Model service URLs, space-separated; each URL pairs with the model at the same position in `LLM_CLIENT_MODELS`
- `LLM_CLIENT_MODELS`: Model names, space-separated
- `TEST_DATA_FILE`: Input JSONL path
- `OUTPUT_FILE`: Output file when `ROLLOUT_NUM=1`
- `OUTPUT_DIR`: Output directory when `ROLLOUT_NUM>1` (e.g. `rollout_01.jsonl`, …)
- `ROLLOUT_NUM`: Number of rollouts per sample
- `RESUME_FROM_FILE`: Resume checkpoint file (may be empty)
- `AVAILABLE_TOOLS`: Enabled tools, space-separated
- `TASK_TYPE`: Whether to treat input as text-only; default `input_only`
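The pairing of `LLM_CLIENT_URLS` with `LLM_CLIENT_MODELS`, and the choice between `OUTPUT_FILE` and `OUTPUT_DIR`, can be sketched as follows. Both helpers are hypothetical; the sketch assumes the two lists pair up by position and must have the same length, and it follows the `rollout_01.jsonl` naming shown above.

```python
from typing import List, Tuple


def pair_clients(urls: str, models: str) -> List[Tuple[str, str]]:
    """Split space-separated URLs and models and pair them by position (assumed)."""
    url_list, model_list = urls.split(), models.split()
    if len(url_list) != len(model_list):
        raise ValueError("LLM_CLIENT_URLS and LLM_CLIENT_MODELS must have the same length")
    return list(zip(url_list, model_list))


def output_paths(rollout_num: int, output_file: str, output_dir: str) -> List[str]:
    """ROLLOUT_NUM=1 writes OUTPUT_FILE; otherwise one file per rollout in OUTPUT_DIR."""
    if rollout_num == 1:
        return [output_file]
    return [f"{output_dir}/rollout_{i:02d}.jsonl" for i in range(1, rollout_num + 1)]
```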
### Inference control
- `MAX_ROUNDS`: Max rounds per query
- `CONCURRENCY_WORKERS`: Number of concurrent workers
- `SAVE_BATCH_SIZE`: Flush results to disk every N samples
- `TEMPERATURE`: Sampling temperature
- `TOP_P`: Top-p (included in `run_batch_inference_demo.sh`)
- `EXTRA_PAYLOAD`: Extra model payload (JSON string; included in `run_batch_inference_demo.sh`)
- `TIMEOUT_FOR_ONE_QUERY`: Per-query timeout (seconds)
- `LLM_API_RETRY_TIMES`: Retries after LLM failure (not counting the first attempt)
- `SYSTEM_PROMPT`: Custom system prompt; empty uses the built-in default
- `SYSTEM_FORMAT`: Platform format (mainly in `run_batch_inference_online_demo.sh`)
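`LLM_API_RETRY_TIMES` counts retries on top of the first call, so a value of N allows at most N + 1 requests. A minimal sketch of that contract follows; `call_with_retries` and its exponential backoff are assumptions for illustration, not the framework's actual client code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], retry_times: int, backoff_sec: float = 0.0) -> T:
    """Call fn once, then retry up to retry_times more times (1 + retry_times attempts)."""
    last_exc: Exception = RuntimeError("no attempt made")
    for attempt in range(retry_times + 1):
        try:
            return fn()
        except Exception as exc:  # a real client would catch narrower errors
            last_exc = exc
            if attempt < retry_times and backoff_sec > 0:
                time.sleep(backoff_sec * (2 ** attempt))  # assumed exponential backoff
    raise last_exc
```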
### Context truncation
- `DISCARD_ALL_MODE`: Enable discard-all (`true`/`false`)
- `MODEL_MAX_CONTEXT_TOKENS`: Model max context length
- `DISCARD_RATIO`: Fraction of `MODEL_MAX_CONTEXT_TOKENS` at which discard-all is triggered
- `TOKENIZER_PATH`: Path to tokenizer used for token counting
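One plausible reading of how these four settings interact is sketched below. It assumes discard-all fires once the token count reaches `DISCARD_RATIO × MODEL_MAX_CONTEXT_TOKENS`; the exact comparison in the framework may differ, and `should_discard` / `parse_bool` are hypothetical helpers.

```python
def parse_bool(value: str) -> bool:
    """Interpret DISCARD_ALL_MODE-style strings ("true"/"false")."""
    return value.strip().lower() == "true"


def should_discard(token_count: int, max_context_tokens: int, discard_ratio: float,
                   enabled: bool = True) -> bool:
    """Return True once the context reaches the assumed discard threshold."""
    if not enabled:
        return False
    threshold = int(max_context_tokens * discard_ratio)
    return token_count >= threshold
```

In practice `token_count` would come from the tokenizer at `TOKENIZER_PATH`; if it is a Hugging Face tokenizer (an assumption), `len(AutoTokenizer.from_pretrained(path).encode(text))` gives that count.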
### Logging
- `LOG_LABEL`: Log label; log directories follow the pattern `logs/YYYY_MM_DD_<LOG_LABEL>/`
- `LOG_FILE`: Script log file under `run_logs/*.log`
- `LOGGING_ROOT`: Log root (set in `run_batch_inference_demo.sh`; may be empty)
## `SYSTEM_FORMAT` values
`SYSTEM_FORMAT` selects platform-specific handling via keyword branches.
- `deep_research`: Local deep-research format (vLLM deployment)
- `azure`: Azure OpenAI
- `aihubmix`: AIHubMix (OpenAI-compatible)
- `aihubmix_claude`: AIHubMix Claude format
- `aihubmix_glm`: AIHubMix GLM format
- `volcano`: Volcano Engine
- `aliyun`: Alibaba Cloud Bailian format
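The keyword branching can be sketched as an ordered lookup. This assumes substring matching on the `SYSTEM_FORMAT` value (the real dispatch code may differ), and `select_platform` is hypothetical. The point it illustrates is ordering: because `aihubmix` is a substring of `aihubmix_claude` and `aihubmix_glm`, the more specific keywords must be tested first.

```python
from typing import List, Tuple

# Most specific keywords first: "aihubmix" would otherwise shadow the
# Claude and GLM branches, since it is a substring of both.
_BRANCHES: List[Tuple[str, str]] = [
    ("deep_research", "local vLLM deep-research format"),
    ("aihubmix_claude", "AIHubMix Claude format"),
    ("aihubmix_glm", "AIHubMix GLM format"),
    ("aihubmix", "AIHubMix OpenAI-compatible format"),
    ("azure", "Azure OpenAI"),
    ("volcano", "Volcano Engine"),
    ("aliyun", "Alibaba Cloud Bailian"),
]


def select_platform(system_format: str) -> str:
    """Return the handler label for the first keyword found in SYSTEM_FORMAT."""
    for keyword, handler in _BRANCHES:
        if keyword in system_format:
            return handler
    raise ValueError(f"unknown SYSTEM_FORMAT: {system_format!r}")
```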
## Currently available tools (9)
- `wide_search`: General web search via Serp; multiple queries in one round
- `scholar_search`: Google Scholar academic search (+ web results)
- `image_search`: Image search; multiple queries supported
- `wide_visit`: Visit pages and summarize toward a `goal`
- `file_wide_parse`: Parse local/remote files (PDF, DOCX, MD, CSV, etc.)
- `execute_code`: Run Python code
- `ask_question_about_image`: Image understanding and Q&A
- `ask_question_about_video`: Video understanding and Q&A
- `bash`: Run shell commands
Tool schemas are defined in `DEEPRESEARCH_SYSTEM_PROMPT` in `utils/prompts.py`.
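Because `AVAILABLE_TOOLS` is a space-separated list, a typo in a tool name is easy to miss. Below is a minimal pre-flight check against the nine names listed above; `parse_available_tools` is hypothetical, and the framework's own tool registration remains the source of truth.

```python
from typing import List

# The nine tool names listed in this README.
KNOWN_TOOLS = {
    "wide_search", "scholar_search", "image_search", "wide_visit",
    "file_wide_parse", "execute_code", "ask_question_about_image",
    "ask_question_about_video", "bash",
}


def parse_available_tools(value: str) -> List[str]:
    """Split the space-separated AVAILABLE_TOOLS value and reject unknown names."""
    tools = value.split()
    unknown = [t for t in tools if t not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"unknown tools: {unknown}")
    # Drop duplicates while keeping the user's order.
    return list(dict.fromkeys(tools))
```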
## Outputs and logs
### Output JSONL fields
Each line written by `run_batch_inference.py` contains:
- `time_stamp`: Write time for that row (`YYYY-MM-DD HH:MM:SS`).
- `query_id`: Batch-level query id (hash of `question`).
- `query`: This row's `question` text.
- `result`: Detailed result object for one segment (from `run_single_inference.py`).
- `status`: `success` / `timeout` / `error`.
- `discard_segments`: Segments truncated by discard-all and summarized (excluding the final segment).
- `elapsed_sec`: Total seconds for this rollout of the query.
- `rollout_idx`: Rollout index (1-based).
- `src`: Full original input line (often includes `id`, `question`, `file_path`, `skills`, etc.).
- `segment_idx`: Current segment index (1-based).
- `segment_total`: Total segments for this query; `0` if there is no valid `result`.
Common fields inside `result` (`run_single_inference.py`):
- `query_id`: Single-run instance id (includes a time suffix).
- `tools`: Enabled tool schemas (string form).
- `messages`: Messages for model reasoning and tool interaction.
- `final_answer`: Answer text for this segment.
- `transcript`: Fuller trajectory (including tool returns).
- `rounds`: Rounds executed in this segment.
- `stopped_reason`: Why it stopped (e.g. `no_tool_calls`, `discard_all_01`, `discard_all_final`, `max_rounds_exceeded`).
- `error`: Present only on failure.
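To post-process an output file, the fields above are enough to count outcomes and pull answers. A minimal sketch, assuming each rollout may span several rows (one per segment) that share the same `status`, and treating the last segment's `final_answer` as the rollout's answer; `summarize_outputs` is hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Tuple

RunKey = Tuple[str, int]  # (query_id, rollout_idx)


def summarize_outputs(lines: Iterable[str]) -> Tuple[Counter, Dict[RunKey, str]]:
    """Count one status per rollout and collect each rollout's last-segment answer."""
    status_by_run: Dict[RunKey, str] = {}
    answers: Dict[RunKey, str] = {}
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        key = (row["query_id"], row["rollout_idx"])
        # Keep one status per rollout rather than one per segment row.
        status_by_run[key] = row["status"]
        total = row.get("segment_total", 0)
        # segment_total == 0 means no valid result; otherwise use the last segment.
        if total and row.get("segment_idx") == total:
            result = row.get("result") or {}
            answers[key] = result.get("final_answer", "")
    return Counter(status_by_run.values()), answers
```

Usage: `with open(OUTPUT_FILE, encoding="utf-8") as f: counts, answers = summarize_outputs(f)`.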
### Log directories
Default layout when `LOGGING_ROOT` is empty:
```text
logs/
└── YYYY_MM_DD_<LOG_LABEL>/
    ├── collect.log
    └── <query_id>/
        ├── run.log
        └── result.json
```
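For scripts that post-process logs, the per-query directory can be computed as below. `query_log_dir` is hypothetical, and it assumes a non-empty `LOGGING_ROOT` simply replaces `logs/` as the root, which the layout above does not confirm.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def query_log_dir(log_label: str, query_id: str,
                  logging_root: Optional[str] = None,
                  day: Optional[date] = None) -> Path:
    """Return <root>/YYYY_MM_DD_<LOG_LABEL>/<query_id>/ (root defaults to ./logs)."""
    root = Path(logging_root) if logging_root else Path("logs")
    day = day or date.today()
    return root / f"{day:%Y_%m_%d}_{log_label}" / query_id
```

The per-query `run.log` and `result.json` then sit directly inside the returned directory.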
## Tool tests
Run the tool test script:
```bash
python test_all_tools.py
```
This exercises all registered tools and checks that basic behavior works.