| [中文](./README.md) | English |
|
|
| # S1-DeepResearch Inference Framework |
|
|
| ## Key Features |
|
|
| - **Multiple LLM clients**: Supports vLLM, Azure OpenAI, AIHubMix, and other LLM services |
| - **Rich toolset**: Nine tools covering search, web browsing, file parsing, code execution, multimodal Q&A, bash, and more |
| - **Batch inference**: Concurrent batch inference with resume-from-checkpoint and periodic result saving |
| - **Single-query inference**: Detailed debugging and testing for individual queries |
| - **Load balancing**: Multi-node LLM load balancing and consistent scheduling |
| - **Detailed logging**: Per-query log files for easier troubleshooting and analysis |
|
|
| ## Project Layout (current) |
|
|
| ```text |
| ./ |
| ├── run_batch_inference_demo.sh          # Local / vLLM script template
| ├── run_batch_inference_online_demo.sh   # Online platform script template
| ├── inference/
| │   ├── run_batch_inference.py
| │   └── run_single_inference.py
| ├── server/
| ├── tool_kits/
| ├── utils/
| │   └── config/
| │       ├── config.example.json
| │       └── README.md
| ├── models/tokenizer/
| └── test_all_tools.py
| ``` |
|
|
| ## Quick Start |
|
|
| ### 1. Install dependencies |
|
|
| ```bash |
| pip install -r requirements.txt |
| ``` |
|
|
| ### 2. Configuration (JSON or environment variables recommended) |
|
|
| Precedence: **custom JSON > environment variables > defaults in `utils/config.py`**. |
|
|
| Typical workflow: |
|
|
| ```bash |
| cp utils/config/config.example.json utils/config/config.local.json |
| ``` |
|
|
| Edit `config.local.json` as needed, for example: |
|
|
| - `TOOLS_SERVER_BASE_ENDPOINT_URL` |
| - `AIHUBMIX_KEY` / `AZURE_KEY` / `VOLCANO_KEY` / `ALIYUN_KEY` |
| - `CLIENT_TIMEOUT` |
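|
| For example, a minimal `config.local.json` might look like the sketch below. The keys come from the list above; the endpoint, API key, and timeout values (and their exact types) are placeholders, not shipped defaults.
|
| ```bash
| # Illustrative only: replace the placeholder values with your own endpoint and keys.
| cat > utils/config/config.local.json <<'EOF'
| {
|   "TOOLS_SERVER_BASE_ENDPOINT_URL": "http://localhost:8001",
|   "AIHUBMIX_KEY": "sk-your-key-here",
|   "CLIENT_TIMEOUT": 600
| }
| EOF
| ```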
|
|
| You can also override via environment variables, for example: |
|
|
| ```bash |
| export S1_DR_CONFIG_JSON="utils/config/config.local.json" |
| ``` |
|
|
| ### 3. Prepare input JSONL |
|
|
| Each line is one JSON object. At minimum include `question`; usually also `id` and `file_path`. |
|
|
| #### 3.1 JSONL example (file inputs) |
|
|
| ```json |
| {"id":"query_001","question":"When Alibaba was founded, what was the average age of the founders whose surnames are Ma, Cai, or Zhang among the 18 co-founders? Round to one decimal place.","file_path":[]} |
| {"id":"query_002","question":"According to the manual, for DJI's heaviest AIR-series drone by takeoff weight, how many mAh of battery energy remain after flying half a marathon? (Note 1: assume calm air; minimum energy use is flying at 60% of max speed. Note 2: power draw can be converted from max flight time.)","file_path":["/path/to/file.pdf"]} |
| ``` |
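|
| Before a large run, it can help to confirm that every line parses as JSON. A quick check (assumes `python3` is on the PATH; `input.jsonl` is a placeholder path):
|
| ```bash
| # Reports the first line that is not valid JSON; blank lines are skipped.
| python3 -c "
| import json, sys
| for i, line in enumerate(open('input.jsonl'), 1):
|     if not line.strip():
|         continue
|     try:
|         json.loads(line)
|     except ValueError as e:
|         sys.exit(f'line {i}: {e}')
| print('all lines OK')
| "
| ```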
|
|
| #### 3.2 JSONL example (using Skills) |
|
|
| ```json |
| {"id":"query_003","question":"Use pymatgen to build a simple TiO2 surface slab. Please generate a common low-index surface, report the Miller index, slab thickness, and vacuum size, and briefly describe the resulting surface structure.","skills":[{"name": "skill_name1", "description": "description1", "skill_path": "skill_path1"}, {"name": "skill_name2", "description": "description2", "skill_path": "skill_path2"}]} |
| ``` |
|
|
| ## Recommended workflow: copy a script, then run |
|
|
| ### A. Local / vLLM (`run_batch_inference_demo.sh`) |
| |
| ```bash |
| cp run_batch_inference_demo.sh run_batch_local.sh |
| mkdir -p run_logs |
| # Edit parameters inside run_batch_local.sh |
| bash run_batch_local.sh |
| ``` |
| |
| Notes: |
| |
| - The script starts Python with `nohup ... &` and prints the background PID. |
| - Tail logs: `tail -f run_logs/run.log` |
|
|
| ### B. Online platform (`run_batch_inference_online_demo.sh`) |
|
|
| ```bash |
| cp run_batch_inference_online_demo.sh run_batch_online.sh |
| mkdir -p run_logs |
| # Edit parameters inside run_batch_online.sh |
| bash run_batch_online.sh |
| ``` |
|
|
| Notes: |
|
|
| - Focus on: `LLM_CLIENT_URLS`, `LLM_CLIENT_MODELS`, `SYSTEM_FORMAT` |
| - Tail logs: `tail -f run_logs/run_batch_*.log` |
|
|
| ## Script parameters |
|
|
| ### Basic |
|
|
| - `LLM_CLIENT_URLS`: Model service URLs, space-separated (paired with the model list) |
| - `LLM_CLIENT_MODELS`: Model names, space-separated |
| - `TEST_DATA_FILE`: Input JSONL path |
| - `OUTPUT_FILE`: Output file when `ROLLOUT_NUM=1` |
| - `OUTPUT_DIR`: Output directory when `ROLLOUT_NUM>1` (e.g. `rollout_01.jsonl`, β¦) |
| - `ROLLOUT_NUM`: Number of rollouts per sample |
| - `RESUME_FROM_FILE`: Resume checkpoint file (may be empty) |
| - `AVAILABLE_TOOLS`: Enabled tools, space-separated |
| - `TASK_TYPE`: Whether to treat input as text-only; default `input_only` |
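|
| A sketch of how these might look inside a copied script such as `run_batch_local.sh` (every value below is a placeholder, not a shipped default):
|
| ```bash
| # Placeholder values for illustration; adjust to your own deployment and data.
| LLM_CLIENT_URLS="http://10.0.0.1:8000/v1 http://10.0.0.2:8000/v1"   # space-separated, paired with the models
| LLM_CLIENT_MODELS="my-model my-model"
| TEST_DATA_FILE="data/input.jsonl"
| OUTPUT_FILE="outputs/result.jsonl"    # used when ROLLOUT_NUM=1
| OUTPUT_DIR="outputs/rollouts"         # used when ROLLOUT_NUM>1
| ROLLOUT_NUM=1
| RESUME_FROM_FILE=""                   # empty = start fresh
| AVAILABLE_TOOLS="wide_search wide_visit execute_code"
| TASK_TYPE="input_only"
| ```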
|
|
| ### Inference control |
|
|
| - `MAX_ROUNDS`: Max rounds per query |
| - `CONCURRENCY_WORKERS`: Number of concurrent workers |
| - `SAVE_BATCH_SIZE`: Flush results to disk every N samples |
| - `TEMPERATURE`: Sampling temperature |
| - `TOP_P`: Top-p (included in `run_batch_inference_demo.sh`) |
| - `EXTRA_PAYLOAD`: Extra model payload (JSON string; included in `run_batch_inference_demo.sh`) |
| - `TIMEOUT_FOR_ONE_QUERY`: Per-query timeout (seconds) |
| - `LLM_API_RETRY_TIMES`: Retries after LLM failure (not counting the first attempt) |
| - `SYSTEM_PROMPT`: Custom system prompt; empty uses the built-in default |
| - `SYSTEM_FORMAT`: Platform format (mainly in `run_batch_inference_online_demo.sh`) |
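|
| For example (illustrative values only; `EXTRA_PAYLOAD` is a JSON string, so quote it carefully, and the keys it accepts depend on your model endpoint):
|
| ```bash
| # Illustrative values; tune for your model and workload.
| MAX_ROUNDS=30
| CONCURRENCY_WORKERS=8
| SAVE_BATCH_SIZE=10
| TEMPERATURE=0.7
| TOP_P=0.95
| EXTRA_PAYLOAD='{"max_tokens": 8192}'   # JSON string; keys are endpoint-specific
| TIMEOUT_FOR_ONE_QUERY=3600
| LLM_API_RETRY_TIMES=3
| ```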
|
|
| ### Context truncation |
|
|
| - `DISCARD_ALL_MODE`: Enable discard-all (`true`/`false`) |
| - `MODEL_MAX_CONTEXT_TOKENS`: Model max context length |
| - `DISCARD_RATIO`: Threshold ratio to trigger discard |
| - `TOKENIZER_PATH`: Path to tokenizer used for token counting |
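|
| A sketch with placeholder numbers (set `MODEL_MAX_CONTEXT_TOKENS` to your model's real limit; the tokenizer path is only an example location):
|
| ```bash
| # Placeholder numbers; discard is triggered once usage crosses DISCARD_RATIO of the context limit.
| DISCARD_ALL_MODE=true
| MODEL_MAX_CONTEXT_TOKENS=128000
| DISCARD_RATIO=0.9
| TOKENIZER_PATH="models/tokenizer/"
| ```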
|
|
| ### Logging |
|
|
| - `LOG_LABEL`: Log label; the per-run log directory becomes `logs/YYYY_MM_DD_<LOG_LABEL>/`
| - `LOG_FILE`: Script log file under `run_logs/*.log` |
| - `LOGGING_ROOT`: Log root (set in `run_batch_inference_demo.sh`; may be empty) |
|
|
| ## `SYSTEM_FORMAT` values |
| |
| `SYSTEM_FORMAT` selects platform-specific handling via keyword branches. |
|
|
| - `deep_research`: Local deep-research format (vLLM deployment) |
| - `azure`: Azure OpenAI |
| - `aihubmix`: AIHubMix (OpenAI-compatible) |
| - `aihubmix_claude`: AIHubMix Claude format |
| - `aihubmix_glm`: AIHubMix GLM format |
| - `volcano`: Volcano Engine |
| - `aliyun`: Alibaba Cloud Bailian format |
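|
| For instance, a run against Azure OpenAI would combine it with the matching URL and model settings (the endpoint and deployment name below are placeholders):
|
| ```bash
| # Placeholder endpoint/deployment; SYSTEM_FORMAT selects the Azure-specific handling.
| SYSTEM_FORMAT="azure"
| LLM_CLIENT_URLS="https://your-resource.openai.azure.com"
| LLM_CLIENT_MODELS="your-deployment-name"
| ```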
|
|
| ## Currently available tools (9) |
|
|
| - `wide_search`: General web search via Serp; multiple queries in one round |
| - `scholar_search`: Google Scholar academic search (+ web results) |
| - `image_search`: Image search; multiple queries supported |
| - `wide_visit`: Visit pages and summarize toward a `goal` |
| - `file_wide_parse`: Parse local/remote files (PDF, DOCX, MD, CSV, etc.) |
| - `execute_code`: Run Python code |
| - `ask_question_about_image`: Image understanding and Q&A |
| - `ask_question_about_video`: Video understanding and Q&A |
| - `bash`: Run shell commands |
|
|
| Tool schemas are defined in `DEEPRESEARCH_SYSTEM_PROMPT` in `utils/prompts.py`. |
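|
| To run with only a subset, list the tool names in `AVAILABLE_TOOLS` (space-separated, as described under "Script parameters"); for example:
|
| ```bash
| # Example subset; the names come from the list above.
| AVAILABLE_TOOLS="wide_search wide_visit file_wide_parse execute_code"
| ```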
|
|
| ## Outputs and logs |
|
|
| ### Output JSONL fields |
|
|
| Each line written by `run_batch_inference.py` contains: |
|
|
| - `time_stamp`: Write time for that row (`YYYY-MM-DD HH:MM:SS`). |
| - `query_id`: Batch-level query id (hash of `question`). |
| - `query`: This rowβs `question` text. |
| - `result`: Detailed result object for one segment (from `run_single_inference.py`). |
| - `status`: `success` / `timeout` / `error`. |
| - `discard_segments`: Segments truncated by discard-all and summarized (excluding the final segment). |
| - `elapsed_sec`: Total seconds for this rollout of the query. |
| - `rollout_idx`: Rollout index (1-based). |
| - `src`: Full original input line (often includes `id`, `question`, `file_path`, skills, etc.). |
| - `segment_idx`: Current segment index (1-based). |
| - `segment_total`: Total segments for this query; `0` if there is no valid `result`. |
|
|
| Common fields inside `result` (`run_single_inference.py`): |
|
|
| - `query_id`: Single-run instance id (includes a time suffix). |
| - `tools`: Enabled tool schemas (string form). |
| - `messages`: Messages for model reasoning and tool interaction. |
| - `final_answer`: Answer text for this segment. |
| - `transcript`: Fuller trajectory (including tool returns). |
| - `rounds`: Rounds executed in this segment. |
| - `stopped_reason`: Why it stopped (e.g. `no_tool_calls`, `discard_all_01`, `discard_all_final`, `max_rounds_exceeded`). |
| - `error`: Present only on failure. |
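|
| A quick way to inspect a finished run, assuming `jq` is installed and `outputs/result.jsonl` is your output file (both are assumptions, not part of the framework):
|
| ```bash
| # Count rows per status (success / timeout / error).
| jq -r '.status' outputs/result.jsonl | sort | uniq -c
|
| # Print the final answer of each successful row (final_answer lives inside .result).
| jq -r 'select(.status == "success") | .result.final_answer' outputs/result.jsonl
| ```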
|
|
| ### Log directories |
|
|
| Default layout when `LOGGING_ROOT` is empty: |
|
|
| ```text |
| logs/ |
| └── YYYY_MM_DD_<LOG_LABEL>/
|     ├── collect.log
|     └── <query_id>/
|         ├── run.log
|         └── result.json
| ``` |
|
|
| ## Tool tests |
|
|
| Run the tool test script: |
|
|
| ```bash |
| python test_all_tools.py |
| ``` |
|
|
| This exercises all registered tools and checks that basic behavior works. |
|
|