S1-Base-1.5-8B-128K
Model Introduction
This repository contains S1-Base-1.5-8B-128K, a general-purpose scientific large language model obtained by post-training (SFT + GRPO) the scientific foundation model S1-Base. It retains S1-Base's scientific reasoning capabilities while substantially improving long-context understanding and reasoning, as well as complex instruction following in scientific research scenarios. The model supports a context length of 128K tokens.
Model Weights
The S1-Base-1.5-8B-128K model is open-sourced under the Apache 2.0 license. The model weights can be downloaded from our Hugging Face or ModelScope pages.
Model Evaluation
To validate the capabilities of S1-Base-1.5-8B-128K, we ran systematic evaluations across three core competencies: long-context ability, instruction following, and scientific reasoning. The results are shown in the table below.
| Benchmark | S1-Base-1.5-8B-128K | S1-Base-8B | Qwen3-8B | Intern-S1-mini | GLM-Z1-9B-0414 |
|---|---|---|---|---|---|
| CLongEval | 36.18 | 27.51 | 33.62 | 32.82 | 25.71 |
| InfiniteBench | 35.57 | 27.62 | 34.41 | 30.42 | 29.58 |
| IFEval | 87.06 | 70.42 | 85.00 | 83.00 | 78.93 |
| GPQA | 70.33 | 63.01 | 60.86 | 65.97 | 55.81 |
| ChemBench | 61.59 | 62.74 | 57.79 | 57.54 | 55.85 |
| LLM-MSE | 83.63 | 88.50 | 83.51 | 78.65 | 80.97 |
| LAB bench | 37.54 | 37.63 | 26.52 | 29.11 | 29.89 |
| AIME2024 | 77.92 | 75.42 | 74.60 | 85.00 | 79.37 |
| LiveMathBench | 86.72 | 82.81 | 77.00 | 86.72 | 82.82 |
Key Highlights:
- 📜 Enhanced Long Context Reasoning: The model outperforms its base model and similar-sized models on the public long-context benchmarks CLongEval and InfiniteBench, and improves significantly on custom long-text evaluations built from real-world papers and web pages.
- 🎯 Improved Complex Instruction Following: Training uses a scientific-literature instruction-following task system covering four major categories (document understanding, structured generation, information extraction, and chart comprehension), combined with multi-dimensional constraints on length, format, and content. The model achieves the highest IFEval score among the compared models.
- 🔬 Stable Scientific Reasoning Capability: The model shows a clear advantage on GPQA, a comprehensive scientific benchmark covering biology, physics, and chemistry. Performance on the other scientific benchmarks remains largely stable after the context extension.
- 👍 User Feedback Data Flywheel: Model performance and user experience in real-world scenarios are continuously improved by incorporating like and dislike feedback from users of the ScienceOne platform.
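The comparison with the S1-Base-8B baseline can be read directly off the evaluation table. The snippet below is a minimal sketch that copies the two relevant columns from that table and prints the per-benchmark difference in points; the variable names are illustrative and not part of any released tooling.

```python
# Scores copied from the evaluation table: (S1-Base-1.5-8B-128K, S1-Base-8B).
scores = {
    "CLongEval": (36.18, 27.51),
    "InfiniteBench": (35.57, 27.62),
    "IFEval": (87.06, 70.42),
    "GPQA": (70.33, 63.01),
    "ChemBench": (61.59, 62.74),
    "LLM-MSE": (83.63, 88.50),
    "LAB bench": (37.54, 37.63),
    "AIME2024": (77.92, 75.42),
    "LiveMathBench": (86.72, 82.81),
}

# Difference in points: positive means the 128K model scores higher.
deltas = {name: round(new - old, 2) for name, (new, old) in scores.items()}

# Print benchmarks from largest gain to largest drop.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {delta:+.2f}")
```

On these numbers the largest gains are on IFEval and the two long-context benchmarks, while LLM-MSE shows the largest drop (4.87 points).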
Deployment
We recommend deploying S1-Base-1.5-8B-128K with vLLM, which provides efficient inference and an OpenAI-compatible API service.
Quick start command example:
pip install vllm
vllm serve <your_s1_model_path> --served-model-name s1-base-1.5-8b-128k
The API request and response formats are largely consistent with the OpenAI API. Please refer to the official vLLM documentation for details.
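Generate responses using the OpenAI Python SDK:
from openai import OpenAI

# vLLM does not check the API key unless the server is started with --api-key,
# but an empty string can produce an invalid Authorization header, so pass a placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="s1-base-1.5-8b-128k",  # must match --served-model-name
    messages=[{"role": "user", "content": "hi"}],
)
print(resp.choices[0].message.content)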
Generate responses using curl:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "s1-base-1.5-8b-128k", "messages": [{"role": "user", "content": "hi"}], "skip_special_tokens": false}'
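Since the model targets long-context use, a typical request places an entire document in the prompt. The following is a minimal sketch using only the Python standard library. It assumes the server from the quick start is listening on `localhost:8000` under the served model name `s1-base-1.5-8b-128k`; the helper names `build_request` and `ask`, the prompt layout, and the `max_tokens` default are illustrative choices, not a format prescribed by the model.

```python
import json
import urllib.request

# Assumptions: the vLLM server from the quick start is running locally and
# was started with --served-model-name s1-base-1.5-8b-128k.
BASE_URL = "http://localhost:8000/v1"
MODEL = "s1-base-1.5-8b-128k"


def build_request(document: str, question: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat payload with the document placed before the question."""
    # Hypothetical prompt layout; adjust to whatever works best for your task.
    prompt = f"Document:\n{document}\n\nQuestion: {question}"
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask(document: str, question: str) -> str:
    """POST the payload to /chat/completions and return the reply text."""
    body = json.dumps(build_request(document, question)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the server running, `ask(long_text, "Summarize the main findings.")` returns the model's answer as a string, where `long_text` holds the contents of a paper or web page. The document and the generated answer together must fit within the context length the server was started with.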