license: apache-2.0

S1-Base-1.5-8B-128K

Chinese version | English

Model Introduction

This repository contains S1-Base-1.5-8B-128K, a general-purpose scientific large language model obtained by post-training (SFT + GRPO) the scientific foundation model S1-Base. It retains the base model's scientific reasoning capabilities while significantly improving long-context understanding and reasoning, as well as complex instruction following in scientific research scenarios. The model supports a context length of 128K.

Model Weights

The S1-Base-1.5-8B-128K model is open-sourced under the Apache 2.0 license. You can download the model weights from Hugging Face or ModelScope.

Model Name	Hugging Face Link	ModelScope Link
S1-Base-8B	Download	Download

Model Evaluation

To validate the capabilities of S1-Base-1.5-8B-128K, we ran systematic evaluations across three core competencies: long-context ability, instruction following, and scientific reasoning. The results are shown in the table below.

Benchmark	S1-Base-1.5-8B-128K	S1-Base-8B	Qwen3-8B	Intern-S1-mini	GLM-Z1-9B-0414
CLongEval	36.18	27.51	33.62	32.82	25.71
InfiniteBench	35.57	27.62	34.41	30.42	29.58
IFEval	87.06	70.42	85.00	83.00	78.93
GPQA	70.33	63.01	60.86	65.97	55.81
ChemBench	61.59	62.74	57.79	57.54	55.85
LLM-MSE	83.63	88.50	83.51	78.65	80.97
LAB bench	37.54	37.63	26.52	29.11	29.89
AIME2024	77.92	75.42	74.60	85.00	79.37
LiveMathBench	86.72	82.81	77.00	86.72	82.82

Key Highlights:

📜 Enhanced Long Context Reasoning: The model outperforms both its base model and the similar-sized models in the table on the public long-context benchmarks CLongEval and InfiniteBench, and improves significantly on custom long-text evaluations built from real-world papers and web pages.
🎯 Improved Complex Instruction Following: Training uses a scientific-literature instruction-following task system that covers four major categories (document understanding, structured generation, information extraction, and chart comprehension) combined with multi-dimensional constraints on length, format, and content. The model scores highest among the compared models on IFEval.
🔬 Stable Scientific Reasoning Capability: The model scores highest among the compared models on GPQA, a comprehensive scientific benchmark covering biology, physics, and chemistry. On the other scientific benchmarks, performance remains broadly stable after context expansion (ChemBench 61.59 vs. 62.74 and LAB bench 37.54 vs. 37.63 for S1-Base-8B), with LLM-MSE showing the largest change (83.63 vs. 88.50).
👍 User Feedback Data Flywheel: Like and dislike feedback from users of the ScienceOne platform is fed back into training to continuously improve model performance and user experience in real-world scenarios.
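To make the comparison with the base model easier to read, the per-benchmark differences can be computed directly from the evaluation table. The snippet below is only an illustrative sketch: the scores are copied from the S1-Base-1.5-8B-128K and S1-Base-8B columns above, and the names `SCORES` and `deltas` are hypothetical helpers, not part of any released tooling.

```python
# Scores copied from the evaluation table: (S1-Base-1.5-8B-128K, S1-Base-8B).
SCORES = {
    "CLongEval": (36.18, 27.51),
    "InfiniteBench": (35.57, 27.62),
    "IFEval": (87.06, 70.42),
    "GPQA": (70.33, 63.01),
    "ChemBench": (61.59, 62.74),
    "LLM-MSE": (83.63, 88.50),
    "LAB bench": (37.54, 37.63),
    "AIME2024": (77.92, 75.42),
    "LiveMathBench": (86.72, 82.81),
}


def deltas(scores):
    """Return {benchmark: new_score - base_score}, rounded to 2 decimals."""
    return {name: round(new - base, 2) for name, (new, base) in scores.items()}


if __name__ == "__main__":
    # Print benchmarks from largest gain to largest drop.
    for name, delta in sorted(deltas(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{name:<14} {delta:+.2f}")
```

Sorted this way, the long-context and instruction-following benchmarks appear at the top of the list, while ChemBench, LLM-MSE, and LAB bench show small to moderate decreases.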

Deployment

We recommend using vLLM to deploy S1-Base for efficient inference and OpenAI-compatible API services.

Quick start command example:

pip install vllm  
vllm serve <your_s1_model_path> --served-model-name s1-base-1.5-8b-128k

The API request and response formats are largely compatible with the OpenAI API. Please refer to the official vLLM documentation for details.
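Before sending chat requests, it can be useful to confirm that the server is up and that the served model name matches the one passed to `--served-model-name`. The following is a minimal sketch using only the Python standard library; it assumes the server listens on `http://localhost:8000` as in the examples in this README and exposes the OpenAI-style `/v1/models` endpoint. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: same address as the examples in this README


def served_model_ids(models_response):
    """Extract model ids from an OpenAI-style `GET /v1/models` response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_served_model_ids(base_url=BASE_URL, timeout=5.0):
    """Query the running server and return the model names it serves."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return served_model_ids(json.load(resp))


# Usage (requires a running server):
#   ids = fetch_served_model_ids()
#   if "s1-base-1.5-8b-128k" not in ids:
#       raise SystemExit(f"model not served, found: {ids}")
```

If the expected name is missing from the returned list, chat requests that use that name in the `model` field will be rejected by the server.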

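Generate responses using the OpenAI Python SDK:

from openai import OpenAI

# vLLM ignores the API key unless the server was started with --api-key;
# a non-empty placeholder avoids an invalid "Bearer " header in some SDK versions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="s1-base-1.5-8b-128k",  # must match --served-model-name
    messages=[{"role": "user", "content": "hi"}],
)
print(resp.choices[0].message.content)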

Generate responses using curl:

curl -X POST http://localhost:8000/v1/chat/completions -d '{"model": "s1-base-1.5-8b-128k", "messages":[{"role":"user", "content": "hi"}], "skip_special_tokens": false}' -H "Content-Type: application/json"
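Since the model targets long-context understanding, a typical request places an entire paper or web page in the prompt and asks a question about it. The sketch below shows one way this might be done with the Python standard library. The prompt template, the function names, the `max_tokens` value, and the generous timeout are all illustrative assumptions rather than recommended settings, and how much of the 128K context is actually usable depends on how the server was launched.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: same address as the examples above
MODEL = "s1-base-1.5-8b-128k"          # must match --served-model-name


def build_long_doc_payload(document, question, max_tokens=512):
    """Build a chat-completions payload that asks `question` about `document`."""
    # Hypothetical prompt template: the document first, the question last.
    prompt = (
        "Read the following document and answer the question.\n\n"
        f"<document>\n{document}\n</document>\n\n"
        f"Question: {question}"
    )
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask(document, question, base_url=BASE_URL, timeout=600.0):
    """POST the payload to the server and return the generated answer text."""
    payload = build_long_doc_payload(document, question)
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server and a local text file):
#   with open("paper.txt", encoding="utf-8") as f:
#       print(ask(f.read(), "What is the main contribution of this paper?"))
```

Placing the question after the document keeps the instruction close to the point where generation starts; this ordering is a common convention rather than something this README prescribes.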