license: apache-2.0

S1-Base-1.5-8B-128K


Model Introduction

This repository contains S1-Base-1.5-8B-128K, a general-purpose scientific large language model. It was obtained by post-training (SFT + GRPO) the scientific foundation model S1-Base. The model retains S1-Base's scientific reasoning capabilities and substantially improves long-context understanding and reasoning. It also improves complex instruction following in scientific research scenarios. The model supports a context length of 128K tokens.

Model Weights

The S1-Base-1.5-8B-128K model is open-sourced under the Apache 2.0 license. You can download the model weights from Hugging Face or ModelScope.

| Model Name | Hugging Face Link | ModelScope Link |
| --- | --- | --- |
| S1-Base-8B | Download | Download |

Model Evaluation

To comprehensively validate the capabilities of S1-Base-1.5-8B-128K, we systematically evaluated three core competencies: long-context understanding, instruction following, and scientific reasoning. The results are shown in the table below.

| Benchmark | S1-Base-1.5-8B-128K | S1-Base-8B | Qwen3-8B | Intern-S1-mini | GLM-Z1-9B-0414 |
| --- | --- | --- | --- | --- | --- |
| CLongEval | 36.18 | 27.51 | 33.62 | 32.82 | 25.71 |
| InfiniteBench | 35.57 | 27.62 | 34.41 | 30.42 | 29.58 |
| IFEval | 87.06 | 70.42 | 85.00 | 83.00 | 78.93 |
| GPQA | 70.33 | 63.01 | 60.86 | 65.97 | 55.81 |
| ChemBench | 61.59 | 62.74 | 57.79 | 57.54 | 55.85 |
| LLM-MSE | 83.63 | 88.50 | 83.51 | 78.65 | 80.97 |
| LAB bench | 37.54 | 37.63 | 26.52 | 29.11 | 29.89 |
| AIME2024 | 77.92 | 75.42 | 74.60 | 85.00 | 79.37 |
| LiveMathBench | 86.72 | 82.81 | 77.00 | 86.72 | 82.82 |
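The following short Python sketch computes the per-benchmark change of S1-Base-1.5-8B-128K relative to its base model, S1-Base-8B. The scores are copied from the table above, and the script only subtracts them. It makes the trade-off visible. The largest gain is on IFEval (+16.64), followed by the long-context benchmarks. There are regressions on ChemBench (-1.15), LLM-MSE (-4.87), and LAB bench (-0.09).

```python
# Scores copied from the evaluation table: (S1-Base-1.5-8B-128K, S1-Base-8B).
SCORES = {
    "CLongEval": (36.18, 27.51),
    "InfiniteBench": (35.57, 27.62),
    "IFEval": (87.06, 70.42),
    "GPQA": (70.33, 63.01),
    "ChemBench": (61.59, 62.74),
    "LLM-MSE": (83.63, 88.50),
    "LAB bench": (37.54, 37.63),
    "AIME2024": (77.92, 75.42),
    "LiveMathBench": (86.72, 82.81),
}


def deltas(scores):
    """Return {benchmark: new - base}, rounded to two decimals like the table."""
    return {name: round(new - base, 2) for name, (new, base) in scores.items()}


if __name__ == "__main__":
    # Print benchmarks from largest gain to largest regression.
    for name, d in sorted(deltas(SCORES).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<14} {d:+6.2f}")
```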

Key Highlights:

  • 📜 Enhanced Long-Context Reasoning: On public long-context benchmarks such as CLongEval and InfiniteBench, the model outperforms both its base model and similar-sized models. It also improves significantly on in-house long-text evaluations built from real-world scenarios involving papers and web pages.
  • 🎯 Improved Complex Instruction Following: We built an instruction-following task system for scientific literature that covers four major categories: document understanding, structured generation, information extraction, and chart comprehension. These tasks are combined with multi-dimensional constraints on length, format, and content. The model maintains its lead on benchmarks such as IFEval.
  • 🔬 Stable Scientific Reasoning: The model shows a clear advantage on GPQA, a comprehensive scientific benchmark covering biology, physics, and chemistry. Performance on other scientific benchmarks remains stable, with no significant fluctuation caused by the context extension.
  • 👍 User Feedback Data Flywheel: Like/dislike feedback from users of the ScienceOne platform is continuously incorporated to improve model performance and user experience in real-world scenarios.

Deployment

We recommend using vLLM to deploy S1-Base for efficient inference and OpenAI-compatible API services.

Quick start command example:

```shell
pip install vllm
vllm serve <your_s1_model_path> --served-model-name s1-base-1.5-8b-128k
```
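After `vllm serve` starts, it needs some time to load the weights before it accepts requests. The following minimal, stdlib-only sketch polls the server until the name passed via `--served-model-name` is listed. It assumes the default address `http://localhost:8000` and the OpenAI-compatible `GET /v1/models` endpoint that vLLM exposes. The helper names `served_model_ids` and `wait_for_model` are hypothetical and are not part of vLLM.

```python
import http.client
import json
import time
import urllib.request


def served_model_ids(models_response):
    """Extract model ids from an OpenAI-style GET /v1/models response body."""
    if not isinstance(models_response, dict):
        return []
    return [entry["id"] for entry in models_response.get("data", [])]


def wait_for_model(base_url="http://localhost:8000/v1",
                   model="s1-base-1.5-8b-128k",
                   timeout=600.0, interval=5.0):
    """Poll {base_url}/models until `model` is listed; return False on timeout.

    `interval` is used both as the pause between attempts and as the
    per-request socket timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                if model in served_model_ids(json.load(resp)):
                    return True
        except (OSError, ValueError, http.client.HTTPException):
            pass  # server not up yet (URLError is an OSError) or non-JSON body
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, a launch script can call `wait_for_model()` right after starting the server in the background and send requests only once it returns `True`.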

The API request and response formats are largely consistent with the OpenAI API. Please refer to the official vLLM documentation for details.
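Because the formats follow the OpenAI Chat Completions API, the server can also be called without any SDK. Below is a minimal sketch using only the Python standard library. It assumes the server started above is listening on `localhost:8000`. The names `build_chat_payload`, `extract_reply`, and `chat` are illustrative only. The reply is read from `choices[0].message.content`, the same field used in the SDK example below.

```python
import json
import urllib.request


def build_chat_payload(prompt, model="s1-base-1.5-8b-128k", **params):
    """Build an OpenAI-style chat completion request body."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    payload.update(params)  # optional sampling fields, e.g. temperature, max_tokens
    return payload


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt, base_url="http://localhost:8000/v1", **params):
    """POST to {base_url}/chat/completions and return the assistant text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt, **params)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_reply(json.load(resp))
```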

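Generate responses using the OpenAI Python SDK:

```python
from openai import OpenAI

# The server above runs without --api-key, so the key is not checked;
# pass a placeholder string such as "EMPTY" rather than an empty string.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="s1-base-1.5-8b-128k",  # must match --served-model-name
    messages=[{"role": "user", "content": "hi"}],
)
print(resp.choices[0].message.content)
```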
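The example above sends a single turn. Chat Completions requests are stateless, so for a multi-turn conversation the client resends the full message history each time. The following minimal sketch assumes a `client` built as in the example (any object exposing `chat.completions.create` works). `Conversation` is a hypothetical helper, not part of the OpenAI SDK.

```python
class Conversation:
    """Keep OpenAI-format message history and resend it on every turn."""

    def __init__(self, client, model="s1-base-1.5-8b-128k", system=None):
        self.client = client
        self.model = model
        self.messages = []
        if system is not None:
            self.messages.append({"role": "system", "content": system})

    def ask(self, content, **params):
        """Append a user turn, query the model, store and return the reply."""
        self.messages.append({"role": "user", "content": content})
        resp = self.client.chat.completions.create(
            model=self.model, messages=list(self.messages), **params
        )
        reply = resp.choices[0].message.content
        # Store the reply so the next request includes it as context.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Usage: `conv = Conversation(client)`, then `conv.ask("...")` repeatedly. Since the whole history is resent each turn, long conversations consume the 128K context window over time.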

Generate responses using curl:

```shell
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "s1-base-1.5-8b-128k", "messages": [{"role": "user", "content": "hi"}], "skip_special_tokens": false}'
```
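The curl request passes `skip_special_tokens`, a vLLM sampling parameter that is not part of the OpenAI schema. Setting it to `false` asks vLLM to keep special tokens in the decoded text. With the OpenAI Python SDK, such extra fields can be sent through the `extra_body` argument of `chat.completions.create`. The following minimal sketch assumes the `client` from the SDK example above. The function name `ask_raw` is hypothetical.

```python
def ask_raw(client, prompt, model="s1-base-1.5-8b-128k"):
    """Send one user turn with vLLM's skip_special_tokens=False via extra_body."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        # extra_body fields are merged into the JSON request body, so this
        # mirrors the "skip_special_tokens": false field in the curl command.
        extra_body={"skip_special_tokens": False},
    )
    return resp.choices[0].message.content
```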