Update README.md
README.md
CHANGED
|
@@ -1,3 +1,71 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
---
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# S1-Base-1.5-8B-128K
|
| 6 |
+
|
| 7 |
+
[中文版](./README_zh.md) | [English](./README.md)
|
| 8 |
+
|
| 9 |
+
## Model Introduction
|
| 10 |
+
|
| 11 |
+
This repository contains S1-Base-1.5-8B-128K, a general-purpose scientific large language model obtained by post-training (SFT + GRPO) the scientific foundation model [S1-Base](https://huggingface.co/collections/ScienceOne-AI/s1-base). The model retains its scientific reasoning capabilities while significantly improving long-context understanding and reasoning, as well as complex instruction following in scientific research scenarios. It supports a context length of 128K tokens.
|
| 12 |
+
|
| 13 |
+
## Model Weights
|
| 14 |
+
|
| 15 |
+
The S1-Base-1.5-8B-128K model is released as open source under the Apache 2.0 license. You can download the model weights from [Hugging Face](https://huggingface.co/ScienceOne-AI/S1-Base-1.5-8B-128K) or [ModelScope](https://modelscope.cn/models/ScienceOne-AI/S1-Base-1.5-8B-128K).
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
| Model Name | Hugging Face Link | ModelScope Link |
|
| 19 |
+
|-------------|-------------------------------------|-------------------------------------|
|
| 20 |
+
| S1-Base-1.5-8B-128K | [Download](https://huggingface.co/ScienceOne-AI/S1-Base-1.5-8B-128K) | [Download](https://modelscope.cn/models/ScienceOne-AI/S1-Base-1.5-8B-128K) |
|
| 21 |
+
|
| 22 |
+
## Model Evaluation
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
To validate the capabilities of S1-Base-1.5-8B-128K, we evaluated it systematically across three core competencies: long-context ability, instruction following, and scientific reasoning. The results are shown in the table below; the best score in each row is in bold.
|
| 26 |
+
|
| 27 |
+
| Benchmark | S1-Base-1.5-8B-128K | S1-Base-8B | Qwen3-8B | Intern-S1-mini | GLM-Z1-9B-0414 |
|
| 28 |
+
|---|---|---|---|---|---|
|
| 29 |
+
| CLongEval | **36.18** | 27.51 | 33.62 | 32.82 | 25.71 |
|
| 30 |
+
| InfiniteBench | **35.57** | 27.62 | 34.41 | 30.42 | 29.58 |
|
| 31 |
+
| IFEval | **87.06** | 70.42 | 85.00 | 83.00 | 78.93 |
|
| 32 |
+
| GPQA | **70.33** | 63.01 | 60.86 | 65.97 | 55.81 |
|
| 33 |
+
| ChemBench | 61.59 | **62.74** | 57.79 | 57.54 | 55.85 |
|
| 34 |
+
| LLM-MSE | 83.63 | **88.50** | 83.51 | 78.65 | 80.97 |
|
| 35 |
+
| LAB-Bench | 37.54 | **37.63** | 26.52 | 29.11 | 29.89 |
|
| 36 |
+
| AIME2024 | 77.92 | 75.42 | 74.60 | **85.00** | 79.37 |
|
| 37 |
+
| LiveMathBench | **86.72** | 82.81 | 77.00 | **86.72** | 82.82 |
|
| 38 |
+
|
| 39 |
+
**Key Highlights:**
|
| 40 |
+
- 📜 **Enhanced Long Context Reasoning**: The model scores highest among the compared models, including its base model S1-Base-8B and other similar-sized models, on the public long-context benchmarks CLongEval and InfiniteBench. It also improves significantly on custom long-text evaluations built from real-world papers and web pages.
|
| 41 |
+
- 🎯 **Improved Complex Instruction Following**: Training uses a scientific-literature instruction-following task system covering four major categories (document understanding, structured generation, information extraction, and chart comprehension), combined with multi-dimensional constraints on length, format, and content. The model achieves the highest IFEval score among the compared models.
|
| 42 |
+
- 🔬 **Stable Scientific Reasoning Capability**: The model shows a clear advantage on GPQA, a comprehensive scientific benchmark covering biology, physics, and chemistry. On the other scientific benchmarks, scores stay close to those of S1-Base-8B after context extension (61.59 vs. 62.74 on ChemBench and 37.54 vs. 37.63 on LAB-Bench), with the largest drop on LLM-MSE (83.63 vs. 88.50).
|
| 43 |
+
- 👍 **User Feedback Data Flywheel**: Model performance and user experience in real-world scenarios are continuously improved by incorporating like and dislike feedback from users of the [ScienceOne](https://scienceone.cn) platform.
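
The gains and regressions relative to S1-Base-8B can be read directly off the evaluation table. As a small illustration, the following Python sketch recomputes the per-benchmark differences from the scores listed above. The names `SCORES` and `deltas` are introduced here for illustration only and are not part of any released tooling.

```python
# Scores copied from the evaluation table above:
# benchmark -> (S1-Base-1.5-8B-128K, S1-Base-8B)
SCORES = {
    "CLongEval": (36.18, 27.51),
    "InfiniteBench": (35.57, 27.62),
    "IFEval": (87.06, 70.42),
    "GPQA": (70.33, 63.01),
    "ChemBench": (61.59, 62.74),
    "LLM-MSE": (83.63, 88.50),
    "LAB-Bench": (37.54, 37.63),
    "AIME2024": (77.92, 75.42),
    "LiveMathBench": (86.72, 82.81),
}


def deltas(scores):
    """Return the score difference (new model minus base model) per benchmark."""
    return {name: round(new - base, 2) for name, (new, base) in scores.items()}


if __name__ == "__main__":
    # Print the benchmarks from largest gain to largest regression.
    for name, diff in sorted(deltas(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{name:<14} {diff:+.2f}")
```

Positive values are improvements over S1-Base-8B; negative values are regressions.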
|
| 44 |
+
|
| 45 |
+
## Deployment
|
| 46 |
+
|
| 47 |
+
We recommend deploying S1-Base-1.5-8B-128K with [vLLM](https://github.com/vllm-project/vllm), which provides efficient inference and an OpenAI-compatible API server.
|
| 48 |
+
|
| 49 |
+
**Quick start command example:**
|
| 50 |
+
```bash
|
| 51 |
+
pip install vllm
|
| 52 |
+
vllm serve <your_s1_model_path> --served-model-name s1-base-1.5-8b-128k
|
| 53 |
+
```
|
| 54 |
+
The API request and response formats largely follow the OpenAI API. Please refer to the official vLLM documentation for details.
|
| 55 |
+
|
| 56 |
+
**Generate responses using the OpenAI Python SDK:**
|
| 57 |
+
```python
|
| 58 |
+
from openai import OpenAI
|
| 59 |
+
|
| 60 |
+
# A non-empty placeholder key; vLLM only checks it if the server was started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
|
| 61 |
+
resp = client.chat.completions.create(
|
| 62 |
+
model="s1-base-1.5-8b-128k",
|
| 63 |
+
messages=[{"role": "user", "content": "hi"}]
|
| 64 |
+
)
|
| 65 |
+
print(resp.choices[0].message.content)
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
**Generate responses using curl:**
|
| 69 |
+
```bash
|
| 70 |
+
curl -X POST http://localhost:8000/v1/chat/completions -d '{"model": "s1-base-1.5-8b-128k", "messages":[{"role":"user", "content": "hi"}], "skip_special_tokens": false}' -H "Content-Type: application/json"
|
| 71 |
+
```
|
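
Where the OpenAI SDK is not installed, the curl request above can also be reproduced with the Python standard library. This is a minimal sketch, assuming the server started by the quick start command is listening on `localhost:8000`. The helper names `build_chat_request` and `chat` are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumptions: default vLLM port and the served model name from the quick start command.
BASE_URL = "http://localhost:8000/v1"
MODEL = "s1-base-1.5-8b-128k"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Extra sampling parameter taken from the curl example above.
        "skip_special_tokens": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("hi"))` prints the model's reply. Keeping request construction separate from sending makes it possible to inspect the payload without a live server.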