# II-Thought-1.5B-Preview
<div style="display: flex; justify-content: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/xBJE1uk9_FGPn2N1emMFR.png" width="800">
</div>
## Overview
**II-Thought-1.5B-Preview** is a Reinforcement Learning enhanced language model trained on **a subset of [II-Thought-RL-v0](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0)**, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled **50K math subset** ([dataset link](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0-Math-50K)).
## Training Methodology
- **Framework**: [ii_thought](https://github.com/Intelligent-Internet/ii-thought) / [verl](https://github.com/volcengine/verl)
- **Algorithm**: GRPO
- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- **Reward Modeling**
- **Answer correctness reward**
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/X15GjihIRO9hkfL361Pfd.png" width="300">
- **Format correctness reward**
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/ib5bJu4lMkREigExRAUn9.png" width="300">
- **Final reward function**
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/UXsKqJIFjCpT_vUUSTigr.png" width="300">
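The reward formulas above are rendered as images; as a rough illustration only (the exact implementation lives in the ii-thought repository), a rule-based reward combining answer and format correctness might be sketched like this. The regexes, tags, and weights below are assumptions, not the trained configuration:

```python
import re

def format_reward(response: str) -> float:
    # Hypothetical format check: reasoning wrapped in <think>...</think>
    # and a final answer given inside \boxed{}.
    has_think = bool(re.search(r"<think>.*?</think>", response, re.DOTALL))
    has_boxed = r"\boxed{" in response
    return 1.0 if (has_think and has_boxed) else 0.0

def answer_reward(response: str, gold: str) -> float:
    # Hypothetical correctness check: exact string match of the boxed answer.
    m = re.search(r"\\boxed\{([^}]*)\}", response)
    return 1.0 if (m and m.group(1).strip() == gold.strip()) else 0.0

def final_reward(response: str, gold: str,
                 w_answer: float = 1.0, w_format: float = 0.5) -> float:
    # The weights here are placeholders; the real combination is defined
    # in the ii-thought codebase.
    return w_answer * answer_reward(response, gold) + w_format * format_reward(response)
```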
For a deeper look into the implementation details, refer to our repository: [Intelligent-Internet/ii-thought](https://github.com/Intelligent-Internet/ii-thought/tree/main).
## Evaluation Results
We used [EvalScope](https://github.com/modelscope/evalscope) to evaluate the models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem is as follows:
- 64 responses: `AMC23, AIME24, AIME25`
- 4 responses: `Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English`
- 1 response: `IFEval`
Sampling configuration:
- Max context length: 32,768
- Temperature: 0.6
- Top p: 0.95
- Top k: 40
- Seed: 42
Additionally, for LiveCodeBench, we leverage [QwQ-Evaluation](https://github.com/QwenLM/QwQ/tree/main/eval) to reproduce results with a max context length of 32,768, averaging over 8 runs.
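With multiple responses per problem, Pass@1 reduces to the fraction of correct samples per problem, averaged over the benchmark. A minimal sketch of that aggregation (EvalScope's own implementation may differ in details):

```python
def pass_at_1(correct_flags):
    """Pass@1 estimate from k sampled responses: fraction that are correct."""
    return sum(correct_flags) / len(correct_flags)

# Toy example: two problems, 4 sampled responses each (as used for Math500 etc.)
problems = [
    [True, False, True, True],    # 3/4 correct
    [False, False, True, False],  # 1/4 correct
]
per_problem = [pass_at_1(flags) for flags in problems]
benchmark_score = 100 * sum(per_problem) / len(per_problem)  # reported as a percentage
```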
| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|-----------------------------------------|------------------------------|---------------------------|-------------------------|
| **AMC23** | 69.69 | 54.26 | **79.77** |
| **AIME24** | 29.43 | 10.73 | **34.17** |
| **AIME25** | 23.39 | 8.8 | **26.09** |
| **Olympiad Bench** | 43.15 | 36.07 | **52.78** |
| **Math500** | 83.6 | 73.15 | **87.2** |
| **Math Gaokao 2023 English** | 72.99 | 62.47 | **77.21** |
| **Minerva Math** | 27.57 | 24.45 | **30.79** |
| **Vietnamese Entrance Math Exam** | 40.32 | 26.69 | **46.24** |
| **LiveCodeBench** | 16.66 | 2.6 | **19.84** |
| **IFEval** | 44.24 | 27.22 | **44.84** |
| **Average** | 45.10 | 32.64 | **49.90** |
## How To Use
Our model can be used in the same manner as Qwen or DeepSeek-R1-Distill models.
For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):
```bash
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
```
You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):
```bash
python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview
```
### Usage Guidelines
- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and ask for the final answer within `\boxed{}` (e.g., *"Please reason step by step, and put your final answer within \boxed{}."*).
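Putting the guidelines together, a chat-completion request to a local vLLM or SGLang server might be assembled as below. The payload shape follows the standard OpenAI-compatible chat API that both servers expose; the exact endpoint path (`/v1/chat/completions`) is an assumption about your deployment:

```python
question = "What is 7 * 8?"

# Request body combining the recommended prompt suffix and sampling parameters.
payload = {
    "model": "Intelligent-Internet/II-Thought-1.5B-Preview",
    "messages": [
        {
            "role": "user",
            "content": f"{question} Please reason step by step, "
                       "and put your final answer within \\boxed{}.",
        }
    ],
    "temperature": 0.6,
    "top_p": 0.95,
}
# POST this payload to the server's /v1/chat/completions endpoint.
```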
## Citation
```bibtex
@misc{2025iithought,
title={II-Thought : A Large-Scale, High-Quality Reasoning Dataset},
author={Intelligent Internet},
year={2025}
}
```