# II-Thought-1.5B-Preview
- **Format correctness reward**
- **Final reward function**
For a deeper look at the implementation details, refer to our repository: [Intelligent-Internet/ii-thought](https://github.com/Intelligent-Internet/ii-thought/tree/main).
## Evaluation Results
We used [EvalScope](https://github.com/modelscope/evalscope) to evaluate models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem is as follows:
- 64 responses: `AMC23, AIME24, AIME25`
- 4 responses: `Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English`
- 1 response: `IFEval`
Sampling configuration:
- Max context length: 32,768
- Temperature: 0.6
- Top-p: 0.95
- Top-k: 40
- Seed: 42
Additionally, for LiveCodeBench, we leverage [QWQ-Evaluation](https://github.com/QwenLM/QwQ/tree/main/eval) to reproduce results using a max context length of 32,768, averaging over 8 runs.
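Concretely, Pass@1 under this setup is the fraction of correct responses among the n samples for each problem, averaged over all problems. A minimal sketch (the helper name and grading inputs are hypothetical, not the EvalScope implementation):

```python
def pass_at_1(grades_per_problem):
    """Average Pass@1 over a benchmark.

    grades_per_problem: one list of 0/1 correctness indicators per
    problem, with one entry per sampled response (e.g. 64 for AIME24).
    """
    # Per-problem Pass@1: mean correctness over that problem's samples.
    per_problem = [sum(g) / len(g) for g in grades_per_problem]
    # Benchmark score: mean over problems.
    return sum(per_problem) / len(per_problem)

# Example: 2 problems, 4 sampled responses each.
print(pass_at_1([[1, 0, 1, 1], [0, 0, 1, 0]]))  # → 0.5
```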
| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|-----------------------------------------|------------------------------|---------------------------|-------------------------|
| **AMC23** | 69.69 | 54.26 | **79.77** |
| **AIME24** | 29.43 | 10.73 | **34.17** |
| **AIME25** | 23.39 | 8.8 | **26.09** |
| **Olympiad Bench** | 43.15 | 36.07 | **52.78** |
| **Math500** | 83.6 | 73.15 | **87.2** |
| **Math Gaokao 2023 English** | 72.99 | 62.47 | **77.21** |
| **Minerva Math** | 27.57 | 24.45 | **30.79** |
| **Vietnamese Entrance Math Exam** | 40.32 | 26.69 | **46.24** |
| **LiveCodeBench** | 16.66 | 2.6 | **19.84** |
| **IFEval** | 44.24 | 27.22 | **44.84** |
| **Average** | 45.10 | 32.64 | **49.90** |
## How To Use
Our model can be used in the same manner as the Qwen or DeepSeek-R1-Distill models.
For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):
```bash
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
```
You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):
```bash
python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview
```
### Usage Guidelines
- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and format the final answer within `\boxed{}` (e.g., *"Please reason step by step, and put your final answer within \boxed{}."*).
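Putting the guidelines together, a request to the OpenAI-compatible endpoint exposed by `vllm serve` or SGLang might look like the sketch below. The payload shape follows the standard chat-completions API; the example question is illustrative, while the prompt suffix and sampling values come from the guidelines above.

```python
import json

# Hypothetical request body for POST /v1/chat/completions on a local
# vLLM or SGLang server hosting this model.
payload = {
    "model": "Intelligent-Internet/II-Thought-1.5B-Preview",
    "messages": [
        {
            "role": "user",
            "content": (
                "What is 12 * 13? "
                # Recommended prompt suffix for math problems.
                "Please reason step by step, and put your final answer "
                "within \\boxed{}."
            ),
        }
    ],
    # Recommended sampling parameters.
    "temperature": 0.6,
    "top_p": 0.95,
}

print(json.dumps(payload, indent=2))
```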
## Citation
```bibtex
@misc{2025iithought,
  title={II-Thought: A Large-Scale, High-Quality Reasoning Dataset},
  author={Intelligent Internet},
  year={2025}
}
```