# II-Thought-1.5B-Preview

<div style="display: flex; justify-content: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/xBJE1uk9_FGPn2N1emMFR.png" width="800">
</div>

## Overview
**II-Thought-1.5B-Preview** is a Reinforcement Learning (RL)-enhanced language model trained on **a subset of [II-Thought-RL-v0](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0)**, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled **50K math subset** ([dataset link](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0-Math-50K)).
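For reference, the subset can be loaded with the Hugging Face `datasets` library; the `train` split name below is an assumption, so check the dataset card for the exact splits and schema:

```python
from datasets import load_dataset

# Load the 50K math subset used to train this preview release.
# The split name is assumed; see the dataset card for the actual schema.
ds = load_dataset("Intelligent-Internet/II-Thought-RL-v0-Math-50K", split="train")
print(len(ds), ds[0])
```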
## Training Methodology

- **Framework**: [ii_thought](https://github.com/Intelligent-Internet/ii-thought) / [verl](https://github.com/volcengine/verl)
- **Algorithm**: GRPO
- **Base Model**: [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- **Reward Modeling**
  - **Answer correctness reward**
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/X15GjihIRO9hkfL361Pfd.png" width="300">
  - **Format correctness reward**
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/ib5bJu4lMkREigExRAUn9.png" width="300">
  - **Final reward function**
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/UXsKqJIFjCpT_vUUSTigr.png" width="300">
For a deeper look at the implementation details, refer to our repository: [Intelligent-Internet/ii-thought](https://github.com/Intelligent-Internet/ii-thought/tree/main).
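To make the structure concrete, here is a minimal Python sketch of a composite reward of this shape. The binary checks, the `<think>...</think>`/`\boxed{}` template, and the format weight are illustrative assumptions on our part; the exact formulas are those shown in the images above and implemented in the repository:

```python
import re

# Hypothetical response template: the exact format and weighting used in
# training are defined in the ii-thought repository; this is only a sketch.
THINK_PATTERN = re.compile(r"^<think>.*</think>.*\\boxed\{.+\}", re.DOTALL)

def answer_reward(completion: str, reference: str) -> float:
    """Binary answer-correctness reward: 1 if the first boxed answer
    matches the reference string (a simplification of real answer checking)."""
    match = re.search(r"\\boxed\{([^{}]+)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0

def format_reward(completion: str) -> float:
    """Binary format reward: 1 if the reasoning/answer template is followed."""
    return 1.0 if THINK_PATTERN.match(completion) else 0.0

def final_reward(completion: str, reference: str, w_format: float = 0.1) -> float:
    """Combine the two signals; the relative weight is an assumed value."""
    return answer_reward(completion, reference) + w_format * format_reward(completion)
```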
## Evaluation Results

We used [EvalScope](https://github.com/modelscope/evalscope) to evaluate the models and report Pass@1 accuracy across all benchmarks, averaged over multiple responses per problem (see the sketch below). The number of responses generated per problem is as follows:

- 64 responses: `AMC23`, `AIME24`, `AIME25`
- 4 responses: `Math500`, `Olympiad-Bench`, `Vietnamese-Entrance-Math-Exam`, `Minerva-Math`, `Math-Gaokao-2023-English`
- 1 response: `IFEval`
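As an illustration, this is how averaged Pass@1 can be computed from per-sample correctness; the function below is our sketch, not EvalScope's API:

```python
def mean_pass_at_1(per_sample_correct: list[list[bool]]) -> float:
    """Mean Pass@1 (%) over a benchmark.

    per_sample_correct[i][j] is True if the j-th sampled response to
    problem i is correct; each problem contributes the fraction of its
    k samples that are correct, and the benchmark score is the mean.
    """
    per_problem = [sum(s) / len(s) for s in per_sample_correct]
    return 100.0 * sum(per_problem) / len(per_problem)

# Example: two problems with k = 4 samples each -> (0.75 + 0.25) / 2 = 50.0
print(mean_pass_at_1([[True, True, False, True], [False, False, True, False]]))
```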
Sampling configuration (see the sketch below for how these map onto vLLM sampling parameters):

- Max context length: 32,768
- Temperature: 0.6
- Top-p: 0.95
- Top-k: 40
- Seed: 42
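For reference, these settings correspond roughly to the following vLLM offline-inference sketch; note that `max_tokens` bounds generated tokens and is used here only to approximate the 32,768-token context budget, and the prompt is an example:

```python
from vllm import LLM, SamplingParams

# Evaluation-time sampling settings from the list above; max_tokens caps
# generated tokens and stands in for the overall context budget.
params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40, seed=42,
                        max_tokens=32768)

llm = LLM(model="Intelligent-Internet/II-Thought-1.5B-Preview")
outputs = llm.generate(
    ["Solve x^2 - 5x + 6 = 0. Please reason step by step, "
     "and put your final answer within \\boxed{}."],
    params,
)
print(outputs[0].outputs[0].text)
```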
Additionally, for LiveCodeBench, we leverage [QWQ-Evaluation](https://github.com/QwenLM/QwQ/tree/main/eval) to reproduce results, using a max context length of 32,768 and averaging over 8 runs.
| Benchmark                         | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|-----------------------------------|-------------------------------|----------------------------|-------------------------|
| **AMC23**                         | 69.69                         | 54.26                      | **79.77**               |
| **AIME24**                        | 29.43                         | 10.73                      | **34.17**               |
| **AIME25**                        | 23.39                         | 8.80                       | **26.09**               |
| **Olympiad Bench**                | 43.15                         | 36.07                      | **52.78**               |
| **Math500**                       | 83.60                         | 73.15                      | **87.20**               |
| **Math Gaokao 2023 English**      | 72.99                         | 62.47                      | **77.21**               |
| **Minerva Math**                  | 27.57                         | 24.45                      | **30.79**               |
| **Vietnamese Entrance Math Exam** | 40.32                         | 26.69                      | **46.24**               |
| **LiveCodeBench**                 | 16.66                         | 2.60                       | **19.84**               |
| **IFEval**                        | 44.24                         | 27.22                      | **44.84**               |
| **Average**                       | 45.10                         | 32.64                      | **49.89**               |
## How To Use

Our model can be used in the same way as the Qwen or DeepSeek-R1-Distill models.

For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):

```bash
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
```

You can also start a service using [SGLang](https://github.com/sgl-project/sglang):

```bash
python -m sglang.launch_server --model-path Intelligent-Internet/II-Thought-1.5B-Preview
```
### Usage Guidelines

- Recommended sampling parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and ask for the final answer within `\boxed{}` (e.g., *"Please reason step by step, and put your final answer within \\boxed{}."*).
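Once a server is running, you can query it through the OpenAI-compatible API that both vLLM and SGLang expose. The sketch below uses vLLM's default port (SGLang defaults to 30000); the prompt is an example:

```python
from openai import OpenAI

# Point the client at the local OpenAI-compatible server (vLLM defaults to port 8000).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Intelligent-Internet/II-Thought-1.5B-Preview",
    messages=[
        {
            "role": "user",
            "content": "What is 12 * 34? Please reason step by step, "
                       "and put your final answer within \\boxed{}.",
        }
    ],
    temperature=0.6,  # recommended sampling parameters
    top_p=0.95,
    max_tokens=4096,
)
print(response.choices[0].message.content)
```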
## Citation

```bibtex
@misc{2025iithought,
  title={II-Thought: A Large-Scale, High-Quality Reasoning Dataset},
  author={Intelligent Internet},
  year={2025}
}
```