# II-Thought-1.5B-Preview
## Overview

**II-Thought-1.5B-Preview** is a reinforcement-learning-enhanced language model trained on **a subset of [II-Thought-RL-v0](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0)**, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled **50K math subset** ([dataset link](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0-Math-50K)).

## Training Methodology

- **Framework**: [ii_thought](https://github.com/Intelligent-Internet/ii-thought) / [verl](https://github.com/volcengine/verl)
- **Algorithm**: GRPO
- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- **Reward Modeling**:
  - **Answer correctness reward**
  - **Format correctness reward**
  - **Final reward function**

For a deeper look into the implementation details, refer to our repository: [Intelligent-Internet/ii-thought](https://github.com/Intelligent-Internet/ii-thought/tree/main).

## Evaluation Results

We used [EvalScope](https://github.com/modelscope/evalscope) to evaluate models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem is as follows:

- 64 responses: `AMC23, AIME24, AIME25`
- 4 responses: `Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English`
- 1 response: `IFEval`

Sampling configuration:

- Max context length: 32,768
- Temperature: 0.6
- Top-p: 0.95
- Top-k: 40
- Seed: 42

Additionally, for LiveCodeBench, we leverage [QwQ-Evaluation](https://github.com/QwenLM/QwQ/tree/main/eval) to reproduce results using a max context length of 32,768, averaging over 8 runs.
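As a minimal illustration of the metric above (not the EvalScope implementation, whose grading logic is more involved), Pass@1 over `k` sampled responses is simply the fraction of correct responses per problem, averaged across problems:

```python
# Minimal sketch of Pass@1 averaged over k responses per problem.
# Not the EvalScope implementation; grading each response is a stand-in here.

def pass_at_1(per_problem_correctness):
    """per_problem_correctness: one inner list per problem,
    containing k booleans (was each sampled response graded correct?)."""
    per_problem = [sum(c) / len(c) for c in per_problem_correctness]
    return 100.0 * sum(per_problem) / len(per_problem)

# Example: 2 problems, k = 4 responses each
score = pass_at_1([[True, True, False, True],
                   [False, False, True, False]])  # -> 50.0
```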
| Benchmark                         | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|-----------------------------------|-------------------------------|----------------------------|-------------------------|
| **AMC23**                         | 69.69                         | 54.26                      | **79.77**               |
| **AIME24**                        | 29.43                         | 10.73                      | **34.17**               |
| **AIME25**                        | 23.39                         | 8.8                        | **26.09**               |
| **Olympiad Bench**                | 43.15                         | 36.07                      | **52.78**               |
| **Math500**                       | 83.6                          | 73.15                      | **87.2**                |
| **Math Gaokao 2023 English**      | 72.99                         | 62.47                      | **77.21**               |
| **Minerva Math**                  | 27.57                         | 24.45                      | **30.79**               |
| **Vietnamese Entrance Math Exam** | 40.32                         | 26.69                      | **46.24**               |
| **LiveCodeBench**                 | 16.66                         | 2.6                        | **19.84**               |
| **IFEval**                        | 44.24                         | 27.22                      | **44.84**               |
| **Average**                       | 45.10                         | 32.64                      | **49.90**               |

## How To Use

Our model can be used in the same manner as the Qwen or DeepSeek-R1-Distill models. For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):

```bash
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
```

You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):

```bash
python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview
```

### Usage Guidelines

- Recommended sampling parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and format the final answer within `\boxed{}` (e.g., *"Please reason step by step, and put your final answer within \\boxed{}."*).

## Citation

```bib
@misc{2025iithought,
    title={II-Thought : A Large-Scale, High-Quality Reasoning Dataset},
    author={Intelligent Internet},
    year={2025}
}
```
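As a sketch of the recommended settings in practice, the request body below targets the OpenAI-compatible `/v1/chat/completions` endpoint that `vllm serve` exposes (by default at `http://localhost:8000`). The math problem is a placeholder; only the model name, prompt suffix, and sampling parameters come from this card:

```python
import json

# Sketch of a chat-completions request body for the vLLM server started with:
#   vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
# Send it with any HTTP client to http://localhost:8000/v1/chat/completions.
payload = {
    "model": "Intelligent-Internet/II-Thought-1.5B-Preview",
    "messages": [
        {
            "role": "user",
            # Placeholder problem plus the prompt suffix from the usage guidelines
            "content": "Solve: 1 + 1 = ? "
                       "Please reason step by step, and put your final answer "
                       "within \\boxed{}.",
        }
    ],
    # Recommended sampling parameters from the usage guidelines
    "temperature": 0.6,
    "top_p": 0.95,
}
body = json.dumps(payload)
```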