# II-Thought-1.5B-Preview  

<div style="display: flex; justify-content: center;">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/xBJE1uk9_FGPn2N1emMFR.png" width="800">
</div>

## Overview  

**II-Thought-1.5B-Preview** is a Reinforcement Learning-enhanced language model trained on **a subset of [II-Thought-RL-v0](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0)**, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled **50K math subset** ([dataset link](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0-Math-50K)).

## Training Methodology  

- **Framework**: [ii_thought](https://github.com/Intelligent-Internet/ii-thought) / [verl](https://github.com/volcengine/verl)  
- **Algorithm**: GRPO
- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- **Reward Modeling**
  - **Answer correctness reward**
  <img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/X15GjihIRO9hkfL361Pfd.png" width="300">
  - **Format correctness reward**
  <img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/ib5bJu4lMkREigExRAUn9.png" width="300">
  - **Final reward function**
  <img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/UXsKqJIFjCpT_vUUSTigr.png" width="300">
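The exact reward formulas are given in the images above; as an illustration only, a shaped reward of this kind is often a correctness term plus a weighted format term. The components and weighting below are assumptions, not the model's actual definition:

```python
# Illustrative sketch only: the exact reward formulas are shown in the
# images above; the matching rules and weighting here are assumptions.

def answer_reward(predicted: str, reference: str) -> float:
    # 1.0 when the extracted final answer matches the reference answer.
    return 1.0 if predicted.strip() == reference.strip() else 0.0

def format_reward(response: str) -> float:
    # 1.0 when the response wraps its reasoning in the expected tags.
    return 1.0 if "<think>" in response and "</think>" in response else 0.0

def final_reward(response: str, predicted: str, reference: str,
                 format_weight: float = 0.1) -> float:
    # Hypothetical combination: correctness dominates, with a small
    # bonus for well-formed output.
    return answer_reward(predicted, reference) + format_weight * format_reward(response)

print(final_reward("<think>2 + 2 = 4</think> 4", "4", "4"))  # 1.1
```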

For a deeper look into the implementation details, refer to our repository: [Intelligent-Internet/ii-thought](https://github.com/Intelligent-Internet/ii-thought/tree/main).

## Evaluation Results

We used [EvalScope](https://github.com/modelscope/evalscope) to evaluate the models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem is as follows:
  - 64 responses: `AMC23, AIME24, AIME25`
  - 4 responses: `Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English`
  - 1 response: `IFEval`

Sampling configuration:
  - Max context length: 32,768
  - Temperature: 0.6
  - Top-p: 0.95
  - Top-k: 40
  - Seed: 42

Additionally, for LiveCodeBench, we leverage [QWQ-Evaluation](https://github.com/QwenLM/QwQ/tree/main/eval) to reproduce results using a max context length of 32,768, averaging over 8 runs.
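Concretely, Pass@1 with k sampled responses means each problem contributes the fraction of its k responses that are correct, averaged over all problems. A minimal sketch:

```python
from statistics import mean

def pass_at_1(per_problem_flags):
    """Pass@1: mean per-problem accuracy, where each problem contributes
    the fraction of its k sampled responses that are correct
    (k = 64 for AMC23/AIME, 4 for Math500 etc., 1 for IFEval)."""
    return mean(sum(flags) / len(flags) for flags in per_problem_flags)

# Two problems, k = 4 samples each:
print(pass_at_1([[True, True, False, True], [False, False, True, False]]))  # 0.5
```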

| Benchmark                               | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|-----------------------------------------|------------------------------|---------------------------|-------------------------|
| **AMC23**                               | 69.69                        | 54.26                     | **79.77**                   |
| **AIME24**                              | 29.43                        | 10.73                     | **34.17**                   |
| **AIME25**                              | 23.39                        | 8.8                       | **26.09**                   |
| **Olympiad Bench**                      | 43.15                        | 36.07                     | **52.78**                   |
| **Math500**                             | 83.6                         | 73.15                     | **87.2**                    |
| **Math Gaokao 2023 English**            | 72.99                        | 62.47                     | **77.21**                   |
| **Minerva Math**                        | 27.57                        | 24.45                     | **30.79**                   |
| **Vietnamese Entrance Math Exam**       | 40.32                        | 26.69                     | **46.24**                   |
| **LiveCodeBench**                       | 16.66                        | 2.6                       | **19.84**                  |
| **IFEval**                              | 44.24                        | 27.22                     | **44.84**                  |
| **Average**                             | 45.10                        | 32.64                     | **49.90**                   |

## How To Use
Our model can be used in the same manner as the Qwen or DeepSeek-R1-Distill models.

For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):

```bash
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
```

You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):

```bash
python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview
```

### Usage Guidelines  
- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and format the final answer within `\\boxed{}` (e.g., *"Please reason step by step, and put your final answer within \\boxed{}."*).  
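Putting these guidelines together, a chat-completion request to a server started as above (both vLLM and SGLang expose an OpenAI-compatible API) might be constructed as follows. The helper below is illustrative only, not part of the model's API:

```python
# Illustrative request builder following the recommended settings above.
BOXED_SUFFIX = "Please reason step by step, and put your final answer within \\boxed{}."

def build_request(question: str) -> dict:
    # Recommended sampling parameters: temperature = 0.6, top_p = 0.95.
    return {
        "model": "Intelligent-Internet/II-Thought-1.5B-Preview",
        "messages": [
            {"role": "user", "content": f"{question}\n{BOXED_SUFFIX}"}
        ],
        "temperature": 0.6,
        "top_p": 0.95,
    }

req = build_request("Find the sum of the first 100 positive integers.")
```

The resulting dictionary can be posted to the server's `/v1/chat/completions` endpoint with any OpenAI-compatible client.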


## Citation

```bib
@misc{2025iithought,
      title={II-Thought : A Large-Scale, High-Quality Reasoning Dataset}, 
      author={Intelligent Internet},
      year={2025}
}
```