# II-Thought-1.5B-Preview
<div style="display: flex; justify-content: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/xBJE1uk9_FGPn2N1emMFR.png" width="800">
</div>
## Overview
**II-Thought-1.5B-Preview** is a Reinforcement Learning enhanced language model trained on **a subset of [II-Thought-RL-v0](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0)**, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled **50K math subset** ([dataset link](https://huggingface.co/datasets/Intelligent-Internet/II-Thought-RL-v0-Math-50K)).
## Training Methodology
- **Framework**: [ii_thought](https://github.com/Intelligent-Internet/ii-thought) / [verl](https://github.com/volcengine/verl)
- **Algorithm**: GRPO
- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- **Reward Modeling**
- **Answer correctness reward**
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/X15GjihIRO9hkfL361Pfd.png" width="300">
- **Format correctness reward**
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/ib5bJu4lMkREigExRAUn9.png" width="300">
- **Final reward function**
<img src="https://cdn-uploads.huggingface.co/production/uploads/67c563afa34e1ad5a3533ccf/UXsKqJIFjCpT_vUUSTigr.png" width="300">
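The reward formulas above are rendered as images; as a rough illustration only (the exact implementation lives in the ii-thought repository), a rule-based reward combining answer and format correctness might be sketched like this. The regexes, tags, and weights below are assumptions, not the trained configuration:

```python
import re

def format_reward(response: str) -> float:
    # Hypothetical format check: reasoning wrapped in <think>...</think>
    # and a final answer given inside \boxed{}.
    has_think = bool(re.search(r"<think>.*?</think>", response, re.DOTALL))
    has_boxed = r"\boxed{" in response
    return 1.0 if (has_think and has_boxed) else 0.0

def answer_reward(response: str, gold: str) -> float:
    # Hypothetical correctness check: exact string match of the boxed answer.
    m = re.search(r"\\boxed\{([^}]*)\}", response)
    return 1.0 if (m and m.group(1).strip() == gold.strip()) else 0.0

def final_reward(response: str, gold: str,
                 w_answer: float = 1.0, w_format: float = 0.5) -> float:
    # The weights here are placeholders; the real combination is defined
    # in the ii-thought codebase.
    return w_answer * answer_reward(response, gold) + w_format * format_reward(response)
```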
For a deeper look into the implementation details, refer to our repository: [Intelligent-Internet/ii-thought](https://github.com/Intelligent-Internet/ii-thought/tree/main).
## Evaluation Results
We used [EvalScope](https://github.com/modelscope/evalscope) to evaluate the models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem is as follows:
- 64 responses: `AMC23, AIME24, AIME25`
- 4 responses: `Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English`
- 1 response: `IFEval`
Sampling configuration:
- Max context length: 32,768
- Temperature: 0.6
- Top p: 0.95
- Top k: 40
- Seed: 42
Additionally, for LiveCodeBench, we leverage [QwQ-Evaluation](https://github.com/QwenLM/QwQ/tree/main/eval) to reproduce results with a max context length of 32,768, averaging over 8 runs.
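With multiple responses per problem, Pass@1 reduces to the fraction of correct samples per problem, averaged over the benchmark. A minimal sketch of that aggregation (EvalScope's own implementation may differ in details):

```python
def pass_at_1(correct_flags):
    """Pass@1 estimate from k sampled responses: fraction that are correct."""
    return sum(correct_flags) / len(correct_flags)

# Toy example: two problems, 4 sampled responses each (as used for Math500 etc.)
problems = [
    [True, False, True, True],    # 3/4 correct
    [False, False, True, False],  # 1/4 correct
]
per_problem = [pass_at_1(flags) for flags in problems]
benchmark_score = 100 * sum(per_problem) / len(per_problem)  # reported as a percentage
```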
| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|-----------------------------------------|------------------------------|---------------------------|-------------------------|
| **AMC23** | 69.69 | 54.26 | **79.77** |
| **AIME24** | 29.43 | 10.73 | **34.17** |
| **AIME25** | 23.39 | 8.8 | **26.09** |
| **Olympiad Bench** | 43.15 | 36.07 | **52.78** |
| **Math500** | 83.6 | 73.15 | **87.2** |
| **Math Gaokao 2023 English** | 72.99 | 62.47 | **77.21** |
| **Minerva Math** | 27.57 | 24.45 | **30.79** |
| **Vietnamese Entrance Math Exam** | 40.32 | 26.69 | **46.24** |
| **LiveCodeBench** | 16.66 | 2.6 | **19.84** |
| **IFEval** | 44.24 | 27.22 | **44.84** |
| **Average** | 45.10 | 32.64 | **49.90** |
## How To Use
Our model can be used in the same manner as Qwen or DeepSeek-R1-Distill models.
For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):
```bash
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
```
You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):
```bash
python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview
```
### Usage Guidelines
- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and ask for the final answer within `\boxed{}` (e.g., *"Please reason step by step, and put your final answer within \boxed{}."*).
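Putting the guidelines together, a chat-completion request to a local vLLM or SGLang server might be assembled as below. The payload shape follows the standard OpenAI-compatible chat API that both servers expose; the exact endpoint path (`/v1/chat/completions`) is an assumption about your deployment:

```python
question = "What is 7 * 8?"

# Request body combining the recommended prompt suffix and sampling parameters.
payload = {
    "model": "Intelligent-Internet/II-Thought-1.5B-Preview",
    "messages": [
        {
            "role": "user",
            "content": f"{question} Please reason step by step, "
                       "and put your final answer within \\boxed{}.",
        }
    ],
    "temperature": 0.6,
    "top_p": 0.95,
}
# POST this payload to the server's /v1/chat/completions endpoint.
```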
## Citation
```bibtex
@misc{2025iithought,
title={II-Thought : A Large-Scale, High-Quality Reasoning Dataset},
author={Intelligent Internet},
year={2025}
}
```