---
license: apache-2.0
---

# II-Medical-32B-Preview


![image/png](https://cdn-uploads.huggingface.co/production/uploads/63466107f7bd6326925fc770/6R3uJGH1MKGSZt9F88Gvc.png)

## I. Model Overview

II-Medical-32B-Preview is the latest large language model developed by Intelligent Internet, specifically designed to enhance AI-driven medical reasoning. As our first model at the 32B scale, it significantly advances our medical question-answering capabilities.

## II. Training Methodology

We collected and generated a comprehensive set of reasoning datasets for the medical domain and performed SFT fine-tuning on the [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) model. 

Training hyperparameters:
- Max length: 16378
- Batch size: 128
- Learning rate: 2e-5
- Number of epochs: 4

## III. Evaluation Results


![image/png](https://cdn-uploads.huggingface.co/production/uploads/63466107f7bd6326925fc770/nfyIuAiaBLKZ1cesLN1te.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63466107f7bd6326925fc770/4S65RIgYgOk7GjtsRs0vM.png)

We evaluated on 10 medical QA benchmarks: MedMCQA, MedQA, PubMedQA, HealthBench, the medical questions from MMLU-Pro, small QA sets from The Lancet and the New England Journal of Medicine, the 4-option and 5-option splits from the MedBullets platform, and MedXpertQA.

| Model                   | MedMC | MedQA | PubMed | MMLU-P | HealthBench | Lancet | MedB-4 | MedB-5 | MedX  | NEJM  | Avg   |
|--------------------------|-------|-------|--------|--------|------|--------|--------|--------|------|-------|-------|
| [HuatuoGPT-o1-72B](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-72B)         | 76.76 | 88.85 | 79.90   | 80.46  | 22.73 | 70.87   | 77.27  | 73.05  |23.53 |76.29  | 66.97 |
| [M1](https://huggingface.co/UCSC-VLAA/m1-7B-23K)                     | 62.54 | 75.81 | 75.80  | 65.86  | 15.51 | 62.62  | 63.64  | 59.74  |19.59 |64.34  | 56.55  |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)                  | 66.53 | 81.38 | 73.9   | 77.85  | 42.27 | 66.26   | 68.83  | 62.66  |19.59 |69.65  | 62.89 |
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B)                  | 74.18 | 88.92 | 76.1   | 80.7  | 47.08 | 72.33   | 72.27  | 71.42  |28.04 |76.94  | 68.80 |
| [MedGemma-27B-IT](https://huggingface.co/google/medgemma-27b-text-it)                  | 73.24 | 87.27 | 70.9   | 80.13  | 46.54| 70.14   | 75.32  | 73.37  |25.55 |76.28  | 67.87 |
| [II-Medical-8B](https://huggingface.co/Intelligent-Internet/II-Medical-8B)        | 71.57 | 87.90 | 78.7   |80.46  | 40.02| 70.38  | 78.25  | 72.07  |25.26 |73.13  |67.77  |
| [II-Medical-8B-1706](https://huggingface.co/Intelligent-Internet/II-Medical-8B-1706)            | 74.44 | 88.61 | 79.8   | 81.04  | 46.8 | 71.60  | 80.84  | 74.67  |29.63 |77.61  | 70.47  |
| [II-Medical-32B-Preview](https://huggingface.co/Intelligent-Internet/II-Medical-32B-Preview)            | 75.16 | 90.02 | 79.1   | 80.71  | 47.24 | 75.48  | 81.16  | 74.68  |31.42 | 80.43  | **71.54**  |

## IV. Dataset Release


Alongside II-Medical-32B-Preview, we also release the SFT training dataset used for our II-Medical models, as well as our RL datasets.

- [II-Medical-Reasoning-SFT](https://huggingface.co/datasets/Intelligent-Internet/II-Medical-Reasoning-SFT)
- [II-Medical-RL-MedReason](https://huggingface.co/datasets/Intelligent-Internet/II-Medical-RL)
- [II-Medical-RL-ChatDoctor](https://huggingface.co/datasets/Intelligent-Internet/ChatDoctor-RL)


We believe this work will be a valuable resource for the community and will contribute to the advancement of medical reasoning capabilities in AI systems.

## V. How To Use
Our model can be used in the same manner as the Qwen or DeepSeek-R1-Distill models.

For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):

```bash
vllm serve Intelligent-Internet/II-Medical-32B-Preview
```
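Once the server is up, you can query its OpenAI-compatible endpoint. The sketch below uses only the Python standard library and assumes the default vLLM port 8000 (`API_URL` and `build_payload` are illustrative names, not part of the model's API); it wraps the question with the recommended prompt format and sampling parameters from the usage guidelines:

```python
import json
from urllib import request

# Default vLLM port; adjust if you passed --port to `vllm serve`.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(question: str) -> dict:
    """Build an OpenAI-compatible chat request using the recommended
    prompt format and sampling parameters for II-Medical-32B-Preview."""
    return {
        "model": "Intelligent-Internet/II-Medical-32B-Preview",
        "messages": [
            {
                "role": "user",
                "content": (
                    f"{question}\n"
                    "Please reason step-by-step, and put your final answer "
                    "within \\boxed{}."
                ),
            }
        ],
        "temperature": 0.6,  # recommended sampling parameters
        "top_p": 0.9,
    }

if __name__ == "__main__":
    payload = build_payload(
        "A 45-year-old presents with fatigue and microcytic anemia. "
        "What is the most likely diagnosis?"
    )
    req = request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```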

You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):

```bash
python -m sglang.launch_server --model Intelligent-Internet/II-Medical-32B-Preview
```

## VI. Usage Guidelines

- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.9
- When prompting, explicitly request step-by-step reasoning and ask for the final answer within \boxed{} (e.g., "Please reason step-by-step, and put your final answer within \boxed{}.").

## VII. Limitations and Considerations

- The dataset may contain inherent biases from its source materials
- Medical knowledge requires regular updates
- Please note that **this model is not suitable for clinical or medical use.**


## VIII. Citation

```bib
@misc{2025II-Medical-32B-Preview,
      title={II-Medical-32B-Preview: Medical Reasoning Model}, 
      author={Intelligent Internet},
      year={2025}
}
```