Text Generation
Transformers
Safetensors
qwen3
conversational
text-generation-inference
Commit 0cabb6d by pvduy (verified) · 1 parent: 51dd958

Update README.md

Files changed (1)
  1. README.md +7 -6
README.md CHANGED
@@ -11,12 +11,7 @@ tags: []
 
 ## I. Model Overview
 
-II-Medical-8B is a medical reasoning model trained on a [comprehensive dataset](https://huggingface.co/datasets/Intelligent-Internet/II-Medical-Reasoning-SFT-V0) of medical knowledge. The model is designed to enhance AI capabilities in medical.
-
-![Model Benchmark](https://cdn-uploads.huggingface.co/production/uploads/6389496ff7d3b0df092095ed/uvporIhY4_WN5cGaGF1Cm.png)
-
-Our II-Medical-8B model achieved a 40% score on HealthBench, an open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to OpenAI's o1 reasoning model and GPT-4.5, OpenAI's largest and most advanced model to date.
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6389496ff7d3b0df092095ed/S90HEqD6UJCme-1_17IJw.png)
+II-Medical-8B is the newest advanced large language model developed by Intelligent Internet, specifically engineered to enhance AI-driven medical reasoning. Following the positive reception of our previous [II-Medical-7B-Preview](https://huggingface.co/Intelligent-Internet/II-Medical-7B-Preview), this new iteration significantly advances the capabilities of medical question answering,
 
 ## II. Training Methodology
 
@@ -45,6 +40,12 @@ For RL stage we setup training with:
 
 ## III. Evaluation Results
 
+![Model Benchmark](https://cdn-uploads.huggingface.co/production/uploads/6389496ff7d3b0df092095ed/uvporIhY4_WN5cGaGF1Cm.png)
+
+Our II-Medical-8B model achieved a 40% score on HealthBench, an open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to OpenAI's o1 reasoning model and GPT-4.5, OpenAI's largest and most advanced model to date.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6389496ff7d3b0df092095ed/S90HEqD6UJCme-1_17IJw.png)
+
+
 We evaluate on ten medical QA benchmarks include MedMCQA, MedQA, PubMedQA, medical related questions from MMLU-Pro and GPQA, small QA sets from Lancet and the New England
 Journal of Medicine, 4 Options and 5 Options splits from the MedBullets platform and MedXpertQA.
 