Update README.md
README.md
CHANGED
@@ -41,7 +41,9 @@ For RL stage we setup training with:
 ## III. Evaluation Results
 
 Our II-Medical-8B model achieved a 40% score on [HealthBench](https://openai.com/index/healthbench/), a comprehensive open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to OpenAI's o1 reasoning model and GPT-4.5, OpenAI's largest and most advanced model to date. We provide a comparison to models available in ChatGPT below.
-
+
+
+Detailed result for HealthBench can be found [here](https://huggingface.co/datasets/Intelligent-Internet/OpenAI-HealthBench-II-Medical-8B-GPT-4.1).
 
 
 