brightenbb committed
Commit 4985746 · verified · 1 Parent(s): a859d6e

Fix bold formatting and link display

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
 ---
 # Introduction
 
-**AntAngelMed is Officially Open Source! ** 🚀
+**AntAngelMed is Officially Open Source!** 🚀
 
 **AntAngelMed**, developed by **Ant Group** and the **Health Commission of Zhejiang Province**, is the largest and most powerful open-source medical language model to date.
 
@@ -11,17 +11,17 @@ license: apache-2.0
 
 + 🏆**World-leading performance on authoritative benchmarks**: AntAngelMed surpasses all open-source models and a range of top proprietary models on OpenAI's HealthBench, and ranks first overall on the Chinese authority benchmark MedAIBench.
 + 🧠**Advanced Medical Capabilities**: AntAngelMed achieves its professional medical capabilities through a rigorous three-stage training pipeline: continual pre-training on medical corpora, supervised fine-tuning with high-quality instructions, and GRPO-based reinforcement learning. This process equips the model with deep medical knowledge, sophisticated diagnostic reasoning, and robust adherence to safety and ethics.
-+ ⚡**Extremely efficient inference:** Leveraging [Ling-flash-2.0](https://arxiv.org/abs/2507.17702)’s ([https://arxiv.org/abs/2507.17702](https://arxiv.org/abs/2507.17702)) high-efficiency MoE, AntAngelMed matches the performance of ~40B dense models while activating only 6.1B parameters of its 100B parameters. It achieves over 200 tokens/s on H20 hardware and supports 128K context length.
++ ⚡**Extremely efficient inference:** Leveraging [Ling-flash-2.0](https://arxiv.org/abs/2507.17702)’s high-efficiency MoE, AntAngelMed matches the performance of ~40B dense models while activating only 6.1B parameters of its 100B parameters. It achieves over 200 tokens/s on H20 hardware and supports 128K context length.
 
 # **📊** Benchmark Results
 
 ## **HealthBench**
 
-[**HealthBench**](https://arxiv.org/abs/2505.08775) ([https://arxiv.org/abs/2505.08775](https://arxiv.org/abs/2505.08775)) is an open-source medical evaluation benchmark released by OpenAI, designed to assess the performance of Large Language Models (LLMs) in real-world medical environments through highly simulated multi-turn dialogues. AntAngelMed achieved outstanding performance on this benchmark, ranking first among all open-source models, with a particularly significant advantage on the challenging HealthBench-Hard subset.
+[**HealthBench**](https://arxiv.org/abs/2505.08775) is an open-source medical evaluation benchmark released by OpenAI, designed to assess the performance of Large Language Models (LLMs) in real-world medical environments through highly simulated multi-turn dialogues. AntAngelMed achieved outstanding performance on this benchmark, ranking first among all open-source models, with a particularly significant advantage on the challenging HealthBench-Hard subset.
 
 ## **MedAIBench**
 
-[**MedAIBench**](https://www.medaibench.cn) ([https://www.medaibench.cn](https://www.medaibench.cn/)) is an authoritative medical LLM evaluation system developed by the National Artificial Intelligence Medical Industry Pilot Facility. AntAngelMed also **ranks first overall** and demonstrates strong comprehensive professionalism and safety, especially in medical knowledge Q&A and medical ethics/safety.
+[**MedAIBench**](https://www.medaibench.cn) is an authoritative medical LLM evaluation system developed by the National Artificial Intelligence Medical Industry Pilot Facility. AntAngelMed also **ranks first overall** and demonstrates strong comprehensive professionalism and safety, especially in medical knowledge Q&A and medical ethics/safety.
 
 ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/135556672/1765855632812-4659290c-8f89-4378-aa40-1df0fcbd6e78.png)
 
@@ -29,7 +29,7 @@ license: apache-2.0
 
 ## **MedBench**
 
-[**MedBench**](https://arxiv.org/abs/2511.14439) ([https://arxiv.org/abs/2511.14439)](https://arxiv.org/abs/2511.14439)) is a scientific and rigorous benchmark designed to evaluate LLMs in the Chinese healthcare domain. It comprises 36 independently curated evaluation datasets and covers approximately 700,000 samples. AntAngelMed ranks first on the MedBench self-assessment leaderboard and leads across five core dimensions: medical knowledge question answering, medical language understanding, medical language generation, complex medical reasoning, and safety and ethics, highlighting the model's professionalism, safety, and clinical applicability.
+[**MedBench**](https://arxiv.org/abs/2511.14439) is a scientific and rigorous benchmark designed to evaluate LLMs in the Chinese healthcare domain. It comprises 36 independently curated evaluation datasets and covers approximately 700,000 samples. AntAngelMed ranks first on the MedBench self-assessment leaderboard and leads across five core dimensions: medical knowledge question answering, medical language understanding, medical language generation, complex medical reasoning, and safety and ethics, highlighting the model's professionalism, safety, and clinical applicability.
 
 ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/1591/1766130714462-1a4d7350-6255-4bd7-a01b-79fa6f9161ed.png)
 
@@ -44,7 +44,7 @@ AntAngelMed employs a carefully designed three-stage training process to deeply
 
 + **Continual Pre-Training:** Based on Ling-flash-2.0, AntAngelMed is continually pre-trained with large-scale, high-quality medical corpora (encyclopedias, web text, academic publications), injecting profound domain and world knowledge.
 + **Supervised Fine-Tuning (SFT):** A multi-source and heterogeneous high-quality instruction dataset is constructed at this stage. General data (math, programming, logic) strengthen core chain-of-thought capabilities of AntAngelMed, while medical scenarios (doctor–patient Q&A, diagnostic reasoning, safety/ethics) provide deep adaptation for improved clinical performance.
-+ **Reinforcement Learning (RL):** Using the [**GRPO**](https://arxiv.org/pdf/2402.03300) ([https://arxiv.org/pdf/2402.03300](https://arxiv.org/pdf/2402.03300)) algorithm and task-specific reward models, RL precisely shapes model behavior—emphasizing empathy, structural clarity, and safety boundaries, and encouraging evidence-based reasoning on complex cases to reduce hallucinations and improve accuracy.
++ **Reinforcement Learning (RL):** Using the [**GRPO**](https://arxiv.org/pdf/2402.03300) algorithm and task-specific reward models, RL precisely shapes model behavior—emphasizing empathy, structural clarity, and safety boundaries, and encouraging evidence-based reasoning on complex cases to reduce hallucinations and improve accuracy.
 
 ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/135556672/1765944098319-b6dc6933-3a6a-4d85-ae97-e9d98c6983c5.jpeg)
 
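The GRPO algorithm cited in the README's RL stage estimates advantages group-relatively: for each prompt it samples a group of responses, scores them with a reward model, and normalizes each reward by the group's mean and standard deviation instead of training a separate value network (per the GRPO paper). A minimal sketch of that normalization step; the function name and reward values here are illustrative, not taken from AntAngelMed's actual training code:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each of the G sampled
    responses to one prompt gets (reward - group mean) / group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    if std == 0:
        # All responses scored identically -> no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four hypothetical sampled answers to one prompt, scored by a reward model
advs = grpo_advantages([0.9, 0.1, 0.5, 0.5])
```

Responses above the group mean get positive advantages and are reinforced; those below get negative ones, which is how the reward models described above (empathy, structure, safety) shape behavior without a critic.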