Update docs.md
docs.md
CHANGED
|
@@ -24,12 +24,13 @@
|
|
| 24 |
|
| 25 |
<h2>Background</h2>
|
| 26 |
<p>Recent advances in <strong>Large Language Models (LLMs)</strong> have demonstrated transformative potential in <strong>healthcare</strong>, yet concerns remain around their reliability and clinical validity across diverse clinical tasks, specialties, and languages. To support timely and trustworthy evaluation, building upon our <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a> of global clinical text resources, we introduce <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a>, <strong>a multilingual benchmark that comprises 87 real-world clinical text tasks spanning nine languages and more than one million samples</strong>. Furthermore, we construct this leaderboard of LLMs for clinical text understanding by systematically evaluating <strong>52 state-of-the-art LLMs</strong> (as of 2025/04/28).</p>
|
|
|
|
| 27 |
|
| 28 |
|
| 29 |
-
<div style="display: flex; align-items: center; justify-content: center; width: 100%; height:
|
| 30 |
<img
|
| 31 |
-
src="https://cdn-uploads.huggingface.co/production/uploads/
|
| 32 |
-
alt="
|
| 33 |
style="max-width: 80%; max-height: 100%; object-fit: contain;"
|
| 34 |
/>
|
| 35 |
</div>
|
|
@@ -44,7 +45,7 @@
|
|
| 44 |
</ul>
|
| 45 |
<p>In addition, BRIDGE offers multiple <strong>model filters</strong> and <strong>task filters</strong> to enable users to explore LLM performance across <strong>different clinical contexts</strong>, empowering researchers and clinicians to make informed decisions and track model advancements over time.</p>
|
| 46 |
|
| 47 |
-
<div style="display: flex; align-items: center; justify-content: center; width: 100%; height:
|
| 48 |
<img
|
| 49 |
src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/xpyabfXWqacZD-ThQ5guU.jpeg"
|
| 50 |
alt="HMS"
|
|
@@ -87,6 +88,11 @@ Importantly, all 87 datasets have been verified to be either fully open-access o
|
|
| 87 |
</ul>
|
| 88 |
We will review and evaluate your submission and update the leaderboard accordingly.
|
| 89 |
|
| 90 |
|
| 91 |
<h2>Contributing</h2>
|
| 92 |
<p>We welcome and greatly value contributions and collaborations from the community!
|
|
@@ -94,23 +100,27 @@ If you have clinical text datasets that you would like to share for broader expl
|
|
| 94 |
<p>We are committed to expanding BRIDGE while strictly adhering to appropriate data use agreements and ethical guidelines. Let's work together to advance the responsible application of LLMs in medicine!</p>
|
| 95 |
|
| 96 |
|
| 97 |
-
<h2>Updates</h2>
|
| 98 |
-
<ul>
|
| 99 |
-
<li>2025/04/28: BRIDGE Leaderboard V1.0.0 is now live!</li>
|
| 100 |
-
<li>2025/04/28: Our paper <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a> is now available on arXiv!</li>
|
| 101 |
-
</ul>
|
| 102 |
|
| 103 |
-
<h2>Contact Information</h2>
|
| 104 |
-
<p>If you have any questions about BRIDGE or the leaderboard, feel free to reach out!</p>
|
| 105 |
-
<ul>
|
| 106 |
-
<li><strong>Leaderboard Managers</strong>: Jiageng Wu (jiwu7@bwh.harvard.edu), Kevin Xie (kevinxie@mit.edu)</li>
|
| 107 |
-
<li><strong>Benchmark Managers</strong>: Jiageng Wu (jiwu7@bwh.harvard.edu), Bowen Gu (bogu@bwh.harvard.edu)</li>
|
| 108 |
-
<li><strong>Program Lead</strong>: Jie Yang (jyang66@bwh.harvard.edu)</li>
|
| 109 |
-
</ul>
|
| 110 |
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
|
| 115 |
author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
|
| 116 |
year={2025},
|
|
@@ -129,4 +139,6 @@ If you have clinical text datasets that you would like to share for broader expl
|
|
| 129 |
year={2024},
|
| 130 |
publisher={Massachusetts Medical Society}
|
| 131 |
}
|
| 132 |
-
</code></pre>
|
| 24 |
|
| 25 |
<h2>Background</h2>
|
| 26 |
<p>Recent advances in <strong>Large Language Models (LLMs)</strong> have demonstrated transformative potential in <strong>healthcare</strong>, yet concerns remain around their reliability and clinical validity across diverse clinical tasks, specialties, and languages. To support timely and trustworthy evaluation, building upon our <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a> of global clinical text resources, we introduce <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a>, <strong>a multilingual benchmark that comprises 87 real-world clinical text tasks spanning nine languages and more than one million samples</strong>. Furthermore, we construct this leaderboard of LLMs for clinical text understanding by systematically evaluating <strong>52 state-of-the-art LLMs</strong> (as of 2025/04/28).</p>
|
| 27 |
+
This project is led and maintained by the team of <a href="https://ylab.top/">Prof. Jie Yang</a> and <a href="https://www.drugepi.org/team/joshua-kueiyu-lin">Prof. Kueiyu Joshua Lin</a> at Harvard Medical School and Brigham and Women's Hospital.
|
| 28 |
|
| 29 |
|
| 30 |
+
<div style="display: flex; align-items: center; justify-content: center; width: 100%; height: auto;">
|
| 31 |
<img
|
| 32 |
+
src="https://cdn-uploads.huggingface.co/production/uploads/633c70c4ccce04161f841c30/OLN3J8_Yq8dx_LrgjYSsC.png"
|
| 33 |
+
alt="dataset"
|
| 34 |
style="max-width: 80%; max-height: 100%; object-fit: contain;"
|
| 35 |
/>
|
| 36 |
</div>
|
|
|
|
| 45 |
</ul>
|
| 46 |
<p>In addition, BRIDGE offers multiple <strong>model filters</strong> and <strong>task filters</strong> to enable users to explore LLM performance across <strong>different clinical contexts</strong>, empowering researchers and clinicians to make informed decisions and track model advancements over time.</p>
|
| 47 |
|
| 48 |
+
<div style="display: flex; align-items: center; justify-content: center; width: 100%; height: auto;">
|
| 49 |
<img
|
| 50 |
src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/xpyabfXWqacZD-ThQ5guU.jpeg"
|
| 51 |
alt="HMS"
|
|
|
|
| 88 |
</ul>
|
| 89 |
We will review and evaluate your submission and update the leaderboard accordingly.
|
| 90 |
|
| 91 |
+
<h2>Updates</h2>
|
| 92 |
+
<ul>
|
| 93 |
+
<li>2025/04/28: BRIDGE Leaderboard V1.0.0 is now live!</li>
|
| 94 |
+
<li>2025/04/28: Our paper <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a> is now available on arXiv!</li>
|
| 95 |
+
</ul>
|
| 96 |
|
| 97 |
<h2>Contributing</h2>
|
| 98 |
<p>We welcome and greatly value contributions and collaborations from the community!
|
|
|
|
| 100 |
<p>We are committed to expanding BRIDGE while strictly adhering to appropriate data use agreements and ethical guidelines. Let's work together to advance the responsible application of LLMs in medicine!</p>
|
| 101 |
|
| 102 |
|
| 103 |
|
| 104 |
|
| 105 |
+
## Donation
|
| 106 |
+
|
| 107 |
+
BRIDGE is a non-profit, researcher-led benchmark that requires substantial resources (e.g., high-performance GPUs, a dedicated team) to sustain. To support open and impactful academic research that advances clinical care, we welcome your contributions. Please contact Prof. Jie Yang at <jyang66@bwh.harvard.edu> to discuss donation opportunities.</p>
|
| 108 |
+
|
| 109 |
+
## Contact Information
|
| 110 |
+
|
| 111 |
+
If you have any questions about BRIDGE or the leaderboard, feel free to reach out!
|
| 112 |
+
- **Leaderboard Managers**: Jiageng Wu (<jiwu7@bwh.harvard.edu>), Kevin Xie (<kevinxie@mit.edu>), Bowen Gu (<bogu@bwh.harvard.edu>)
|
| 113 |
+
- **Benchmark Managers**: Jiageng Wu (<jiwu7@bwh.harvard.edu>), Bowen Gu (<bogu@bwh.harvard.edu>)
|
| 114 |
+
- **Project Lead**: Jie Yang (<jyang66@bwh.harvard.edu>)
|
| 115 |
+
|
| 116 |
+
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
## Citation
|
| 120 |
+
|
| 121 |
+
If you find this leaderboard useful for your research and applications, please cite the following papers:
|
| 122 |
+
<pre style="white-space: pre-wrap; overflow-wrap: anywhere;">
|
| 123 |
+
<code>@article{BRIDGE-benchmark,
|
| 124 |
title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
|
| 125 |
author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
|
| 126 |
year={2025},
|
|
|
|
| 139 |
year={2024},
|
| 140 |
publisher={Massachusetts Medical Society}
|
| 141 |
}
|
| 142 |
+
</code></pre>
|
| 143 |
+
|
| 144 |
+
If you use the datasets in BRIDGE, please also cite the original papers for those datasets, which are listed in our BRIDGE paper.
|