Upload ClinicalBERT-Pro best checkpoint (epoch_400) with benchmark results
- README.md +102 -0
- config.json +4 -0
- figures/fig1.png +0 -0
- figures/fig2.png +0 -0
- figures/fig3.png +0 -0
- pytorch_model.bin +3 -0
README.md
ADDED
@@ -0,0 +1,102 @@
---
license: apache-2.0
library_name: transformers
---

# ClinicalBERT-Pro
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
<img src="figures/fig1.png" width="60%" alt="ClinicalBERT-Pro" />
</div>
<hr>

<div align="center" style="line-height: 1;">
<a href="LICENSE" style="margin: 2px;">
<img alt="License" src="figures/fig2.png" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

## 1. Introduction

ClinicalBERT-Pro is a major advancement in clinical natural language processing. The model was pre-trained on over 2 million de-identified clinical notes from electronic health records, enabling strong performance on medical text understanding tasks: extracting clinical entities, interpreting medical terminology, and supporting clinical decision-making workflows.

<p align="center">
<img width="80%" src="figures/fig3.png">
</p>

Compared to the previous ClinicalBERT release, the Pro model handles complex medical terminology and rare disease mentions markedly better. On the MedNLI benchmark, for instance, accuracy increased from 76% in the previous release to 89.2% in the current one. These gains stem from enhanced domain-specific pre-training on clinical literature and structured medical knowledge bases.

Beyond improved clinical understanding, this version also handles abbreviations, medication dosages, and temporal expressions commonly found in clinical documentation more reliably.

## 2. Evaluation Results

### Comprehensive Medical Benchmark Results

<div align="center">
| Category | Benchmark | PubMedBERT | BioBERT | ClinicalBERT | ClinicalBERT-Pro |
|---|---|---|---|---|---|
| **Entity Recognition** | Clinical NER | 0.821 | 0.835 | 0.847 | 0.768 |
| | Symptom Extraction | 0.756 | 0.771 | 0.783 | 0.780 |
| | Adverse Event Detection | 0.698 | 0.712 | 0.729 | 0.692 |
| **Clinical Reasoning** | Diagnosis Prediction | 0.612 | 0.628 | 0.645 | 0.683 |
| | Drug Interaction | 0.734 | 0.749 | 0.761 | 0.803 |
| | Treatment Recommendation | 0.589 | 0.601 | 0.618 | 0.614 |
| **Document Understanding** | Medical QA | 0.667 | 0.681 | 0.695 | 0.683 |
| | Radiology Report | 0.723 | 0.738 | 0.752 | 0.755 |
| | Patient Summarization | 0.645 | 0.659 | 0.671 | 0.666 |
| **Coding & Matching** | ICD Coding | 0.578 | 0.592 | 0.608 | 0.784 |
| | Clinical Trial Matching | 0.534 | 0.549 | 0.563 | 0.705 |
| | Medical Literature QA | 0.689 | 0.703 | 0.718 | 0.728 |

</div>

### Overall Performance Summary
ClinicalBERT-Pro leads the compared models on clinical reasoning and coding & matching tasks, with especially large gains in ICD coding (0.784 vs. 0.608 for ClinicalBERT) and clinical trial matching (0.705 vs. 0.563). Results on entity recognition and document understanding are mixed: the base ClinicalBERT remains ahead on Clinical NER, adverse event detection, and medical QA.

## 3. Clinical API Platform
We offer a HIPAA-compliant API for clinical text processing. Please contact our healthcare solutions team for enterprise deployment options.

## 4. How to Run Locally

Please refer to our code repository for more information about running ClinicalBERT-Pro locally.

For clinical deployment, we recommend the following:

1. De-identify PHI (Protected Health Information) before processing.
2. Have model outputs reviewed by qualified healthcare professionals.
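
As an illustrative sketch of point 1, a toy de-identification pass might mask obvious identifiers before text reaches the model. The `mask_phi` helper and its regex patterns below are hypothetical, not part of this repository; a production system should use a validated de-identification pipeline.

```python
import re

# Hypothetical, illustrative-only PHI masking pass. Real deployments should
# rely on a validated de-identification pipeline, not ad-hoc regexes.
def mask_phi(text: str) -> str:
    # Mask MRN-style record numbers, e.g. "MRN: 123456".
    text = re.sub(r"\bMRN[:\s]*\d+", "[MRN]", text)
    # Mask numeric dates such as 06/15/2025.
    text = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", text)
    return text

masked = mask_phi("MRN: 123456, seen 06/15/2025 for follow-up.")
print(masked)  # -> [MRN], seen [DATE] for follow-up.
```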

The model architecture of ClinicalBERT-Pro is based on RoBERTa-large with medical domain adaptations.
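
As a minimal sketch, the architecture declared in the shipped `config.json` (`model_type: roberta`, `RobertaForMaskedLM`) can be instantiated with Transformers. The default `RobertaConfig` values here stand in for the full configuration, and the checkpoint path is a placeholder assumption.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Instantiate the architecture named in config.json. Default config values
# stand in for the full configuration; loading the actual epoch_400 weights
# would instead use RobertaForMaskedLM.from_pretrained("<path-to-checkpoint>").
config = RobertaConfig()
model = RobertaForMaskedLM(config)
print(model.config.model_type)  # -> roberta
```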

### System Prompt
We recommend using the following system prompt for clinical applications:
```
You are ClinicalBERT-Pro, a specialized medical AI assistant.
Current timestamp: {current_datetime}.
```
For example:
```
You are ClinicalBERT-Pro, a specialized medical AI assistant.
Current timestamp: 2025-06-15 14:30:00 UTC.
```
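
One way to fill the `{current_datetime}` placeholder is with Python's standard library; the `SYSTEM_PROMPT` constant name below is an assumption for illustration.

```python
from datetime import datetime, timezone

# The recommended system prompt, with its {current_datetime} placeholder.
SYSTEM_PROMPT = (
    "You are ClinicalBERT-Pro, a specialized medical AI assistant.\n"
    "Current timestamp: {current_datetime}."
)

# A fixed timestamp is used here so the example is reproducible;
# a live system would use datetime.now(timezone.utc).
now = datetime(2025, 6, 15, 14, 30, 0, tzinfo=timezone.utc)
prompt = SYSTEM_PROMPT.format(
    current_datetime=now.strftime("%Y-%m-%d %H:%M:%S UTC")
)
print(prompt)
```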
### Temperature
We recommend setting the temperature parameter $T_{model}$ to 0.3 for clinical applications requiring high precision.

### Prompts for Clinical Document Processing
For clinical note analysis, please follow the template:
```
clinical_template = \
"""[Patient ID]: {patient_id}
[Clinical Note Begin]
{clinical_note}
[Clinical Note End]
{analysis_request}"""
```
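
The template above can be filled with `str.format`; the patient ID, note text, and analysis request in this sketch are hypothetical example values.

```python
# The clinical note template from this model card, filled with
# hypothetical (de-identified) example values.
clinical_template = \
"""[Patient ID]: {patient_id}
[Clinical Note Begin]
{clinical_note}
[Clinical Note End]
{analysis_request}"""

prompt = clinical_template.format(
    patient_id="PT-0001",  # hypothetical identifier
    clinical_note="Pt presents with SOB and chest pain x2 days.",
    analysis_request="Extract all symptom mentions from the note above.",
)
print(prompt)
```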

## 5. License
This code repository is licensed under the [Apache 2.0 License](LICENSE). Use of the ClinicalBERT-Pro models is subject to additional healthcare compliance requirements.

## 6. Contact
If you have any questions, please raise an issue on our GitHub repository or contact us at clinical-ai@medtech.health.
config.json
ADDED
@@ -0,0 +1,4 @@
{
  "model_type": "roberta",
  "architectures": ["RobertaForMaskedLM"]
}
figures/fig1.png
ADDED
figures/fig2.png
ADDED
figures/fig3.png
ADDED
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:55d0a23d54f3181ceb28363582ffb5f665d8991688d1e9a2974994b6c5f875fc
size 25