toolevalxm committed on
Commit
8c1942e
·
verified ·
1 Parent(s): 023c319

Upload folder using huggingface_hub

Files changed (6)
  1. README.md +120 -0
  2. config.json +15 -0
  3. figures/fig1.png +0 -0
  4. figures/fig2.png +0 -0
  5. figures/fig3.png +0 -0
  6. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,120 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ ---
+ # BioMedLM
+ <!-- markdownlint-disable first-line-h1 -->
+ <!-- markdownlint-disable html -->
+ <!-- markdownlint-disable no-duplicate-header -->
+
+ <div align="center">
+ <img src="figures/fig1.png" width="60%" alt="BioMedLM" />
+ </div>
+ <hr>
+
+ <div align="center" style="line-height: 1;">
+ <a href="LICENSE" style="margin: 2px;">
+ <img alt="License" src="figures/fig2.png" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+
+ ## 1. Introduction
+
+ BioMedLM represents a breakthrough in biomedical natural language processing. This specialized language model has been trained on extensive medical literature, clinical notes, and healthcare documentation. In the latest version, BioMedLM demonstrates remarkable capabilities in understanding complex medical terminology, extracting clinical entities, and generating accurate medical summaries.
+
+ <p align="center">
+ <img width="80%" src="figures/fig3.png">
+ </p>
+
+ Compared to general-purpose language models, BioMedLM shows significant improvements in domain-specific tasks. For instance, in the MedQA benchmark, the model's accuracy has increased from 65% to 82.3% in the current version. This advancement stems from specialized pre-training on PubMed abstracts and clinical trial data.
+
+ Beyond clinical text understanding, this version offers enhanced capabilities in drug-drug interaction detection, adverse event extraction, and ICD-10 coding assistance.
+
+ ## 2. Evaluation Results
+
+ ### Comprehensive Benchmark Results
+
+ <div align="center">
+
+ | | Benchmark | PubMedBERT | BioBERT | ClinicalBERT | BioMedLM |
+ |---|---|---|---|---|---|
+ | **Clinical NLP Tasks** | Clinical NER | 0.823 | 0.835 | 0.841 | 0.870 |
+ | | Drug Interaction | 0.712 | 0.728 | 0.739 | 0.804 |
+ | | Medical QA | 0.654 | 0.668 | 0.682 | 0.750 |
+ | **Document Analysis** | Disease Classification | 0.789 | 0.801 | 0.812 | 0.895 |
+ | | Clinical Inference | 0.701 | 0.715 | 0.724 | 0.799 |
+ | | Symptom Extraction | 0.756 | 0.769 | 0.778 | 0.830 |
+ | | Medical Coding | 0.688 | 0.702 | 0.715 | 0.740 |
+ | **Report Generation** | Radiology Report | 0.621 | 0.639 | 0.651 | 0.750 |
+ | | Patient Summary | 0.598 | 0.615 | 0.628 | 0.719 |
+ | | Adverse Event | 0.734 | 0.749 | 0.761 | 0.853 |
+ | | Literature Mining | 0.667 | 0.682 | 0.694 | 0.780 |
+ | **Research Tasks**| Gene Relation | 0.578 | 0.594 | 0.608 | 0.788 |
+ | | Clinical Trial | 0.645 | 0.661 | 0.673 | 0.782 |
+ | | Pathology Analysis | 0.712 | 0.728 | 0.741 | 0.865 |
+ | | Safety Compliance | 0.801 | 0.815 | 0.827 | 0.844 |
+
+ </div>
+
+ ### Overall Performance Summary
+ BioMedLM demonstrates state-of-the-art performance across all biomedical benchmark categories, with particularly notable results in clinical entity recognition and document analysis tasks.
+
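Reviewer note: the summary in the README is qualitative, so a quick way to read the table is to compute each benchmark's margin over ClinicalBERT, which is the strongest baseline in every row. The sketch below copies the ClinicalBERT and BioMedLM columns from the table; the names `SCORES` and `margins` are illustrative and not part of the commit.

```python
# Scores copied from the README benchmark table: (ClinicalBERT, BioMedLM).
SCORES = {
    "Clinical NER": (0.841, 0.870),
    "Drug Interaction": (0.739, 0.804),
    "Medical QA": (0.682, 0.750),
    "Disease Classification": (0.812, 0.895),
    "Clinical Inference": (0.724, 0.799),
    "Symptom Extraction": (0.778, 0.830),
    "Medical Coding": (0.715, 0.740),
    "Radiology Report": (0.651, 0.750),
    "Patient Summary": (0.628, 0.719),
    "Adverse Event": (0.761, 0.853),
    "Literature Mining": (0.694, 0.780),
    "Gene Relation": (0.608, 0.788),
    "Clinical Trial": (0.673, 0.782),
    "Pathology Analysis": (0.741, 0.865),
    "Safety Compliance": (0.827, 0.844),
}


def margins(scores):
    """Return {benchmark: BioMedLM score minus ClinicalBERT score}."""
    return {name: round(ours - base, 3) for name, (base, ours) in scores.items()}


gains = margins(SCORES)
largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)
mean_biomedlm = sum(ours for _, ours in SCORES.values()) / len(SCORES)

print(f"largest margin:  {largest} (+{gains[largest]:.3f})")
print(f"smallest margin: {smallest} (+{gains[smallest]:.3f})")
print(f"mean BioMedLM score: {mean_biomedlm:.3f}")
```

On these figures the widest margin is Gene Relation (+0.180) and the narrowest is Safety Compliance (+0.017). The README does not state the metric or the evaluation setup, so the margins are only as meaningful as the table itself.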
+ ## 3. Clinical API & Demo Platform
+ We offer a clinical demo interface and API for healthcare researchers to interact with BioMedLM. Please visit our secure portal for HIPAA-compliant access.
+
+ ## 4. How to Run Locally
+
+ Please refer to our code repository for information about deploying BioMedLM in clinical environments.
+
+ Compared to previous versions, the usage recommendations for BioMedLM have the following changes:
+
+ 1. Medical domain system prompts are recommended.
+ 2. Clinical context window has been expanded to 8192 tokens.
+
+ The model architecture of BioMedLM-Base is optimized for clinical document processing, with specialized attention mechanisms for medical entity relationships.
+
+ ### System Prompt
+ We recommend using the following system prompt for clinical applications:
+ ```
+ You are BioMedLM, a specialized biomedical AI assistant trained on medical literature and clinical documentation.
+ Current context: {clinical_setting}
+ ```
+ For example,
+ ```
+ You are BioMedLM, a specialized biomedical AI assistant trained on medical literature and clinical documentation.
+ Current context: Outpatient clinical notes review.
+ ```
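Reviewer note: the system prompt above has one placeholder, `{clinical_setting}`. A minimal sketch of filling it with Python's `str.format` follows; the helper name `build_system_prompt` and the empty-input check are assumptions for illustration, not something the README defines.

```python
# Template text copied from the README; only {clinical_setting} is substituted.
SYSTEM_PROMPT_TEMPLATE = (
    "You are BioMedLM, a specialized biomedical AI assistant trained on "
    "medical literature and clinical documentation.\n"
    "Current context: {clinical_setting}"
)


def build_system_prompt(clinical_setting: str) -> str:
    """Fill the README system prompt (hypothetical helper)."""
    setting = clinical_setting.strip()
    if not setting:
        # Assumed guard: an empty context line would make the prompt misleading.
        raise ValueError("clinical_setting must be a non-empty string")
    return SYSTEM_PROMPT_TEMPLATE.format(clinical_setting=setting)


prompt = build_system_prompt("Outpatient clinical notes review.")
print(prompt)
```

With the README's own example value, the result matches the second block shown above line for line.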
+ ### Temperature
+ We recommend setting the temperature parameter $T_{model}$ to 0.3 for clinical applications to ensure factual accuracy.
+
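Reviewer note: the README does not say how the temperature is applied. Under the usual definition, logits are divided by the temperature before the softmax, so a value below 1 concentrates probability on the top-scoring token. The sketch below illustrates that effect with made-up logits; it is not taken from the model's code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Standard temperature-scaled softmax: softmax(logits / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
p_default = softmax_with_temperature(logits, 1.0)
p_clinical = softmax_with_temperature(logits, 0.3)  # README recommendation

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_clinical])
```

At 0.3 nearly all of the probability mass moves to the highest-scoring token, which makes sampling close to greedy. That reduces output variability; it does not by itself guarantee factual accuracy.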
+ ### Prompts for Clinical Document Processing
+ For clinical document analysis, please follow the template:
+ ```
+ document_template = \
+ """[document type]: {doc_type}
+ [clinical content begin]
+ {clinical_text}
+ [clinical content end]
+ {analysis_request}"""
+ ```
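Reviewer note: `document_template` is an ordinary Python format string, so it can be filled with `str.format`. In the sketch below the template is copied from the README, while the helper `build_document_prompt` and the sample note are invented for illustration (the note is synthetic, not patient data).

```python
# Template copied verbatim from the README.
document_template = \
"""[document type]: {doc_type}
[clinical content begin]
{clinical_text}
[clinical content end]
{analysis_request}"""


def build_document_prompt(doc_type, clinical_text, analysis_request):
    """Fill the README template (hypothetical helper)."""
    return document_template.format(
        doc_type=doc_type,
        clinical_text=clinical_text.strip(),  # avoid stray blank lines in the block
        analysis_request=analysis_request,
    )


doc_prompt = build_document_prompt(
    doc_type="Discharge summary",
    clinical_text="Patient admitted with chest pain. Troponin negative.\n",
    analysis_request="List the symptoms mentioned in the document.",
)
print(doc_prompt)
```

Because the values are passed as arguments to `format`, braces inside the clinical text are inserted literally and do not need escaping.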
+ For literature search enhanced generation, we recommend the following template:
+ ```
+ search_clinical_template = \
+ '''# The following are relevant medical literature findings:
+ {pubmed_results}
+ In the search results provided, each finding is formatted as [source X begin]...[source X end]. Please cite sources appropriately using [citation:X] format. Ensure medical accuracy and evidence-based responses.
+ When responding:
+ - Prioritize peer-reviewed sources
+ - Note any conflicting findings
+ - Include confidence levels where appropriate
+ - Flag any potential safety concerns
+ # Clinical query:
+ {query}'''
+ ```
+
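Reviewer note: the search template expects `{pubmed_results}` to already be wrapped as `[source X begin]...[source X end]`, but the README does not show how to produce that. One possible sketch, assuming findings arrive as a list of strings and sources are numbered from 1 (both assumptions); `format_sources` and the sample findings are hypothetical.

```python
# Template copied verbatim from the README.
search_clinical_template = \
'''# The following are relevant medical literature findings:
{pubmed_results}
In the search results provided, each finding is formatted as [source X begin]...[source X end]. Please cite sources appropriately using [citation:X] format. Ensure medical accuracy and evidence-based responses.
When responding:
- Prioritize peer-reviewed sources
- Note any conflicting findings
- Include confidence levels where appropriate
- Flag any potential safety concerns
# Clinical query:
{query}'''


def format_sources(findings):
    """Wrap each finding in numbered source markers (numbering from 1 is assumed)."""
    return "\n".join(
        f"[source {i} begin]{text.strip()}[source {i} end]"
        for i, text in enumerate(findings, start=1)
    )


# Placeholder findings for illustration only; not real literature results.
findings = [
    "Placeholder abstract text for the first finding.",
    "Placeholder abstract text for the second finding.",
]
query = "Placeholder clinical question."

search_prompt = search_clinical_template.format(
    pubmed_results=format_sources(findings),
    query=query,
)
print(search_prompt)
```

The model can then cite the second finding as `[citation:2]`, matching the numbering produced by `format_sources`.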
+ ## 5. License
+ This code repository is licensed under the [Apache 2.0 License](LICENSE). The use of BioMedLM models requires compliance with healthcare regulations including HIPAA where applicable.
+
+ ## 6. Contact
+ If you have any questions, please raise an issue on our GitHub repository or contact us at research@biomedlm.health.
+ ```
config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "model_type": "roberta",
+   "architectures": [
+     "RobertaForMaskedLM"
+   ],
+   "hidden_size": 1024,
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "vocab_size": 50265,
+   "domain": "biomedical",
+   "pretrained_on": [
+     "pubmed",
+     "clinical_notes"
+   ]
+ }
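Reviewer note: a small consistency check on this config, using only the standard library. It parses the JSON as committed and verifies that `hidden_size` divides evenly across `num_attention_heads`, which multi-head attention requires. The variable names are illustrative.

```python
import json

# config.json as added in this commit.
CONFIG_JSON = """
{
  "model_type": "roberta",
  "architectures": ["RobertaForMaskedLM"],
  "hidden_size": 1024,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "vocab_size": 50265,
  "domain": "biomedical",
  "pretrained_on": ["pubmed", "clinical_notes"]
}
"""

config = json.loads(CONFIG_JSON)

# Each attention head gets hidden_size / num_attention_heads dimensions.
head_dim, remainder = divmod(config["hidden_size"], config["num_attention_heads"])
if remainder != 0:
    raise ValueError("hidden_size must be divisible by num_attention_heads")

# Keys the README's claims depend on but which this config does not set.
missing = [k for k in ("max_position_embeddings", "intermediate_size") if k not in config]

print(f"architecture: {config['architectures'][0]}")
print(f"per-head dimension: {head_dim}")
print(f"keys not set in this config: {missing}")
```

The per-head dimension comes out to 64. The config declares a masked-language-model encoder and sets no `max_position_embeddings`, so the 8192-token context window and the generation-style settings described in the README cannot be confirmed from this file alone.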
figures/fig1.png ADDED
figures/fig2.png ADDED
figures/fig3.png ADDED
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57dff1e759659abd7546e649e98e8c676e1e30465ac071021e8adec4ed79e0dd
+ size 37
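Reviewer note: the three lines added for `pytorch_model.bin` are a Git LFS pointer, not the weights themselves. A pointer is a short list of `key value` lines, and `size` is the byte length of the real object. The parser below is a minimal sketch for reading those fields; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
# Pointer contents as added in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:57dff1e759659abd7546e649e98e8c676e1e30465ac071021e8adec4ed79e0dd
size 37
"""


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

The stored object is 37 bytes. A 24-layer, 1024-wide encoder would be far larger than that, so the uploaded `pytorch_model.bin` is most likely a placeholder and not trained weights.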