Xidong committed on
Commit bd201e4 · verified · 1 Parent(s): 3fd6345

Upload README.md

Files changed (1):
  1. README.md +89 -38
README.md CHANGED
@@ -1,48 +1,41 @@
- ---
- license: apache-2.0
- ---
- # Multilingual Medicine: Model, Dataset, Benchmark, Code
-
- Covering English, Chinese, French, Hindi, Spanish, Hindi, Arabic So far
-
- <p align="center">
-    👨🏻‍💻<a href="https://github.com/FreedomIntelligence/Apollo" target="_blank">Github</a> •📃 <a href="https://arxiv.org/abs/2403.03640" target="_blank">Paper</a> • 🌐 <a href="https://apollo.llmzoo.com/" target="_blank">Demo</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
-    <br> <a href="./README_zh.md"> 中文 </a> | <a href="./README.md"> English
- </p>
-
- ![Apollo](assets/apollo_medium_final.png)

  ## 🌈 Update

- * **[2024.03.07]** [Paper](https://arxiv.org/abs/2403.03640) released.
- * **[2024.02.12]** <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> and <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a> is published！🎉
- * **[2024.01.23]** Apollo repo is published！🎉
-

  ## Results
- 🤗<a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B" target="_blank">Apollo-0.5B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-1.8B" target="_blank">Apollo-1.8B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B" target="_blank">Apollo-2B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B" target="_blank">Apollo-6B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B" target="_blank">Apollo-7B</a> 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-34B" target="_blank">Apollo-34B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-72B" target="_blank">Apollo-72B</a>
-
- 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B-GGUF" target="_blank">Apollo-0.5B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B-GGUF" target="_blank">Apollo-2B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B-GGUF" target="_blank">Apollo-6B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B-GGUF" target="_blank">Apollo-7B-GGUF</a>
-
-

  ![Apollo](assets/result.png)
-
-
-

  ## Dataset & Evaluation

  - Dataset
-    🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a>
-
-    <details><summary>Click to expand</summary>

     ![Apollo](assets/dataset.png)

-    - [Zip File](https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus/blob/main/ApolloCorpus.zip)
-    - [Data category](https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus/tree/main/train)
     - Pretrain:
       - data item:
         - json_name: {data_source}_{language}_{data_type}.json
@@ -85,18 +78,16 @@ Covering English, Chinese, French, Hindi, Spanish, Hindi, Arabic So far
      ],
      ...
  ]
- ```

  </details>
-
-
-
  - Evaluation
-    🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>

-    <details><summary>Click to expand</summary>
-
     - EN:
       - [MedQA-USMLE](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
       - [MedMCQA](https://huggingface.co/datasets/medmcqa/viewer/default/test)
@@ -123,17 +114,77 @@ Covering English, Chinese, French, Hindi, Spanish, Hindi, Arabic So far
  </details>

-
  ## Results reproduction
  <details><summary>Click to expand</summary>

- **Waiting for Update**
-

  </details>

  ## Citation
 
+ # MedJamba

+ Multilingual Medical Model Based On Jamba
+ <center>

+ ![Python 3.10](https://img.shields.io/badge/Python-3.10-lightblue) ![Pytorch 2.1.2](https://img.shields.io/badge/PyTorch-2.1.2-lightblue) ![transformers](https://img.shields.io/badge/transformers-4.34.0.dev0%2B-lightblue) ![accelerate](https://img.shields.io/badge/accelerate-0.22-lightblue)
+ </center>

  ## 🌈 Update

+ * **[2024.04.25]** MedJamba model is published！🎉
+
+

  ## Results
+ 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B" target="_blank">Apollo-0.5B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-1.8B" target="_blank">Apollo-1.8B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B" target="_blank">Apollo-2B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B" target="_blank">Apollo-6B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B" target="_blank">Apollo-7B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-34B" target="_blank">Apollo-34B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-72B" target="_blank">Apollo-72B</a>
+
+ 🤗 <a href="https://huggingface.co/FreedomIntelligence/MedJamba" target="_blank">Apollo-53B (MedJamba)</a>
+
+ 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B-GGUF" target="_blank">Apollo-0.5B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B-GGUF" target="_blank">Apollo-2B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B-GGUF" target="_blank">Apollo-6B-GGUF</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B-GGUF" target="_blank">Apollo-7B-GGUF</a>
+
+
+
  ![Apollo](assets/result.png)
 
 
 

  ## Dataset & Evaluation

  - Dataset
+    🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a>
+
+    <details><summary>Click to expand</summary>

     ![Apollo](assets/dataset.png)

+    - [Zip File](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/blob/main/Medbase_data-datasets.zip)
+    - [Data category](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/tree/main/train)
     - Pretrain:
       - data item:
         - json_name: {data_source}_{language}_{data_type}.json

      ],
      ...
  ]
+ ```

  </details>
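The `{data_source}_{language}_{data_type}.json` naming convention documented above is easy to consume programmatically. A minimal sketch (the function name is ours, not the repo's, and it assumes no field itself contains an underscore):

```python
from pathlib import Path

def parse_corpus_filename(path: str) -> dict:
    """Split a corpus file name of the form
    {data_source}_{language}_{data_type}.json into its three fields.
    Assumes none of the fields contains an underscore."""
    data_source, language, data_type = Path(path).stem.split("_")
    return {"data_source": data_source,
            "language": language,
            "data_type": data_type}
```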
+

  - Evaluation
+    🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>

+    <details><summary>Click to expand</summary>
+
     - EN:
       - [MedQA-USMLE](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
       - [MedMCQA](https://huggingface.co/datasets/medmcqa/viewer/default/test)

  </details>

+
  ## Results reproduction
  <details><summary>Click to expand</summary>
 
+ 1. Download the dataset for the project:

+ ```
+ bash 0.download_data.sh
+ ```
+
+ 2. Prepare test and dev sets for the specific model:

+
+ - Create test data with the model's special tokens; you can use ./util/check.ipynb to check a model's special tokens
+
+ ```
+ bash "1.data_process_test&dev.sh"
+ ```
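Step 2 above wraps benchmark items with each model's special tokens (which ./util/check.ipynb helps you inspect). A minimal sketch of that wrapping, with hypothetical token strings standing in for the real ones:

```python
# Hypothetical special tokens; the real values come from the model's
# tokenizer (inspect them with ./util/check.ipynb as suggested above).
BOS, EOS = "<s>", "</s>"

def wrap_item(question: str) -> str:
    """Surround a benchmark question with the model's special tokens
    so the test/dev files match the model's expected prompt format."""
    return f"{BOS}{question}{EOS}"
```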
+
+ 3. Prepare training data for the specific model (create tokenized data in advance):
+
+
+ - You can adjust the data training order and the number of training epochs in this step
+
+ ```
+ bash 2.data_process_train.sh
+ ```
+
+ 4. Train the model
+
+
+ - For multi-node training, refer to ./scripts/multi_node_train_*.sh
+ ```
+ pip install "causal-conv1d>=1.2.0"
+ pip install mamba-ssm
+ ```
+
+ Node 0:
+ ```
+ bash ./scripts/3.multinode_train_jamba_rank0.sh
+ ```
+ ...
+ Node 4:
+ ```
+ bash ./scripts/3.multinode_train_jamba_rank4.sh
+ ```
+
+
+ 5. Evaluate your model: generate scores for the benchmark
+
+ ```
+ bash 4.eval.sh
+ ```
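4.eval.sh reduces benchmark outputs to scores; for multiple-choice benchmarks like those in XMedBench (e.g. MedQA-USMLE's four options), the underlying metric is exact-match accuracy over predicted option letters. A hedged sketch of that core computation (the helper name and letter format are our assumptions, not the repo's actual code):

```python
def choice_accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of questions where the predicted option letter
    (e.g. 'A'-'D') exactly matches the gold answer, case-insensitively."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must align")
    correct = sum(p.strip().upper() == a.strip().upper()
                  for p, a in zip(predictions, answers))
    return correct / len(answers)
```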
+
+ 6. Evaluate your model: play with your checkpoints on the command line
+
+ ```
+ python ./src/evaluate/cli_demo.py --model_name='./ckpts/your/path/tfmr'
+ ```
+
  </details>

+ ## To do
+
+ - Long-context capability evaluation and a new Long-Med benchmark
+
+ ## Acknowledgments

+ - [HuatuoGPT-II](https://github.com/FreedomIntelligence/HuatuoGPT-II)
+ - [proxy-tuning](https://github.com/alisawuffles/proxy-tuning)
+ - [Apollo](https://github.com/FreedomIntelligence/Apollo)


  ## Citation