SNOWTEAM
/

DoctorLLM


Boyue27 committed on Jul 25, 2024

Commit

635b8d6 · verified

1 Parent(s): 49d4439

Update README.md

Files changed (1)

README.md +1 -1


@@ -42,7 +42,7 @@ Using open source instruction tuning datasets are composed of three main parts:
 By combining the above three parts, we form a large-scale, high-quality, medical-specific instruction tuning dataset, consisting of 202M tokens. We further tune Medico-mistral on this dataset, resulting in sft_medico-mistral.
 ## Training Details
+Our model is based on Mixtral-8x7B-Instruct-v0.1, a general-purpose English LLM with about 13 billion active parameters per token. We first inject knowledge into the base model by optimizing the autoregressive loss. During training, we set the maximum context length to 4096 and the batch size to 1024. The model was trained using the AdamW optimizer (Loshchilov and Hutter, 2017) with a learning rate of 2e-5. We employed the Fully Sharded Data Parallel (FSDP) acceleration strategy, the bf16 (brain floating-point) data format, and gradient checkpointing (Chen et al., 2016). Knowledge injection ran for 1 epoch on 8 A100-80G GPUs. Afterwards, in the SFT phase, we used 7 A100 GPUs to perform 5 epochs of healthcare-specific instruction tuning with a batch size of 896. During the instruction tuning phase, all sequences are processed in each epoch.
 ### Training Data
 The training data combines diverse datasets from medical consultations, rationale QA, and knowledge graphs to ensure comprehensive medical knowledge coverage and reasoning ability.
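
The batch sizes quoted in the added Training Details paragraph can be sanity-checked with a short sketch. It only restates the numbers from the paragraph (8 GPUs with batch size 1024, 7 GPUs with batch size 896, context length 4096, learning rate 2e-5). The `PhaseConfig` class and its field names are hypothetical and are not part of the model's training code. Reusing the context length and learning rate for the SFT phase is an assumption, since the README does not say so explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """Hypothetical container for the hyperparameters quoted in the README."""
    name: str
    num_gpus: int
    global_batch_size: int
    epochs: int
    # Assumption: both phases share these two values; the README states them
    # once, without saying whether they also apply to the SFT phase.
    max_context_length: int = 4096
    learning_rate: float = 2e-5

    @property
    def per_gpu_batch(self) -> int:
        """Sequences each GPU contributes to one optimizer step."""
        if self.global_batch_size % self.num_gpus:
            raise ValueError("global batch size is not divisible by the GPU count")
        return self.global_batch_size // self.num_gpus

    @property
    def max_tokens_per_step(self) -> int:
        """Upper bound on tokens per optimizer step (every sequence at full length)."""
        return self.global_batch_size * self.max_context_length


knowledge_injection = PhaseConfig("knowledge_injection", num_gpus=8,
                                  global_batch_size=1024, epochs=1)
sft = PhaseConfig("sft", num_gpus=7, global_batch_size=896, epochs=5)

for phase in (knowledge_injection, sft):
    print(phase.name, phase.per_gpu_batch, phase.max_tokens_per_step)
```

Both phases work out to the same per-GPU share of the global batch, which is consistent with the batch size dropping from 1024 to 896 when one GPU fewer is used. Whether that share is reached in a single forward pass or through gradient accumulation is not stated in the README.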