Boyue27 committed on
Commit
635b8d6
·
verified ·
1 Parent(s): 49d4439

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -42,7 +42,7 @@ Using open source instruction tuning datasets are composed of three main parts:
  By combining the above three parts, we form a large-scale, high-quality, medical-specific instruction tuning dataset, consisting of 202M tokens. We further tune Medico-mistral on this dataset, resulting in sft_medico-mistral.
 
  ## Training Details
-
+ Our model is based on Mixtral-8x7B-v0.1-Instruct, a general-purpose English LLM with 13 billion parameters. Training was performed in parallel on 8 A100-80G GPUs. We first inject knowledge into the base model Mistral by optimizing the autoregressive loss. During training, we set the maximum context length to 4096 tokens and the batch size to 1024. The model was trained using the AdamW optimizer (Loshchilov and Hutter, 2017) with a learning rate of 2e-5. We employed a Fully Sharded Data Parallel (FSDP) acceleration strategy, the bf16 (brain floating-point) data format, and gradient checkpointing (Chen et al., 2016). Knowledge injection ran for 1 epoch on 8 A100 GPUs. Afterwards, we used 7 A100 GPUs to perform 5 epochs of healthcare-specific instruction tuning in the SFT phase with a batch size of 896. During the instruction tuning phase, all sequences are processed in each epoch.
  ### Training Data
 
  The training data combines diverse datasets from medical consultations, rationale QA, and knowledge graphs to ensure comprehensive medical knowledge coverage and reasoning ability.
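
The Training Details paragraph in this change gives a GPU count and a batch size for each of the two phases. As a rough sanity check, the sketch below works out what those figures imply per GPU and per optimizer step. It assumes the batch sizes are global and counted in sequences, which the README does not state. The `Phase` class and helper names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass

# Figure taken from the Training Details paragraph.
MAX_CONTEXT_LENGTH = 4096  # tokens per sequence (upper bound)


@dataclass(frozen=True)
class Phase:
    """Hypothetical container for one training phase's reported settings."""
    name: str
    num_gpus: int
    global_batch_size: int  # assumed to be counted in sequences
    epochs: int


PHASES = [
    Phase("knowledge injection", num_gpus=8, global_batch_size=1024, epochs=1),
    Phase("instruction tuning (SFT)", num_gpus=7, global_batch_size=896, epochs=5),
]


def sequences_per_gpu(phase: Phase) -> int:
    """Sequences each GPU handles per optimizer step, assuming an even split."""
    if phase.global_batch_size % phase.num_gpus != 0:
        raise ValueError("global batch size is not divisible by the GPU count")
    return phase.global_batch_size // phase.num_gpus


def max_tokens_per_step(phase: Phase, max_len: int = MAX_CONTEXT_LENGTH) -> int:
    """Upper bound on tokens per optimizer step (every sequence at full length)."""
    return phase.global_batch_size * max_len


if __name__ == "__main__":
    for phase in PHASES:
        print(
            f"{phase.name}: {sequences_per_gpu(phase)} sequences per GPU, "
            f"at most {max_tokens_per_step(phase):,} tokens per step"
        )
```

Under these assumptions both phases come out at 128 sequences per GPU per step, so the smaller SFT batch size follows from dropping one GPU rather than from a different per-GPU setting. How those 128 sequences are divided between micro-batch size and gradient accumulation steps is not given in the README.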