Update README.md
print(outputs)
```
## Model Performance

### Geoscience Ability
We evaluate the performance of JiuZhou using the GeoBench benchmark.<br>
JiuZhou outperforms GPT-3.5 in objective tasks:
<p align="center">
  <br>
  <img src="image/objective_score.png" width="800"/>
  <br>
</p>
JiuZhou also scores higher than GPT-3.5 across six criteria in subjective tasks:
<p align="center">
  <br>
  <img src="image/subjective_score.png" width="800"/>
  <br>
</p>

### General Ability
We evaluate the performance of JiuZhou using three benchmark datasets: C-Eval, CMMLU, and MMLU.<br>
Compared to other variants of Llama and Mistral models, JiuZhou shows outstanding performance:
<p align="center">
  <br>
  <img src="image/general_score.png" width="800"/>
  <br>
</p>

## Model Training Process

### Training Corpus
The corpus consists of 50 million general documents and 3.4 million geoscience-related documents.
<p align="center">
  <br>
  <img src="image/JiuZhou-Corpus.png" width="800"/>
  <br>
</p>
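The geoscience subset is a small minority of the combined corpus; a quick back-of-the-envelope check, using only the document counts stated above:

```python
# Corpus composition stated above: 50M general + 3.4M geoscience documents.
general_docs = 50_000_000
geoscience_docs = 3_400_000

total_docs = general_docs + geoscience_docs
geo_share = geoscience_docs / total_docs

print(f"geoscience share of corpus: {geo_share:.1%}")  # → 6.4%
```

This scarcity of domain documents relative to general text is part of what makes efficient use of the geoscience data important during continual pre-training.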
|
### Training Framework
We use the JiuZhou-Framework proposed in this study.
<p align="center">
  <br>
  <img src="image/JiuZhou-Framework.png" width="800"/>
  <br>
</p>

### Two-stage Pre-adaptation Pre-training (TSPT)
TSPT improves the efficiency of using limited geoscience data and overcomes some of the technical bottlenecks in continual pretraining for LLMs.<br>
The difference between TSPT and single-stage training algorithms:
<p align="center">
  <br>
  <img src="image/TSPT.png" width="800"/>
  <br>
</p>
Comparison of TSPT and one-stage pre-training algorithm performance:
<p align="center">
  <br>
  <img src="image/TSPT_score.png" width="800"/>
  <br>
</p>
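The exact TSPT procedure is defined in the paper; purely as a rough illustration of the two-stage idea (the function, stage lengths, and mixing ratio below are invented for this sketch, not the paper's settings), a two-stage schedule might first interleave a little geoscience data into general pre-training, then concentrate entirely on the geoscience corpus:

```python
from itertools import cycle

def tspt_schedule(general_docs, geo_docs, stage1_steps, stage2_steps,
                  stage1_geo_every=5):
    """Illustrative two-stage sampler (hypothetical, not the paper's algorithm).

    Stage 1: mostly general documents, with a geoscience document
             every `stage1_geo_every` steps.
    Stage 2: geoscience documents only.
    """
    general = iter(general_docs)
    geo = iter(geo_docs)
    schedule = []
    for step in range(stage1_steps):
        source = geo if step % stage1_geo_every == 0 else general
        schedule.append(("stage1", next(source)))
    for _ in range(stage2_steps):
        schedule.append(("stage2", next(geo)))
    return schedule

# Tiny demo with placeholder "documents".
demo = tspt_schedule(cycle(["general"]), cycle(["geo"]),
                     stage1_steps=5, stage2_steps=3)
```

The point of the staging is that the scarce domain data is not diluted uniformly across all of training but concentrated where it matters most.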
|
## Model Training Code
We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to fine-tune JiuZhou.

### Project Deployment
```bash
git clone https://github.com/THU-ESIS/JiuZhou.git
cd JiuZhou
pip install -e ".[torch,metrics]"
```

### Model Training
Pre-training:
```bash
llamafactory-cli train examples/train_lora/JiuZhou_pretrain_sft.yaml
```
Instruction-tuning:
```bash
llamafactory-cli train examples/train_lora/JiuZhou_lora_sft.yaml
```
Chat with the fine-tuned JiuZhou:
```bash
llamafactory-cli chat examples/inference/JiuZhou_lora_sft.yaml
```
Merge the instruction-tuned LoRA weights with the original JiuZhou weights:
```bash
llamafactory-cli export examples/merge_lora/JiuZhou_lora_sft.yaml
```

## Citations
```bibtex
@article{chen2024preparedllm,