</h1>
</div>

## 🎉 News

- [2024-12-31] **Article [JiuZhou: Open Foundation Language Models and Effective Pre-training Framework for Geoscience](https://www.tandfonline.com/doi/full/10.1080/17538947.2025.2449708) has been accepted for publication in the *International Journal of Digital Earth***. [Code and Data](https://github.com/THU-ESIS/JiuZhou).
- [2024-10-11] WeChat article: [PreparedLLM: Effective Pre-pretraining Framework for Domain-specific Large Language Models](https://mp.weixin.qq.com/s/ugJQ9tbp6Y87xA3TOWteqw).
- [2024-09-06] Released [ClimateChat](https://huggingface.co/itpossible/ClimateChat) instruct model.
- [2024-08-31] **Article [PreparedLLM: Effective Pre-pretraining Framework for Domain-specific Large Language Models](https://www.tandfonline.com/doi/full/10.1080/20964471.2024.2396159) has been accepted for publication in the *Big Earth Data* journal**.

JiuZhou outperforms GPT-3.5 in objective tasks:
<p align="center">
<br>
<img src="image/objective_score.png" width="800"/>
<br>
</p>

JiuZhou also scores higher than ClimateChat across six criteria in subjective tasks:
<p align="center">
<br>
<img src="image/subjective_score.png" width="800"/>
<br>
</p>

### General Ability

We evaluate the performance of JiuZhou using three benchmark datasets: C-Eval, CMMLU, and MMLU.<br>
Compared to other variants of Llama and Mistral models, JiuZhou shows outstanding performance:
<p align="center">
<br>
<img src="image/general_score.png" width="800"/>
<br>
</p>

## Model Training Process

### Training Corpus
The corpus consists of 50 million general documents and 3.4 million geoscience-related documents.
<p align="center">
<br>
<img src="image/JiuZhou-Corpus.png" width="800"/>
<br>
</p>

### Training Framework
We use the JiuZhou-Framework proposed in this study.
<p align="center">
<br>
<img src="image/JiuZhou-Framework.png" width="800"/>
<br>
</p>

### Two-stage Pre-adaptation Pre-training (TSPT)
TSPT improves the efficiency of using limited geoscience data and overcomes some of the technical bottlenecks in continual pretraining for LLMs.<br>
The difference between TSPT and single-stage training algorithms:

Comparison of TSPT and one-stage pre-training algorithm performance:
<p align="center">
<br>
<img src="image/TSPT_score.png" width="800"/>
<br>
</p>

## Model Training Code
We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to fine-tune JiuZhou.
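
LLaMA-Factory is normally driven by its own training configuration rather than hand-written loops; as a rough, framework-agnostic illustration of the underlying step, here is a bare-bones causal-LM fine-tune in plain `transformers`. The checkpoint id, data file, and hyperparameters are placeholders, not the project's actual settings.

```python
# Minimal causal-LM fine-tuning sketch with Hugging Face transformers.
# Placeholder checkpoint id and data file; substitute your own.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "itpossible/JiuZhou-base"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Any instruction dataset with a "text" column works for this sketch.
data = load_dataset("json", data_files="instructions.json", split="train")
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="jiuzhou-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1, bf16=True),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```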