Update README.md
print(outputs)
```
## Model Performance

### Geoscience Ability
We evaluate the performance of JiuZhou using the GeoBench benchmark.<br>
JiuZhou outperforms GPT-3.5 in objective tasks:
<p align="center">
  <br>
  <img src="image/objective_score.png" width="800"/>
  <br>
</p>
JiuZhou also scores higher than GPT-3.5 across six criteria in subjective tasks:
<p align="center">
  <br>
  <img src="image/subjective_score.png" width="800"/>
  <br>
</p>

### General Ability
We evaluate the performance of JiuZhou using three benchmark datasets: C-Eval, CMMLU, and MMLU.<br>
Compared to other variants of Llama and Mistral models, JiuZhou shows outstanding performance:
<p align="center">
  <br>
  <img src="image/general_score.png" width="800"/>
  <br>
</p>

## Model Training Process

### Training Corpus
The corpus consists of 50 million general documents and 3.4 million geoscience-related documents.
<p align="center">
  <br>
  <img src="image/JiuZhou-Corpus.png" width="800"/>
  <br>
</p>
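The geoscience subset is a small minority of the combined corpus; a quick back-of-the-envelope check, using only the document counts stated above:

```python
# Corpus composition stated above: 50M general + 3.4M geoscience documents.
general_docs = 50_000_000
geoscience_docs = 3_400_000

total_docs = general_docs + geoscience_docs
geo_share = geoscience_docs / total_docs

print(f"geoscience share of corpus: {geo_share:.1%}")  # → 6.4%
```

This scarcity of domain documents relative to general text is part of what makes efficient use of the geoscience data important during continual pre-training.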
|
### Training Framework
We use the JiuZhou-Framework proposed in this study.
<p align="center">
  <br>
  <img src="image/JiuZhou-Framework.png" width="800"/>
  <br>
</p>

### Two-stage Pre-adaptation Pre-training (TSPT)
TSPT improves the efficiency of using limited geoscience data and overcomes some of the technical bottlenecks in continual pretraining for LLMs.<br>
The difference between TSPT and single-stage training algorithms:
<p align="center">
  <br>
  <img src="image/TSPT.png" width="800"/>
  <br>
</p>
Comparison of TSPT and one-stage pre-training algorithm performance:
<p align="center">
  <br>
  <img src="image/TSPT_score.png" width="800"/>
  <br>
</p>
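The exact TSPT procedure is defined in the paper; purely as a rough illustration of the two-stage idea (the function, stage lengths, and mixing ratio below are invented for this sketch, not the paper's settings), a two-stage schedule might first interleave a little geoscience data into general pre-training, then concentrate entirely on the geoscience corpus:

```python
from itertools import cycle

def tspt_schedule(general_docs, geo_docs, stage1_steps, stage2_steps,
                  stage1_geo_every=5):
    """Illustrative two-stage sampler (hypothetical, not the paper's algorithm).

    Stage 1: mostly general documents, with a geoscience document
             every `stage1_geo_every` steps.
    Stage 2: geoscience documents only.
    """
    general = iter(general_docs)
    geo = iter(geo_docs)
    schedule = []
    for step in range(stage1_steps):
        source = geo if step % stage1_geo_every == 0 else general
        schedule.append(("stage1", next(source)))
    for _ in range(stage2_steps):
        schedule.append(("stage2", next(geo)))
    return schedule

# Tiny demo with placeholder "documents".
demo = tspt_schedule(cycle(["general"]), cycle(["geo"]),
                     stage1_steps=5, stage2_steps=3)
```

The point of the staging is that the scarce domain data is not diluted uniformly across all of training but concentrated where it matters most.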
|
## Model Training Code
We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to fine-tune JiuZhou.

### Project Deployment
```bash
git clone https://github.com/THU-ESIS/JiuZhou.git
cd JiuZhou
pip install -e ".[torch,metrics]"
```

### Model Training
Pre-training:
```bash
llamafactory-cli train examples/train_lora/JiuZhou_pretrain_sft.yaml
```
Instruction-tuning:
```bash
llamafactory-cli train examples/train_lora/JiuZhou_lora_sft.yaml
```
Chat with the fine-tuned JiuZhou:
```bash
llamafactory-cli chat examples/inference/JiuZhou_lora_sft.yaml
```
Merge the instruction-tuned LoRA weights with the original JiuZhou weights:
```bash
llamafactory-cli export examples/merge_lora/JiuZhou_lora_sft.yaml
```

## Citations
```bibtex
@article{chen2024preparedllm,