Update README.md
README.md (changed)

---

<div style="display: flex; flex-direction: column; align-items: center; justify-content: center; text-align: center; font-size: 16px; font-weight: bold; margin-top: 50px;">
  <div>
    <a href="#english" style="text-decoration: none; margin: 0 10px; color: blue;">English</a> |
    <a href="#chinese" style="text-decoration: none; margin: 0 10px; color: blue;">中文</a>
  </div>
  <h1 style="margin: 20px 0 0 0; font-size: 2.5em; font-weight: bold;">KHAOSZ</h1>
</div>

<h2 id="english">English Version</h2>

This is a bilingual Transformer model that supports both Chinese and English. The repository contains the model configuration and the training workflow: training loads the parameters defined in `params/config.json`, and the training script `train.py` parses command-line arguments such as the dataset root directory, the number of training epochs, the batch size, the checkpoint interval, and the checkpoint directory.

**Model download options (choose one):**

1. Visit [HuggingFace](https://huggingface.co/ViperEk/KHAOSZ) and open **Files and versions**
2. Run `params/download.py` to download the parameters

**Demo video:** [bilibili](https://www.bilibili.com/video/BV1z5RPYHEkd)

The training dataset sources are listed in the **Model Card** section of the HuggingFace page linked above.

**License:** The code is released under the Apache-2.0 license; please credit the source when you use it.

- **📊 Device Selection:** The code defaults to CUDA for training.
- **🌐 Performance Optimization:** Training runs with `dtype=torch.bfloat16`, which speeds up training and reduces memory use; make sure your hardware supports this feature (see the check after this list).
- **🤖 Language Support:** The model currently trains on Chinese and English only; the BBPE tokenizer was built without text from other languages. Byte-level BPE means there are no out-of-vocabulary (OOV) issues, but support for other languages is poor.
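
If you are unsure whether your GPU supports bfloat16, a quick pre-flight check is sketched below. This snippet is not part of the repository's scripts; it only uses PyTorch's standard `torch.cuda.is_bf16_supported()` helper.

```python
import torch

# Quick check before enabling dtype=torch.bfloat16 for training.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16   # e.g. Ampere (RTX 30xx / A100) and newer GPUs
else:
    dtype = torch.float32    # fall back to full precision on older hardware
print(f"training dtype: {dtype}")
```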

### 📌 Training Guide

To train this Transformer model, follow these steps:

**(1). Prepare the dataset:**

Place the datasets in the designated root directory. The files should be text documents in Chinese, English, or a mix of both, and their format should match the model's input requirements: ideally pre-tokenized token ids stored as a `torch.Tensor`. Storing the ids in a plain Python list roughly doubles memory use when the training data is loaded, because Python keeps integers at 64-bit precision while `int32` is sufficient.
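
A minimal sketch of this preprocessing step is shown below. It assumes a tokenizer object exposing an `encode(str) -> list[int]` method; the actual tokenizer class and dataset layout used by this repository may differ.

```python
import torch

def encode_corpus(lines, tokenizer, out_path):
    """Tokenize raw text lines and save the ids as one compact int32 tensor."""
    ids = []
    for line in lines:
        ids.extend(tokenizer.encode(line))
    # int32 covers any realistic vocabulary and halves the footprint
    # of Python's default 64-bit integers.
    tensor = torch.tensor(ids, dtype=torch.int32)
    torch.save(tensor, out_path)
    return tensor
```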

**(2). Install dependencies:**

```bash
pip install -r requirements.txt
pip install .
```

**(3). Run the training script:**

Run the training script with the following command, adjusting the arguments as needed:

```bash
python train.py \
    --train_type=<seq|sft|dpo> \
    --data_root_path=/path/to/dataset \
    --n_epoch=5 \
    --batch_size=8 \
    --max_lr=2e-4 \
    --n_iter_ckpt=10000 \
    --ckpt_dir checkpoints
```

**Parameter explanation:**

- `--train_type`: training type (`seq`, `sft`, or `dpo`)
- `--data_root_path`: dataset root directory
- `--n_epoch`: total number of training epochs
- `--batch_size`: batch size
- `--n_iter_step`: number of batches per training step
- `--warning_step`: number of warmup steps
- `--max_lr`: maximum learning rate (warmup + cosine decay; see the sketch after this list)
- `--n_iter_ckpt`: checkpoint saving interval
- `--ckpt_dir`: checkpoint directory
- `--resume_dir`: checkpoint directory to resume training from
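
The warmup + cosine decay schedule referenced by `--warning_step` and `--max_lr` has the shape sketched below; this is an illustration of the schedule, not the repository's exact implementation.

```python
import math

def lr_at(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay towards min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```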

Training logs are saved to `train_log.txt`. Checkpoints are stored in the specified directory and can be used to resume training or to evaluate the model.

### 👉 Usage Guide

**(1). Chatting with the model:**

Open and run `chat.py`, or use the streaming / non-streaming interfaces directly:

**Streaming output:**

```python
import torch
from khaosz import Khaosz

model_dir = "your_model_parameter_dir"
model = Khaosz(model_dir).to(device='cuda', dtype=torch.bfloat16)
history = []

while True:
    query = input(">> ")
    if query == "!exit":
        break

    response_size = 0
    for response, history in model.stream_generate(
        query=query,
        history=history,
        temperature=0.85,
        top_p=0.95,
        top_k=50
    ):
        print(response[response_size:], end="")
        response_size = len(response)
```

**Non-streaming output:**

```python
import torch
from khaosz import Khaosz

model_dir = "your_model_parameter_dir"
model = Khaosz(model_dir).to(device='cuda', dtype=torch.bfloat16)
history = []

while True:
    query = input(">> ")
    if query == "!exit":
        break

    response = model.generate(
        query=query,
        history=history,
        temperature=0.85,
        top_p=0.95,
        top_k=50
    )
    print(response)
```

**(2). Retrieval-Augmented Generation (RAG):**

```python
import torch
from khaosz import Khaosz

model_dir = "your_model_parameter_dir"
model = Khaosz(model_dir).to(device='cuda', dtype=torch.bfloat16)

query = "your question"  # the query to answer with retrieved context
retrieved_content = model.retrieve_generate(
    query=query,
    retrieve_top_k=5,
    temperature=0.6,
    top_k=30,
    top_p=0.95
)
print(retrieved_content)
```

### 📌 Model Specifications

The model is a 24-layer Transformer whose hyperparameters are defined in `config.json`, for a total of roughly 1.0 billion (1.0B) parameters.

**Key design choices:**

- Weight tying between the token embedding and the final linear layer, a standard trick in small models to save parameters (see the sketch after this list)
- Embedding layer optimization: without weight tying, a 10,000-word vocabulary would consume roughly 102M (0.1B) additional parameters
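
A generic PyTorch sketch of the weight-tying idea is shown below; the sizes are illustrative placeholders, not the values in `params/config.json`.

```python
import torch.nn as nn

vocab_size, d_model = 32000, 1024  # illustrative sizes only

embedding = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size, bias=False)
lm_head.weight = embedding.weight  # tie: both layers share one parameter matrix

# Untied, the embedding and the output projection would each hold
# vocab_size * d_model parameters; tying saves one full copy.
assert lm_head.weight is embedding.weight
```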

**Limitations:**

- May struggle with complex language phenomena because of its small parameter count
- Prone to overfitting on specialized datasets
- Limited multilingual capability

**Advantages:**

- Runs efficiently on lower-spec hardware
- Shorter training time than larger models

**Training pipeline:**

The model has been through the full pre-training + SFT (Supervised Fine-Tuning) + DPO (Direct Preference Optimization) workflow, and all of the corresponding training code is included in the repository.

<h2 id="chinese">中文版本</h2>

这是一个支持中英文双语的 Transformer 模型。仓库包含模型配置文件和训练流程,通过加载 `params/config.json` 中定义的参数完成训练;训练脚本 `train.py` 支持命令行参数解析,包括数据集根目录、训练轮数(epochs)、批量大小(batch size)、检查点保存间隔、检查点目录等。

**模型下载选项(任选其一):**

1. 访问 [HuggingFace](https://huggingface.co/ViperEk/KHAOSZ) 查看 **Files and versions**
2. 运行 `params/download.py` 下载模型参数

**演示视频:** [bilibili](https://www.bilibili.com/video/BV1z5RPYHEkd)

训练数据来源请参见 HuggingFace 下载页面中的 **Model Card** 部分。

**许可证:** 代码遵循 Apache-2.0 协议,使用时请注明出处。

- **📊 设备选择:** 默认使用 CUDA 进行训练
- **🌐 性能优化:** 启用 `dtype=torch.bfloat16` 以加速训练并减少显存占用,请确保硬件支持该特性(参见列表后的检查代码)
- **🤖 语言支持:** 模型目前支持中文和英文训练;BBPE 分词器在训练时未加入其他语言的文本,字节级 BPE 不存在 OOV(未登录词)问题,但对其他语言的支持较差
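
如果不确定显卡是否支持 bfloat16,可以先用下面这段简单的检查确认;该代码并非仓库脚本的一部分,只使用了 PyTorch 自带的 `torch.cuda.is_bf16_supported()` 接口,仅作示意。

```python
import torch

# 在启用 dtype=torch.bfloat16 训练前,先确认硬件是否支持
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16   # 例如 Ampere(RTX 30xx / A100)及更新的 GPU
else:
    dtype = torch.float32    # 旧硬件上回退到 fp32
print(f"training dtype: {dtype}")
```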

### 📌 训练指南

要训练该 Transformer 模型,请按照以下步骤操作:

#### **(1). 准备数据集:**

将数据集放置在指定的根目录下。文件应为包含中文、英文或混合文本的文本文档,格式应符合模型输入要求:建议使用预分词后的 token_id,并以 `torch.Tensor` 保存。如果用 Python 列表存储同样的 id,读取训练数据时内存占用大约会翻倍,因为 Python 默认以 64 位精度存储整数,而 int32 已经足够。
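
下面给出该预处理步骤的一个简化示意,假设分词器提供 `encode(str) -> list[int]` 接口;仓库中实际使用的分词器类和数据组织方式可能与此不同。

```python
import torch

def encode_corpus(lines, tokenizer, out_path):
    """将原始文本分词,并以紧凑的 int32 张量保存 token id。"""
    ids = []
    for line in lines:
        ids.extend(tokenizer.encode(line))
    # int32 足以覆盖常见词表大小,且比 Python 默认的 64 位整数节省一半内存
    tensor = torch.tensor(ids, dtype=torch.int32)
    torch.save(tensor, out_path)
    return tensor
```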

#### **(2). 安装依赖:**

```bash
pip install -r requirements.txt
pip install .
```

#### **(3). 运行训练脚本:**

使用以下命令运行训练脚本,并根据需要调整参数:

```bash
python train.py \
    --train_type=<seq|sft|dpo> \
    --data_root_path=/path/to/dataset \
    --n_epoch=5 \
    --batch_size=8 \
    --max_lr=2e-4 \
    --n_iter_ckpt=10000 \
    --ckpt_dir checkpoints
```

**参数说明:**

- `--train_type`: 训练类型(seq, sft, dpo)
- `--data_root_path`: 数据集根目录
- `--n_epoch`: 总训练轮数
- `--batch_size`: 批量大小
- `--n_iter_step`: 每个训练步骤包含的 batch 数量
- `--warning_step`: 预热步数(warmup steps)
- `--max_lr`: 最大学习率(使用预热 + 余弦衰减,见列表后的示意代码)
- `--n_iter_ckpt`: 检查点保存间隔
- `--ckpt_dir`: 检查点保存目录
- `--resume_dir`: 从指定检查点路径恢复训练
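
`--warning_step` 与 `--max_lr` 描述的"预热 + 余弦衰减"学习率曲线大致如下;这只是示意实现,并非仓库中的具体代码。

```python
import math

def lr_at(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """先线性预热到 max_lr,随后按余弦曲线衰减到 min_lr。"""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```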

训练日志将保存在 `train_log.txt` 中。检查点将保存在指定目录,用于恢复训练或评估。

### 👉 使用指南

#### **(1). 与模型对话:**

打开并运行 `chat.py`,或使用流式/非流式接口:

**流式输出:**

```python
import torch
from khaosz import Khaosz

model_dir = "your_model_parameter_dir"
model = Khaosz(model_dir).to(device='cuda', dtype=torch.bfloat16)
history = []

while True:
    query = input(">> ")
    if query == "!exit":
        break

    response_size = 0
    for response, history in model.stream_generate(
        query=query,
        history=history,
        temperature=0.85,
        top_p=0.95,
        top_k=50
    ):
        print(response[response_size:], end="")
        response_size = len(response)
```

**非流式输出:**

```python
import torch
from khaosz import Khaosz

model_dir = "your_model_parameter_dir"
model = Khaosz(model_dir).to(device='cuda', dtype=torch.bfloat16)
history = []

while True:
    query = input(">> ")
    if query == "!exit":
        break

    response = model.generate(
        query=query,
        history=history,
        temperature=0.85,
        top_p=0.95,
        top_k=50
    )
    print(response)
```

#### **(2). 基于检索的生成(RAG):**

```python
import torch
from khaosz import Khaosz

model_dir = "your_model_parameter_dir"
model = Khaosz(model_dir).to(device='cuda', dtype=torch.bfloat16)

query = "你的问题"  # 需要结合检索内容回答的用户问题
retrieved_content = model.retrieve_generate(
    query=query,
    retrieve_top_k=5,
    temperature=0.6,
    top_k=30,
    top_p=0.95
)
print(retrieved_content)
```

### 📌 模型规格说明

该模型基于 24 层的 Transformer 架构,参数配置定义在 `config.json` 中,总参数量约为 10 亿(1.0B)。

**关键设计选择:**

- 在嵌入层(embedding)与最终线性层之间进行权重绑定(weight tying),这是小型模型中常见的节省参数量的做法(见列表后的示意代码)
- 嵌入层优化:若不进行权重绑定,一个包含 10,000 个词的词汇表将额外消耗约 1.02 亿(0.1B)参数
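
下面用一段通用的 PyTorch 代码示意权重绑定的做法;其中的尺寸只是示例,并非 `params/config.json` 中的实际取值。

```python
import torch.nn as nn

vocab_size, d_model = 32000, 1024  # 仅作示例的尺寸

embedding = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size, bias=False)
lm_head.weight = embedding.weight  # 权重绑定:两层共享同一参数矩阵

# 若不绑定,嵌入层和输出投影各需 vocab_size * d_model 个参数;
# 绑定后相当于省去了一份完整拷贝。
assert lm_head.weight is embedding.weight
```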

**局限性:**

- 由于参数规模较小,可能在处理复杂语言现象时表现受限
- 在特定领域的数据集上容易出现过拟合
- 多语言能力有限

**优势:**

- 可在低配置硬件上高效运行
- 相较于大型模型,训练时间更短

**训练流程:**

该模型已完成预训练(pre-training)+ 监督微调(SFT, Supervised Fine-Tuning)+ 直接偏好优化(DPO, Direct Preference Optimization)的全流程,所有相关的训练代码均已包含在代码库中。