Update README.md
5. Training Process

The model underwent a three-phase training approach (all three phases used LoRA):

- Supervised Fine-tuning: Using the ChatGLM2-6B foundational model, MindGLM was fine-tuned with a dedicated dataset for psychological counseling.

- Reward Model Training: A reward model was trained to evaluate and score the responses of the fine-tuned model.

- Reinforcement Learning: The model was further aligned using the PPO (Proximal Policy Optimization) algorithm to ensure its responses align with human preferences.
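Since every phase above trains LoRA adapters rather than the full model, the core idea is worth a sketch: the pretrained weight `W` stays frozen, and only a low-rank update `B @ A` is learned. This is a minimal illustration with toy dimensions and NumPy, not MindGLM's actual implementation (the real ChatGLM2-6B layers are far larger, and the variable names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration only; real ChatGLM2-6B layers are far larger.
d_out, d_in, rank = 8, 8, 2

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight (not updated)
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))               # zero-initialized, so the adapter starts as a no-op

def lora_forward(x, scale=1.0):
    # Effective weight is W + scale * (B @ A); only A and B receive gradients.
    return x @ (W + scale * (B @ A)).T

x = rng.normal(size=(4, d_in))
# Before any training, B @ A == 0, so the adapted layer matches the frozen one.
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable parameters shrink from d_out * d_in to rank * (d_in + d_out).
full_params = d_out * d_in          # 64
lora_params = rank * (d_in + d_out) # 32
```

Initializing `B` to zero is the standard LoRA choice: it guarantees training starts from exactly the pretrained behavior, and the parameter saving grows with layer size (for a 4096-by-4096 layer at rank 8, adapters are well under 1% of the full weight count).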
6. Limitations

While MindGLM is a powerful tool, users should be aware of its limitations: