---
language:
- zh
tags:
- MiniMind
- SFT
- Chinese
license: apache-2.0
---

# MiniMind2

This is a demo model trained on the MiniMind architecture.

## Model Information

- **Model size**: hidden size 768 × 16 layers ≈ 104M parameters
- **Training data**: pretraining data (~1.9GB) + SFT dataset (~7.5GB)
- **Training epochs**: 4 epochs (pretraining) + 2 epochs (SFT)
- **Final loss**: ~2.5
- **Training time**: ~16 hours (4× GPU)

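The ≈104M figure can be sanity-checked with a rough parameter count. The sketch below assumes a LLaMA-style decoder with a 6,400-token vocabulary, 8 query heads, 2 key/value heads, a feed-forward width of 2,048, and tied input/output embeddings. None of these values are stated in this card; they are assumptions chosen for illustration, and `estimate_params` is a hypothetical helper.

```python
def estimate_params(hidden_size=768, num_layers=16,
                    vocab_size=6400,          # assumption, not stated in this card
                    num_heads=8,              # assumption
                    num_kv_heads=2,           # assumption (grouped-query attention)
                    intermediate_size=2048):  # assumption (gated feed-forward width)
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim
    # Query and output projections are hidden x hidden; key and value are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated feed-forward block: gate, up, and down projections.
    feed_forward = 3 * hidden_size * intermediate_size
    # Two normalization weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + feed_forward + norms
    # Embedding matrix counted once (tied with the output head by assumption).
    embedding = vocab_size * hidden_size
    final_norm = hidden_size
    return num_layers * per_layer + embedding + final_norm

total = estimate_params()
print(f"{total / 1e6:.1f}M parameters")  # 104.0M parameters
```

Under these assumptions the count lands at about 104M, in line with the figure above.
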
## Files

| File | Size | Description |
|------|------|-------------|
| `pretrain_768.pth` | ~217MB | Pretrained model weights |
| `pretrain_768_resume.pth` | ~1.0GB | Pretraining checkpoint (for resuming training) |
| `full_sft_768.pth` | ~217MB | Final model weights (for inference) |
| `full_sft_768_resume.pth` | ~1.0GB | SFT checkpoint (for resuming training) |

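The size gap between the weight files and the resume checkpoints would be consistent with the checkpoints also carrying optimizer state. As a rough check, the sketch below assumes 2 bytes per parameter for the saved weights (half precision) and two 4-byte AdamW moment tensors per parameter in a resume checkpoint. This card does not document the checkpoint contents, so these are assumptions rather than a description of the actual files.

```python
PARAMS = 104_000_000  # approximate parameter count from this card

def weights_bytes(params, bytes_per_param=2):
    # Half-precision weights: 2 bytes per parameter (assumption).
    return params * bytes_per_param

def resume_bytes(params, bytes_per_param=2, optimizer_bytes_per_param=8):
    # Weights plus two fp32 optimizer moments per parameter (assumption).
    return weights_bytes(params, bytes_per_param) + params * optimizer_bytes_per_param

weights_mb = weights_bytes(PARAMS) / 1e6
resume_gb = resume_bytes(PARAMS) / 1e9
print(f"weights ~ {weights_mb:.0f} MB, resume checkpoint ~ {resume_gb:.2f} GB")
# weights ~ 208 MB, resume checkpoint ~ 1.04 GB
```

Both estimates fall near the listed ~217MB and ~1.0GB, which is why the resume files are only needed when continuing training.
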
## Usage

### 1. Download the model
```python
from huggingface_hub import hf_hub_download

# Download the inference weights
model_path = hf_hub_download(
    repo_id="swagger00/minimind-demo",
    filename="full_sft_768.pth"
)

# Or download the checkpoint (if you want to resume training)
checkpoint_path = hf_hub_download(
    repo_id="swagger00/minimind-demo",
    filename="full_sft_768_resume.pth"
)
```

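### 2. Load the model
```python
import torch
from model.model_minimind import MiniMind  # requires the MiniMind source code

# Build the model; the constructor arguments are elided here and must match
# the training configuration (hidden_size=768, num_hidden_layers=16)
model = MiniMind(...)
# map_location="cpu" lets the weights load on machines without a GPU
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# Inference ("你好" means "Hello"); the exact generate() interface
# depends on the MiniMind code you are using
output = model.generate("你好")
```
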
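If `load_state_dict` reports missing or unexpected keys, one common cause is a key prefix left behind by a training wrapper: PyTorch's `DistributedDataParallel` prefixes keys with `module.`, and `torch.compile` prefixes them with `_orig_mod.`. Whether the files in this repository carry such prefixes is not stated here, so the helper below (a hypothetical name, `strip_wrapper_prefixes`) is only a minimal sketch of how the keys could be normalized before loading.

```python
def strip_wrapper_prefixes(state_dict, prefixes=("module.", "_orig_mod.")):
    """Return a copy of state_dict with known wrapper prefixes removed from keys."""
    cleaned = {}
    for key, value in state_dict.items():
        changed = True
        while changed:  # prefixes can be nested, e.g. "module._orig_mod."
            changed = False
            for prefix in prefixes:
                if key.startswith(prefix):
                    key = key[len(prefix):]
                    changed = True
        cleaned[key] = value
    return cleaned

# Plain integers stand in for tensors in this example.
example = {"module.layers.0.weight": 1, "_orig_mod.norm.weight": 2, "lm_head.weight": 3}
print(strip_wrapper_prefixes(example))
# {'layers.0.weight': 1, 'norm.weight': 2, 'lm_head.weight': 3}
```

In practice the helper would wrap the loaded weights, for example `model.load_state_dict(strip_wrapper_prefixes(torch.load(model_path, map_location="cpu")))`.
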
## Training Configuration
```yaml
model_config:
  hidden_size: 768
  num_hidden_layers: 16

training_hyperparameters:
  batch_size: 16
  accumulation_steps: 8
  learning_rate: 1e-5
  epochs: 2
  dtype: bfloat16
```

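With gradient accumulation, the optimizer steps once every `accumulation_steps` micro-batches, so the effective batch size is larger than `batch_size`. The sketch below assumes that `batch_size` is a per-GPU value and that the 4 GPUs mentioned under training time were used in data-parallel mode; neither is stated explicitly in the configuration, and `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(batch_size, accumulation_steps, num_gpus=1):
    # Samples contributing to a single optimizer step.
    return batch_size * accumulation_steps * num_gpus

per_gpu = effective_batch_size(16, 8)
across_gpus = effective_batch_size(16, 8, num_gpus=4)  # 4 GPUs is an assumption
print(per_gpu, across_gpus)  # 128 512
```
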
## Project Links

- GitHub: https://github.com/edgetalker/minimind_demo
- Original project: [MiniMind](https://github.com/jingyaogong/minimind)

## License

Apache 2.0