MiniMind2

This is a demo model trained on the MiniMind architecture.

Model information

Model size: hidden size 768 × 16 layers ≈ 104M parameters
Training data: pretraining data (~1.9GB) + SFT dataset (~7.5GB)
Training epochs: 4 epochs (pretraining) + 2 epochs (SFT)
Final loss: ~2.5
Training time: ~16 hours (4× GPU)
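
The ≈104M figure can be reproduced with a rough count. The sketch below assumes details this card does not state and that are taken from the upstream MiniMind defaults: a vocabulary of 6,400 tokens, 8 query heads with 2 key/value heads, a gated feed-forward block whose width is 8/3 of the hidden size rounded up to a multiple of 64, two norm weight vectors per layer, no bias terms, and an output layer that shares weights with the input embedding. If the demo model differs on any of these, the total will differ too.

```python
def estimate_params(hidden_size=768, num_layers=16, vocab_size=6400,
                    num_heads=8, num_kv_heads=2, tie_embeddings=True):
    """Rough parameter count for a decoder-only transformer.

    Every default except hidden_size and num_layers is an assumption
    (upstream MiniMind defaults), not a value taken from this card.
    """
    head_dim = hidden_size // num_heads
    kv_dim = head_dim * num_kv_heads

    # Feed-forward width: 8/3 * hidden_size, rounded up to a multiple of 64.
    raw = hidden_size * 8 // 3
    intermediate = 64 * ((raw + 63) // 64)

    # Attention: query and output projections are square,
    # key and value projections are narrower (grouped-query attention).
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated feed-forward block: gate, up, and down projections.
    ffn = 3 * hidden_size * intermediate
    # Two norm weight vectors per layer.
    norms = 2 * hidden_size

    embed = vocab_size * hidden_size
    total = num_layers * (attn + ffn + norms) + embed + hidden_size  # + final norm
    if not tie_embeddings:
        total += embed  # separate output projection
    return total


total_params = estimate_params()
print(f"{total_params:,} parameters (~{total_params / 1e6:.0f}M)")
```

Under these assumptions the count comes to 104,030,976, which agrees with the ≈104M stated above.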

Files

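File	Size	Description
`pretrain_768.pth`	~217MB	Pretrained model weights
`pretrain_768_resume.pth`	~1.0GB	Training checkpoint (for resuming training)
`full_sft_768.pth`	~217MB	Final model weights (for inference)
`full_sft_768_resume.pth`	~1.0GB	Training checkpoint (for resuming training)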
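
As a quick sanity check on these sizes, the sketch below divides each file size by the parameter count to get bytes per parameter. It uses only the rounded figures from this card (~104M parameters, ~217MB, ~1.0GB); the interpretation in the comments is an inference, not something the card states.

```python
def bytes_per_param(file_size_mb, params_millions):
    """Approximate bytes stored per model parameter."""
    return (file_size_mb * 1e6) / (params_millions * 1e6)


# Weights-only file: about 2 bytes per parameter, which would be
# consistent with 16-bit storage.
weights_ratio = bytes_per_param(217, 104)

# Resume checkpoint: roughly 10 bytes per parameter, which would be
# consistent with the file also holding optimizer state, though the
# card does not say what the checkpoint contains.
resume_ratio = bytes_per_param(1000, 104)

print(f"weights: {weights_ratio:.2f} B/param, resume: {resume_ratio:.2f} B/param")
```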

Usage

1. Download the model

from huggingface_hub import hf_hub_download

# Download the inference weights
model_path = hf_hub_download(
    repo_id="swagger00/minimind-demo",
    filename="full_sft_768.pth"
)

# Or download the checkpoint (if you need to resume training)
checkpoint_path = hf_hub_download(
    repo_id="swagger00/minimind-demo",
    filename="full_sft_768_resume.pth"
)

2. Load the model

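import torch
from model.model_minimind import MiniMind  # requires the MiniMind code

# Build the model, then load the downloaded weights.
# map_location="cpu" lets the weights load on a machine without a GPU.
model = MiniMind(...)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# Inference ("你好" means "hello")
output = model.generate("你好")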

Training configuration

Model configuration:
  hidden_size: 768
  num_hidden_layers: 16
  
Training hyperparameters:
  batch_size: 16
  accumulation_steps: 8
  learning_rate: 1e-5
  epochs: 2
  dtype: bfloat16
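
With gradient accumulation, the number of samples contributing to each optimizer step is larger than `batch_size`. The sketch below works this out for the values above. It assumes that `batch_size` is the per-GPU batch size and that the 4 GPUs were used in a data-parallel setup; the card does not confirm either point.

```python
def effective_batch_size(batch_size, accumulation_steps, num_gpus=1):
    """Samples that contribute to a single optimizer step."""
    return batch_size * accumulation_steps * num_gpus


# Values from the training configuration above.
per_gpu = effective_batch_size(batch_size=16, accumulation_steps=8)

# Assumption: data-parallel training across the 4 GPUs mentioned in this card.
across_gpus = effective_batch_size(batch_size=16, accumulation_steps=8, num_gpus=4)

print(per_gpu, across_gpus)
```

Under these assumptions each optimizer step sees 128 samples per GPU, or 512 across all 4 GPUs.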

Project links

GitHub: https://github.com/edgetalker/minimind_demo
Original project: MiniMind

License

Apache 2.0
