swagger00
/

minimind-demo

Model card Files Files and versions

minimind-demo / README.md

swagger00's picture

Upload README.md with huggingface_hub

2584cf0 verified 17 days ago

|

history blame contribute delete

1.73 kB

	---
	language:
	- zh
	tags:
	- MiniMind
	- SFT
	- Chinese
	license: apache-2.0
	---

	# MiniMind2

	这是一个基于 MiniMind 架构训练的demo模型。

	## 模型信息

	- 模型大小: 768维 × 16层 ≈ 104M 参数
	- 训练数据: Pretrian数据(~1.9GB) + SFT数据集 (~7.5GB)
	- 训练轮数: 4 epochs + 2 epochs
	- 最终Loss: ~2.5
	- 训练时长: ~16小时 (4×GPU)

	## 文件说明

	\| 文件 \| 大小 \| 说明 \|
	\|------\|------\|------\|
	\| `pretrain_768.pth` \| ~217MB \| 预训练模型权重 \|
	\| `pretrain_768_resume.pth` \| ~1.0GB \| 训练checkpoint（续训使用） \|
	\| `full_sft_768.pth` \| ~217MB \| 最终模型权重（推理使用） \|
	\| `full_sft_768_resume.pth` \| ~1.0GB \| 训练checkpoint（续训使用） \|


	## 使用方法

	### 1. 下载模型
	```python
	from huggingface_hub import hf_hub_download

	# 下载推理权重
	model_path = hf_hub_download(
	repo_id="swagger00/minimind-demo",
	filename="full_sft_768.pth"
	)

	# 或下载checkpoint（如需续训）
	checkpoint_path = hf_hub_download(
	repo_id="swagger00/minimind-demo",
	filename="full_sft_768_resume.pth"
	)
	```

	### 2. 加载模型
	```python
	import torch
	from model.model_minimind import MiniMind # 需要MiniMind代码

	# 加载模型
	model = MiniMind(...)
	model.load_state_dict(torch.load(model_path))
	model.eval()

	# 推理
	output = model.generate("你好")
	```

	## 训练配置
	```yaml
	模型配置:
	hidden_size: 768
	num_hidden_layers: 16

	训练超参数:
	batch_size: 16
	accumulation_steps: 8
	learning_rate: 1e-5
	epochs: 2
	dtype: bfloat16
	```

	## 项目链接

	- GitHub: https://github.com/edgetalker/minimind_demo
	- 原始项目: [MiniMind](https://github.com/jingyaogong/minimind)

	## License

	Apache 2.0