zyq committed · Commit 7522ec2 · Parent(s): b764a35

add readme

- README.md +11 -38
- README_ZH.md +12 -44
README.md
CHANGED
@@ -11,33 +11,29 @@ library_name: transformers
---

<div align="center">
-<img src="./assets/megrez-logo.png" alt="Megrez Logo" width="400" />

<br>
-<h1>

-<a href="https://github.com/
<b>🔗 Github</b>
</a> |
-<a href="https://github.com/
<b>📄 Tech Report</b>
</a> |
-<a href="https://
-<b>💻 Demo</b>
-</a> |
-<a href="https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-Preview/blob/main/assets/wechat-official.jpg">
<b>💬 WeChat Official</b>
</a>

<br>

-<strong>[中文](https://

</div>

## Introduction

-

## Model Card
@@ -66,7 +62,7 @@ Megrez2-3x7B-A3B-Preview is a device native large language model. Megrez2 takes

## Performance

-We evaluated

<div align="center">
<table>
@@ -74,7 +70,7 @@ We evaluated Megrez2-3x7B-A3B-Preview using the open-source evaluation tool [Ope
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
-<th align="center"><sup>
<th align="center"><sup>Qwen2.5-3B</sup></th>
<th align="center"><sup>Qwen2.5-7B</sup></th>
<th align="center"><sup>Qwen3-4B</sup></th>
@@ -214,7 +210,7 @@ The following contains a code snippet illustrating how to use the model generate
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

-path = "
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
@@ -241,21 +237,10 @@ print(responses)
# 世界上最高的山峰是珠穆朗玛峰(Mount Everest),位于喜马拉雅山脉的中尼边境。珠穆朗玛峰的海拔高度为8,848.86米(29,031.7英尺),这一数据是由中国和尼泊尔在2020年共同宣布的最新测量结果。珠穆朗玛峰不仅是登山爱好者的圣地,也是地理和科学研究的重要对象。
```

-### ModelScope
-
-`ModelScope` adopts a Python API similar to (though not entirely identical to) `Transformers`. For basic usage, simply modify the first line of the above code as follows:
-
-```python
-from modelscope import AutoModelForCausalLM, AutoTokenizer
-```
-
-### llama.cpp
-
-Coming soon...

## How to Deploy

-

## Best Practice
@@ -275,18 +260,6 @@ All our open-weight models are licensed under Apache 2.0.

If you find our work helpful, feel free to give us a cite.

-```bibtex
-@misc{li2025megrez2technicalreport,
-  title={Megrez2 Technical Report},
-  author={Boxun Li and Yadong Li and Zhiyuan Li and Congyi Liu and Weilin Liu and Guowei Niu and Zheyue Tan and Haiyang Xu and Zhuyu Yao and Tao Yuan and Dong Zhou and Yueqing Zhuang and Bo Zhao and Guohao Dai and Yu Wang},
-  year={2025},
-  eprint={2507.17728},
-  archivePrefix={arXiv},
-  primaryClass={cs.CL},
-  url={https://arxiv.org/abs/2507.17728},
-}
-```
-
## Contact

-If you have any questions, please feel free to submit a GitHub issue or contact [WeChat groups](https://
---

<div align="center">

<br>
+<h1> InnoMegrez2 </h1>

+<a href="https://github.com/sii-research/InnoMegrez2">
<b>🔗 Github</b>
</a> |
+<a href="https://github.com/sii-research/InnoMegrez2/blob/main/docs/tech_report.pdf">
<b>📄 Tech Report</b>
</a> |
+<a href="https://github.com/sii-research/InnoMegrez2/blob/main/assets/wechat-official.jpg">
<b>💬 WeChat Official</b>
</a>

<br>

+<strong>[中文](https://github.com/sii-research/InnoMegrez2/blob/main/README_ZH.md) | English</strong>

</div>

## Introduction

+InnoMegrez2 is a device-native large language model. Megrez2 combines the accuracy of the Mixture-of-Experts (MoE) architecture with the compact size of dense models. This preview model was trained on 5T tokens of data. The official release, trained on more data and with stronger reasoning and agent capabilities, will arrive later this year.

## Model Card
## Performance

+We evaluated InnoMegrez2 using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass) on several important benchmarks. Some of the evaluation results are shown in the table below.

<div align="center">
<table>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
+<th align="center"><sup>InnoMegrez2</sup></th>
<th align="center"><sup>Qwen2.5-3B</sup></th>
<th align="center"><sup>Qwen2.5-7B</sup></th>
<th align="center"><sup>Qwen3-4B</sup></th>
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

+path = "sii-research/InnoMegrez2"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# 世界上最高的山峰是珠穆朗玛峰(Mount Everest),位于喜马拉雅山脉的中尼边境。珠穆朗玛峰的海拔高度为8,848.86米(29,031.7英尺),这一数据是由中国和尼泊尔在2020年共同宣布的最新测量结果。珠穆朗玛峰不仅是登山爱好者的圣地,也是地理和科学研究的重要对象。
```
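The diff only excerpts fragments of the Transformers snippet (the imports, `path`, `device`, and the tokenizer line). For orientation, a complete version of that flow might look like the sketch below. This is an assumption based on the standard Hugging Face chat workflow, not code taken from the repo: the dtype, chat-template call, and generation settings are illustrative, and `generate_answer` is a hypothetical helper name.

```python
def build_messages(user_text: str) -> list:
    # Chat-format input expected by tokenizer.apply_chat_template.
    return [{"role": "user", "content": user_text}]


def generate_answer(question: str,
                    path: str = "sii-research/InnoMegrez2",
                    device: str = "cuda") -> str:
    # Heavyweight: downloads the weights; call only on a machine with a GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).to(device)

    input_ids = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(device)

    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

The `trust_remote_code=True` flags match the diff's own tokenizer line; models with custom architectures require them on both the tokenizer and the model.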
## How to Deploy

+InnoMegrez2 supports `vLLM` and `SGLang` as inference backends. For more information, please visit the [GitHub repository](https://github.com/sii-research/InnoMegrez2).
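The deploy note points to vLLM and SGLang only by link. As a purely illustrative sketch of what offline inference with vLLM's Python API generally looks like (the repo id comes from the diff, but the engine arguments are assumptions and a custom architecture may need additional flags or a vLLM version that supports it):

```python
def make_engine_kwargs(model_id: str) -> dict:
    # trust_remote_code is needed for models with custom architectures.
    return {"model": model_id, "trust_remote_code": True}


def run_offline(prompt: str) -> str:
    # Heavyweight: requires a GPU and downloads the weights.
    from vllm import LLM, SamplingParams

    llm = LLM(**make_engine_kwargs("sii-research/InnoMegrez2"))
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate([prompt], params)
    # One RequestOutput per prompt; take the first completion's text.
    return outputs[0].outputs[0].text
```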

## Best Practice

If you find our work helpful, feel free to give us a cite.

## Contact

+If you have any questions, please feel free to submit a GitHub issue or contact us via the [WeChat group](https://github.com/sii-research/InnoMegrez2/blob/main/assets/wechat-group.jpg).
README_ZH.md
CHANGED
@@ -1,31 +1,26 @@
<div align="center">
-<img src="./assets/megrez-logo.png" alt="Megrez Logo" width="400" />

-<
-<h1> Megrez2-3x7B-A3B-Preview </h1>

-<a href="https://github.com/
<b>🔗 Github</b>
</a> |
-<a href="https://github.com/
<b>📄 Tech Report</b>
</a> |
-<a href="https://
-<b>💻 Demo</b>
-</a> |
-<a href="https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-Preview/blob/main/assets/wechat-official.jpg">
<b>💬 WeChat Official</b>
</a>

<br>

-<strong>中文 | [English](https://

</div>

## Model Overview

-

## Basic Information
@@ -54,7 +49,7 @@ Megrez2-3x7B-A3B-Preview is a large model designed for end devices, balancing th

## Performance

-We used the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass) to evaluate

<div align="center">
<table>
@@ -62,7 +57,7 @@ Megrez2-3x7B-A3B-Preview is a large model designed for end devices, balancing th
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
-<th align="center"><sup>
<th align="center"><sup>Qwen2.5-3B</sup></th>
<th align="center"><sup>Qwen2.5-7B</sup></th>
<th align="center"><sup>Qwen3-4B</sup></th>
@@ -196,13 +191,13 @@ Megrez2-3x7B-A3B-Preview is a large model designed for end devices, balancing th
### Transformers

We recommend the latest version of `transformers`, or `transformers>=4.52.4`.
-The following is a very simple code snippet showing how to run

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

-path = "
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
@@ -229,20 +224,9 @@ print(responses)
# 世界上最高的山峰是珠穆朗玛峰(Mount Everest),位于喜马拉雅山脉的中尼边境。珠穆朗玛峰的海拔高度为8,848.86米(29,031.7英尺),这一数据是由中国和尼泊尔在2020年共同宣布的最新测量结果。珠穆朗玛峰不仅是登山爱好者的圣地,也是地理和科学研究的重要对象。
```

-### ModelScope
-
-`ModelScope` adopts a Python API similar to (though not identical to) `Transformers`. For basic usage, just change the first line of the code above as follows:
-
-```python
-from modelscope import AutoModelForCausalLM, AutoTokenizer
-```
-
-### llama.cpp
-Coming soon...

## How to Deploy

-

## Best Practice
@@ -258,23 +242,7 @@ Megrez2-3x7B-A3B-Preview supports `vLLM` and `SGLang` as inference backends,

All our open-source models are licensed under Apache 2.0.

-## Citation
-
-If you find our code and models useful, please cite the following.
-
-```bibtex
-@misc{li2025megrez2technicalreport,
-  title={Megrez2 Technical Report},
-  author={Boxun Li and Yadong Li and Zhiyuan Li and Congyi Liu and Weilin Liu and Guowei Niu and Zheyue Tan and Haiyang Xu and Zhuyu Yao and Tao Yuan and Dong Zhou and Yueqing Zhuang and Bo Zhao and Guohao Dai and Yu Wang},
-  year={2025},
-  eprint={2507.17728},
-  archivePrefix={arXiv},
-  primaryClass={cs.CL},
-  url={https://arxiv.org/abs/2507.17728},
-}
-```
-
## Contact Us

-If you have any questions, feel free to open a GitHub issue or contact the [WeChat group](https://
<div align="center">

+<h1> InnoMegrez2 </h1>

+<a href="https://github.com/sii-research/InnoMegrez2">
<b>🔗 Github</b>
</a> |
+<a href="https://github.com/sii-research/InnoMegrez2/blob/main/docs/tech_report.pdf">
<b>📄 Tech Report</b>
</a> |
+<a href="https://github.com/sii-research/InnoMegrez2/blob/main/assets/wechat-official.jpg">
<b>💬 WeChat Official</b>
</a>

<br>

+<strong>中文 | [English](https://github.com/sii-research/InnoMegrez2/blob/main/README.md)</strong>

</div>

## Model Overview

+InnoMegrez2 is a large model designed for end devices, combining the accuracy leverage of MoE with the modest total parameter count of dense models. This release is a preview of Megrez 2.0, trained on 5T tokens. We plan to train on a larger corpus and improve the model's reasoning and agent capabilities; the official release is expected later this year.

## Basic Information
## Performance

+We evaluated InnoMegrez2 with the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass); some of the results are shown in the table below.

<div align="center">
<table>
<tr>
<th align="center">Benchmark</th>
<th align="center">Metric</th>
+<th align="center"><sup>InnoMegrez2</sup></th>
<th align="center"><sup>Qwen2.5-3B</sup></th>
<th align="center"><sup>Qwen2.5-7B</sup></th>
<th align="center"><sup>Qwen3-4B</sup></th>
### Transformers

We recommend the latest version of `transformers`, or `transformers>=4.52.4`.
+The following is a minimal code snippet showing how to run the InnoMegrez2 model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

+path = "sii-research/InnoMegrez2"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# 世界上最高的山峰是珠穆朗玛峰(Mount Everest),位于喜马拉雅山脉的中尼边境。珠穆朗玛峰的海拔高度为8,848.86米(29,031.7英尺),这一数据是由中国和尼泊尔在2020年共同宣布的最新测量结果。珠穆朗玛峰不仅是登山爱好者的圣地,也是地理和科学研究的重要对象。
```
## How to Deploy

+InnoMegrez2 supports `vLLM` and `SGLang` as inference backends; for more details, see our [GitHub repository](https://github.com/sii-research/InnoMegrez2).

## Best Practice

All our open-source models are licensed under Apache 2.0.

## Contact Us

+If you have any questions, feel free to open a GitHub issue or contact the [WeChat group](https://github.com/sii-research/InnoMegrez2/blob/main/assets/wechat-group.jpg).