zyq committed
Commit 7522ec2 · 1 Parent(s): b764a35

add readme

Files changed (2):
1. README.md +11 -38
2. README_ZH.md +12 -44
README.md CHANGED
@@ -11,33 +11,29 @@ library_name: transformers
 ---
 
 <div align="center">
- <img src="./assets/megrez-logo.png" alt="Megrez Logo" width="400" />
 
 <br>
- <h1> Megrez2-3x7B-A3B-Preview </h1>
+ <h1> InnoMegrez2 </h1>
 
- <a href="https://github.com/infinigence/Infini-Megrez">
+ <a href="https://github.com/sii-research/InnoMegrez2">
 <b>🔗 Github</b>
 </a> &nbsp;|&nbsp;
- <a href="https://github.com/infinigence/Infini-Megrez/blob/main/docs/tech_report.pdf">
+ <a href="https://github.com/sii-research/InnoMegrez2/blob/main/docs/tech_report.pdf">
 <b>📄 Tech Report</b>
 </a> &nbsp;|&nbsp;
- <a href="https://huggingface.co/spaces/Infinigence/Megrez2-3x7B-A3B-Preview">
- <b>💻 Demo</b>
- </a> &nbsp;|&nbsp;
- <a href="https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-Preview/blob/main/assets/wechat-official.jpg">
+ <a href="https://github.com/sii-research/InnoMegrez2/blob/main/assets/wechat-official.jpg">
 <b>💬 WeChat Official</b>
 </a> &nbsp;
 
 <br>
 
- <strong>[中文](https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-Preview/blob/main/README_ZH.md) | English</strong>
+ <strong>[中文](https://github.com/sii-research/InnoMegrez2/blob/main/README_ZH.md) | English</strong>
 
 </div>
 
 ## Introduction
 
- Megrez2-3x7B-A3B-Preview is a device native large language model. Megrez2 takes advantages of both the accuracy of Mixture-of-Experts (MoE) architecture and the compact size of Dense models. This preview model was trained on 5T Tokens of data. The official release, with larger training data and better reasoning and agent capabilities, will come later this year.
+ InnoMegrez2 is a device-native large language model. Megrez2 combines the accuracy of the Mixture-of-Experts (MoE) architecture with the compact size of dense models. This preview model was trained on 5T tokens of data. The official release, trained on more data and with better reasoning and agent capabilities, will come later this year.
 
 ## Model Card
 
@@ -66,7 +62,7 @@ Megrez2-3x7B-A3B-Preview is a device native large language model. Megrez2 takes
 
 ## Performance
 
- We evaluated Megrez2-3x7B-A3B-Preview using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass) on several important benchmarks. Some of the evaluation results are shown in the table below.
+ We evaluated InnoMegrez2 using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass) on several important benchmarks. Some of the evaluation results are shown in the table below.
 
 <div align="center">
 <table>
@@ -74,7 +70,7 @@ We evaluated Megrez2-3x7B-A3B-Preview using the open-source evaluation tool [Ope
 <tr>
 <th align="center">Benchmark</th>
 <th align="center">Metric</th>
- <th align="center"><sup>Megrez2-3x7B<br>-A3B-Preview</sup></th>
+ <th align="center"><sup>InnoMegrez2</sup></th>
 <th align="center"><sup>Qwen2.5-3B</sup></th>
 <th align="center"><sup>Qwen2.5-7B</sup></th>
 <th align="center"><sup>Qwen3-4B</sup></th>
@@ -214,7 +210,7 @@ The following contains a code snippet illustrating how to use the model generate
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 
- path = "Infinigence/Megrez2-3x7B-A3B-Preview"
+ path = "sii-research/InnoMegrez2"
 device = "cuda"
 
 tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
@@ -241,21 +237,10 @@ print(responses)
 # 世界上最高的山峰是珠穆朗玛峰(Mount Everest),位于喜马拉雅山脉的中尼边境。珠穆朗玛峰的海拔高度为8,848.86米(29,031.7英尺),这一数据是由中国和尼泊尔在2020年共同宣布的最新测量结果。珠穆朗玛峰不仅是登山爱好者的圣地,也是地理和科学研究的重要对象。
 ```
 
- ### ModelScope
- 
- `ModelScope` adopts Python API similar to (though not entirely identical to) `Transformers`. For basic usage, simply modify the first line of the above code as follows:
- 
- ```python
- from modelscope import AutoModelForCausalLM, AutoTokenizer
- ```
- 
- ### llama.cpp
- 
- Coming soon...
 
 ## How to Deploy
 
- Megrez2-3x7B-A3B-Preview support using `vLLM` and `SGLang` as inference backends. For more information, please visit the [gitHub repository](https://github.com/infinigence/Infini-Megrez).
+ InnoMegrez2 supports `vLLM` and `SGLang` as inference backends. For more information, please visit the [GitHub repository](https://github.com/sii-research/InnoMegrez2).
 
 ## Best Practice
 
@@ -275,18 +260,6 @@ All our open-weight models are licensed under Apache 2.0.
 
 If you find our work helpful, feel free to give us a cite.
 
- ```bibtex
- @misc{li2025megrez2technicalreport,
- title={Megrez2 Technical Report},
- author={Boxun Li and Yadong Li and Zhiyuan Li and Congyi Liu and Weilin Liu and Guowei Niu and Zheyue Tan and Haiyang Xu and Zhuyu Yao and Tao Yuan and Dong Zhou and Yueqing Zhuang and Bo Zhao and Guohao Dai and Yu Wang},
- year={2025},
- eprint={2507.17728},
- archivePrefix={arXiv},
- primaryClass={cs.CL},
- url={https://arxiv.org/abs/2507.17728},
- }
- ```
- 
 ## Contact
 
- If you have any questions, please feel free to submit a GitHub issue or contact [WeChat groups](https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-Preview/blob/main/assets/wechat-group.jpg).
+ If you have any questions, please feel free to submit a GitHub issue or contact us via the [WeChat group](https://github.com/sii-research/InnoMegrez2/blob/main/assets/wechat-group.jpg).
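The "How to Deploy" entry above names `vLLM` and `SGLang` as backends but shows no invocation. A minimal sketch of what launching each server might look like, assuming the standard vLLM and SGLang CLIs and that `--trust-remote-code` is required (the Transformers snippet in this diff loads the model with `trust_remote_code=True`); the exact flags for this model are not in the diff, so treat these as illustrations, not the repository's documented commands:

```shell
# Hypothetical vLLM launch for the renamed checkpoint; the model may need
# extra flags documented in the sii-research/InnoMegrez2 repository.
vllm serve sii-research/InnoMegrez2 \
    --trust-remote-code \
    --max-model-len 4096

# Hypothetical SGLang equivalent, exposing an OpenAI-compatible endpoint.
python -m sglang.launch_server \
    --model-path sii-research/InnoMegrez2 \
    --trust-remote-code \
    --port 30000
```

Both servers expose an OpenAI-compatible HTTP API, so a client can then point an `openai`-style SDK at `http://localhost:8000/v1` (vLLM) or `http://localhost:30000/v1` (SGLang).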
README_ZH.md CHANGED
@@ -1,31 +1,26 @@
 <div align="center">
- <img src="./assets/megrez-logo.png" alt="Megrez Logo" width="400" />
 
- <br>
- <h1> Megrez2-3x7B-A3B-Preview </h1>
+ <h1> InnoMegrez2 </h1>
 
- <a href="https://github.com/infinigence/Infini-Megrez">
+ <a href="https://github.com/sii-research/InnoMegrez2">
 <b>🔗 Github</b>
 </a> &nbsp;|&nbsp;
- <a href="https://github.com/infinigence/Infini-Megrez/blob/main/docs/tech_report.pdf">
+ <a href="https://github.com/sii-research/InnoMegrez2/blob/main/docs/tech_report.pdf">
 <b>📄 Tech Report</b>
 </a> &nbsp;|&nbsp;
- <a href="https://huggingface.co/spaces/Infinigence/Megrez2-3x7B-A3B-Preview">
- <b>💻 Demo</b>
- </a> &nbsp;|&nbsp;
- <a href="https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-Preview/blob/main/assets/wechat-official.jpg">
+ <a href="https://github.com/sii-research/InnoMegrez2/blob/main/assets/wechat-official.jpg">
 <b>💬 WeChat Official</b>
 </a> &nbsp;
 
 <br>
 
- <strong>中文 | [English](https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-Preview/blob/main/README.md)</strong>
+ <strong>中文 | [English](https://github.com/sii-research/InnoMegrez2/blob/main/README.md)</strong>
 
 </div>
 
 ## 模型简介
 
- Megrez2-3x7B-A3B-Preview 是专为终端设备设计的大模型,兼顾MoE的精度杠杆与Dense的总参数量友好。本次发布的为Megrez 2.0预览版本,训练数据量5T Tokens,未来我们计划完成更大规模的数据训练,并提高模型的推理和Agent能力,正式版本预计今年年内发布。
+ InnoMegrez2 是专为终端设备设计的大模型,兼顾MoE的精度杠杆与Dense的总参数量友好。本次发布的为Megrez 2.0预览版本,训练数据量5T Tokens,未来我们计划完成更大规模的数据训练,并提高模型的推理和Agent能力,正式版本预计今年年内发布。
 
 ## 基础信息
 
@@ -54,7 +49,7 @@ Megrez2-3x7B-A3B-Preview 是专为终端设备设计的大模型,兼顾MoE的
 
 ## 性能测试
 
- 我们使用开源评测工具 [OpenCompass](https://github.com/open-compass/opencompass) 对 Megrez2-3x7B-A3B-Preview 进行了评测,部分评测结果如下表所示。
+ 我们使用开源评测工具 [OpenCompass](https://github.com/open-compass/opencompass) 对 InnoMegrez2 进行了评测,部分评测结果如下表所示。
 
 <div align="center">
 <table>
@@ -62,7 +57,7 @@ Megrez2-3x7B-A3B-Preview 是专为终端设备设计的大模型,兼顾MoE的
 <tr>
 <th align="center">Benchmark</th>
 <th align="center">Metric</th>
- <th align="center"><sup>Megrez2-3x7B<br>-A3B-Preview</sup></th>
+ <th align="center"><sup>InnoMegrez2</sup></th>
 <th align="center"><sup>Qwen2.5-3B</sup></th>
 <th align="center"><sup>Qwen2.5-7B</sup></th>
 <th align="center"><sup>Qwen3-4B</sup></th>
@@ -196,13 +191,13 @@ Megrez2-3x7B-A3B-Preview 是专为终端设备设计的大模型,兼顾MoE的
 ### Transformers
 
 推荐使用最新版本的 `transformers` 或者 `transformers>=4.52.4` 的版本。
- 以下是一个非常简单的代码片段示例,展示如何运行 Megrez2-3x7B-A3B-Preview 模型:
+ 以下是一个非常简单的代码片段示例,展示如何运行 InnoMegrez2 模型:
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 
- path = "Infinigence/Megrez2-3x7B-A3B-Preview"
+ path = "sii-research/InnoMegrez2"
 device = "cuda"
 
 tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
@@ -229,20 +224,9 @@ print(responses)
 # 世界上最高的山峰是珠穆朗玛峰(Mount Everest),位于喜马拉雅山脉的中尼边境。珠穆朗玛峰的海拔高度为8,848.86米(29,031.7英尺),这一数据是由中国和尼泊尔在2020年共同宣布的最新测量结果。珠穆朗玛峰不仅是登山爱好者的圣地,也是地理和科学研究的重要对象。
 ```
 
- ### ModelScope
- 
- `ModelScope` 采用了与 `Transformers` 类似(但不完全一致)的编程接口。对于基础使用,仅需将上面代码第一行做如下修改:
- 
- ```python
- from modelscope import AutoModelForCausalLM, AutoTokenizer
- ```
- 
- ### llama.cpp
- 即将到来...
- 
 ## 如何部署
 
- Megrez2-3x7B-A3B-Preview 支持使用 `vLLM` 和 `SGLang` 作为推理后端,更详细的信息请查看我们的[github仓库](https://github.com/infinigence/Infini-Megrez)。
+ InnoMegrez2 支持使用 `vLLM` 和 `SGLang` 作为推理后端,更详细的信息请查看我们的[GitHub仓库](https://github.com/sii-research/InnoMegrez2)。
 
 ## 最佳实践
 
@@ -258,23 +242,7 @@ Megrez2-3x7B-A3B-Preview 支持使用 `vLLM` 和 `SGLang` 作为推理后端,
 
 我们所有的开源模型均采用Apache 2.0协议授权。
 
- ## 引用信息
- 
- 如果您觉得我们的代码和模型有用,请引用以下信息。
- 
- ```bibtex
- @misc{li2025megrez2technicalreport,
- title={Megrez2 Technical Report},
- author={Boxun Li and Yadong Li and Zhiyuan Li and Congyi Liu and Weilin Liu and Guowei Niu and Zheyue Tan and Haiyang Xu and Zhuyu Yao and Tao Yuan and Dong Zhou and Yueqing Zhuang and Bo Zhao and Guohao Dai and Yu Wang},
- year={2025},
- eprint={2507.17728},
- archivePrefix={arXiv},
- primaryClass={cs.CL},
- url={https://arxiv.org/abs/2507.17728},
- }
- ```
- 
 
 ## 联系我们
 
- 如果您有任何问题,请随时提交GitHub issue或联系[微信群组](https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-Preview/blob/main/assets/wechat-group.jpg)。
+ 如果您有任何问题,请随时提交GitHub issue或联系[微信群组](https://github.com/sii-research/InnoMegrez2/blob/main/assets/wechat-group.jpg)。