| # Introduction |
|
|
| [OpenBMB Technical Blog Series](https://openbmb.vercel.app/) |
|
|
MiniCPM-MoE-8x2B is a decoder-only, transformer-based generative language model.

It adopts a Mixture-of-Experts (MoE) architecture with 8 experts per layer, of which 2 are activated for each token.
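
The idea behind top-2 routing can be illustrated with a toy sketch (this is not the actual MiniCPM implementation; the experts here are stand-in functions, and real MoE layers use learned FFN experts and a learned router):

```python
import math

NUM_EXPERTS = 8   # experts per layer in MiniCPM-MoE-8x2B
TOP_K = 2         # experts activated per token

def top2_route(router_logits):
    """Pick the TOP_K highest-scoring experts and softmax-normalize their logits."""
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    exps = [math.exp(router_logits[i]) for i in top]
    z = sum(exps)
    return [(i, w / z) for i, w in zip(top, exps)]

# Toy experts: each just scales its input (stand-ins for per-expert FFN sub-networks).
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

def moe_layer(x, router_logits):
    # Only the 2 selected experts are evaluated; the other 6 cost nothing.
    return sum(w * experts[i](x) for i, w in top2_route(router_logits))

logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
print(moe_layer(1.0, logits))  # output is a weighted mix of experts 4 and 1
```

Because only 2 of 8 experts run per token, the per-token compute is close to that of a 2B dense model even though the total parameter count is much larger.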
|
|
| # Usage |
This is an instruction-tuned version of the model; no RLHF or other alignment methods have been applied. The chat template is applied automatically.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(0)

path = 'openbmb/MiniCPM-MoE-8x2B'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# Prompt (Chinese): "Which is the highest mountain in Shandong Province?
# Is it taller or shorter than Huangshan, and by how much?"
response, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?", temperature=0.8, top_p=0.8)
print(response)
```
|
|
| # Note |
1. You can also run inference with [vLLM](https://github.com/vllm-project/vllm) (>= 0.4.1), which is compatible with this repo and offers much higher inference throughput.
2. The model weights in this repo are stored in bfloat16. Manual conversion is needed for other dtypes.
3. For more details, please refer to our [GitHub repo](https://github.com/OpenBMB/MiniCPM).
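
For the vLLM path, a minimal offline-inference sketch might look like the following (assuming vLLM >= 0.4.1 is installed and a GPU is available; the prompt and sampling settings are illustrative, not recommendations):

```python
from vllm import LLM, SamplingParams

path = 'openbmb/MiniCPM-MoE-8x2B'
llm = LLM(model=path, trust_remote_code=True, dtype='bfloat16')
sampling = SamplingParams(temperature=0.8, top_p=0.8, max_tokens=256)

# vLLM's generate() takes raw prompt strings, so apply the chat template yourself.
tokenizer = llm.get_tokenizer()
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Which is the highest mountain in Shandong Province?"}],
    tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```

Unlike the `model.chat` example above, vLLM does not apply the chat template automatically, hence the explicit `apply_chat_template` call.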
|
|
| # Statement |
1. As a language model, MiniCPM-MoE-8x2B generates content by learning from a vast amount of text.
2. However, it cannot comprehend or express personal opinions or value judgments.
3. Any content generated by MiniCPM-MoE-8x2B does not represent the viewpoints or positions of the model developers.
4. Therefore, users of content generated by MiniCPM-MoE-8x2B should take full responsibility for evaluating and verifying it themselves.
|
|
|
|