|
|
--- |
|
|
language: |
|
|
- zh |
|
|
- en |
|
|
base_model: |
|
|
- Qwen/Qwen2.5-72B |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
--- |
|
|
## Introduction |
|
|
|
|
|
The Ming large language model (Ming‑LLM) is a domain‑specialized LLM for the energy sector. |
|
|
- We release both the base model and the supervised fine‑tuned (SFT) variant. |
|
|
- The Ming base model is initialized from the Qwen2.5‑72B base model and is subsequently adapted via continued pretraining on a high‑quality energy‑domain corpus. |
|
|
- The SFT variant is initialized from the Ming base model and trained on instruction‑tuning datasets covering conversational QA, sentiment analysis, information extraction, and other tasks.
|
|
- Both models demonstrate competitive or improved performance across the C‑Eval, CMMLU, MMLU, GSM8K, and IFEval benchmarks (see the Evaluation section below).
|
|
|
|
|
## Training Hyperparameters
|
|
Base model (continued pretraining):
|
|
- sequence_len: 4096 |
|
|
- gradient_accumulation_steps: 128 |
|
|
- learning_rate: 1.0e-5 |
|
|
- lr_scheduler_type: cosine |
|
|
- warmup_ratio: 0 |
|
|
- num_train_epochs: 1.0 |
|
|
|
|
|
SFT: |
|
|
- sequence_len: 4096 |
|
|
- gradient_accumulation_steps: 128 |
|
|
- learning_rate: 2.0e-6 (peak)
|
|
- max_grad_norm: 1.0 |
|
|
- lr_scheduler_type: cosine |
|
|
- warmup_ratio: 0.03 |
|
|
- num_train_epochs: 1.0 |
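For orientation, here is a minimal sketch of how the SFT hyperparameters above could be expressed with the HuggingFace `TrainingArguments` API. The output path, per-device batch size, and bf16 flag are illustrative assumptions, not the original training configuration, and the 4096-token sequence length is normally enforced during tokenization/packing rather than in `TrainingArguments`:

```python
from transformers import TrainingArguments

# Illustrative mapping of the SFT hyperparameters listed above.
# output_dir, per_device_train_batch_size, and bf16 are assumptions; the
# 4096-token sequence length is handled when tokenizing/packing the data.
sft_args = TrainingArguments(
    output_dir="./ming-sft",            # hypothetical output path
    per_device_train_batch_size=1,      # assumed; not specified above
    gradient_accumulation_steps=128,
    learning_rate=2e-6,                 # peak learning rate under the cosine schedule
    max_grad_norm=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1.0,
    bf16=True,                          # assumed, matching the bfloat16 inference example
)
```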
|
|
|
|
|
## Evaluation |
|
|
| Model | C-Eval 5-shot | CMMLU 5-shot | MMLU 5-shot | GPQA 0-shot | BBH 0-shot | HellaSwag 10-shot | GSM8K | IFEval |
|
|
|------------------------|---------------|--------------|-------------|-------------|------------|-------------------|-------|--------| |
|
|
| qwen2.5-72B-base | 89.72 | 89.75 | 84.79 | 37.88 | 85.81 | 94.93 | 89.99 | - | |
|
|
| ming1.0-base | 90.11 | 89.84 | 84.97 | 41.92 | 84.80 | 92.73 | 89.23 | - | |
|
|
| qwen2.5-72B-instruct | 87.97 | 87.26 | 84.18 | 36.87 | 83.68 | 92.65 | 89.69 | 82.81 | |
|
|
| ming1.0 | 90.08 | 89.94 | 85.12 | 37.88 | 85.24 | 94.20 | 91.43 | 78.74 | |
|
|
|
|
|
## Inference |
|
|
|
|
|
You can run the Ming model with the standard Hugging Face Transformers library:
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

dtype = torch.bfloat16      # load weights in bfloat16
device_map = "auto"         # shard the 72B model across available GPUs

model_path = "/model/path"  # path to the downloaded Ming checkpoint
tokenizer = AutoTokenizer.from_pretrained(
    model_path, use_fast=True, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=dtype, device_map=device_map, trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "who are you?"}
]

# Build the chat prompt with the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=(tokenizer.pad_token_id or tokenizer.eos_token_id),
        streamer=None  # optionally pass TextStreamer(tokenizer, skip_prompt=True) for streaming output
    )

# Keep only the newly generated tokens (drop the prompt) and decode them.
gen_ids = output_ids[0, inputs["input_ids"].shape[1]:]
text = tokenizer.decode(gen_ids, skip_special_tokens=True)
print(text)
|
|
``` |
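To continue the conversation for multiple turns, one approach is to append the generated reply and the next user message before re-applying the chat template. This is a minimal sketch that reuses the `model`, `tokenizer`, `messages`, and `text` objects from the snippet above; the follow-up question is only an illustration:

```python
# Append the assistant reply from the previous turn and a new user message.
messages.append({"role": "assistant", "content": text})
messages.append({"role": "user", "content": "What can you help with in the energy domain?"})  # illustrative question

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.3, top_p=0.9)

reply = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```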
|
|
## Bias, Risks, and Limitations |
|
|
- Like any base or fine-tuned language model without safety filtering, these models can be prompted to generate harmful or sensitive content.
|
|
- Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. |
|
|
- Additionally, statements produced by the Ming model, as with any LLM, can be inaccurate, so facts should be verified.
|
|
|
|
|
## License and Use
|
|
- Ming1.0 is built with Qwen2.5-72B. Qwen2.5-72B is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
|
|
- Subject to the Qwen LICENSE AGREEMENT, Ming1.0 is released under the MIT License.
|
|
|
|
|
|