---
language:
- zh
- en
base_model:
- Qwen/Qwen2.5-72B
pipeline_tag: text-generation
library_name: transformers
---

## Introduction

The Ming large language model (Ming-LLM) is a domain-specialized LLM for the energy sector.

- We release both the base model and the supervised fine-tuned (SFT) variant.
- The Ming base model is initialized from the Qwen2.5-72B base model and is subsequently adapted via continued pretraining on a high-quality energy-domain corpus.
- The SFT variant is initialized from the Ming base model and is trained on instruction-tuning datasets, including conversational QA, sentiment analysis, and information extraction, among others.
- Both models demonstrate improved performance across the C-Eval, CMMLU, MMLU, GSM8K, and IFEval benchmarks.

## Model Parameters

Base model (continued pretraining):
- sequence_len: 4096
- gradient_accumulation_steps: 128
- learning_rate: 1.0e-5
- lr_scheduler_type: cosine
- warmup_ratio: 0
- num_train_epochs: 1.0

SFT:
- sequence_len: 4096
- gradient_accumulation_steps: 128
- learning_rate (max): 2.0e-6
- max_grad_norm: 1.0
- lr_scheduler_type: cosine
- warmup_ratio: 0.03
- num_train_epochs: 1.0

## Evaluation

| Model                | C-Eval (5-shot) | CMMLU (5-shot) | MMLU (5-shot) | GPQA (0-shot) | BBH (0-shot) | HellaSwag (10-shot) | GSM8K | IFEval |
|----------------------|-----------------|----------------|---------------|---------------|--------------|---------------------|-------|--------|
| qwen2.5-72B-base     | 89.72           | 89.75          | 84.79         | 37.88         | 85.81        | 94.93               | 89.99 | -      |
| ming1.0-base         | 90.11           | 89.84          | 84.97         | 41.92         | 84.80        | 92.73               | 89.23 | -      |
| qwen2.5-72B-instruct | 87.97           | 87.26          | 84.18         | 36.87         | 83.68        | 92.65               | 89.69 | 82.81  |
| ming1.0              | 90.08           | 89.94          | 85.12         | 37.88         | 85.24        | 94.20               | 91.43 | 78.74  |

## Inference

You can use the Ming models with the standard Hugging Face Transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

dtype = torch.bfloat16
device_map = "auto"
model_path = "/model/path"  # replace with the local path or Hub ID of the Ming checkpoint

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    use_fast=True,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=dtype,
    device_map=device_map,
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "who are you?"}
]
# Build the prompt with the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=(tokenizer.pad_token_id or tokenizer.eos_token_id)
    )

# Decode only the newly generated tokens.
gen_ids = output_ids[0, inputs["input_ids"].shape[1]:]
text = tokenizer.decode(gen_ids, skip_special_tokens=True)
print(text)
```
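
For interactive use you may prefer to stream tokens as they are generated. Below is a minimal sketch using the Transformers `TextStreamer` helper; it reuses the `model`, `tokenizer`, and `inputs` objects created above, and the sampling settings are illustrative rather than recommended values.

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated;
# skip_prompt=True avoids echoing the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        streamer=streamer
    )
```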
## Bias, Risks, and Limitations

- Like any base or fine-tuned language model released without safety filtering, these models can be prompted to generate harmful or sensitive content.
- Such content may also be produced unintentionally, particularly where bias is involved, so users should weigh these risks when applying this technology.
- Additionally, statements from the Ming models, as from any LLM, can be inaccurate, so facts should be verified.

## License and use

- Ming1.0 is built with Qwen2.5-72B. Qwen2.5-72B is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
- Subject to the Qwen LICENSE AGREEMENT, Ming1.0 is released under the MIT license.