---
language:
- zh
- en
base_model:
- Qwen/Qwen2.5-72B
pipeline_tag: text-generation
library_name: transformers
---
## Introduction

The Ming large language model (Ming-LLM) is a domain-specialized LLM for the energy sector.
- We release both the base model and the supervised fine-tuned (SFT) variant.
- The Ming base model is initialized from the Qwen2.5-72B base model and subsequently adapted via continued pretraining on a high-quality energy-domain corpus.
- The SFT variant is initialized from the Ming base model and trained on instruction-tuning datasets covering conversational QA, sentiment analysis, and information extraction, among other tasks.
- Both models demonstrate improved performance on most of the C-Eval, CMMLU, MMLU, GSM8K, and IFEval benchmarks (see Evaluation below).

## Model Parameters

Base model (continued pretraining):
- sequence_len: 4096
- gradient_accumulation_steps: 128
- learning_rate: 1.0e-5
- lr_scheduler_type: cosine
- warmup_ratio: 0
- num_train_epochs: 1.0

SFT:
- sequence_len: 4096
- gradient_accumulation_steps: 128
- learning_rate: 2.0e-6 (peak)
- max_grad_norm: 1.0
- lr_scheduler_type: cosine
- warmup_ratio: 0.03
- num_train_epochs: 1.0
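
As a rough illustration, the SFT hyperparameters above can be expressed as Hugging Face `TrainingArguments`. This is a minimal sketch, not the authors' training script; the output path, per-device batch size, and `bf16` flag are assumptions:

```python
from transformers import TrainingArguments

# Sketch of the SFT configuration above; values not listed in this README
# (output_dir, per_device_train_batch_size, bf16) are assumptions.
sft_args = TrainingArguments(
    output_dir="./ming-sft",          # hypothetical output path
    per_device_train_batch_size=1,    # assumption; only grad accumulation is reported
    gradient_accumulation_steps=128,
    learning_rate=2e-6,               # peak LR, decayed by the cosine schedule
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    num_train_epochs=1.0,
    bf16=True,                        # assumption; common at this model scale
)
# sequence_len (4096) is applied at tokenization time, e.g.
# tokenizer(texts, truncation=True, max_length=4096)
```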

## Evaluation

| Model | C-Eval (5-shot) | CMMLU (5-shot) | MMLU (5-shot) | GPQA (0-shot) | BBH (0-shot) | HellaSwag (10-shot) | GSM8K | IFEval |
|----------------------|-------|-------|-------|-------|-------|-------|-------|-------|
| qwen2.5-72B-base     | 89.72 | 89.75 | 84.79 | 37.88 | 85.81 | 94.93 | 89.99 | -     |
| ming1.0-base         | 90.11 | 89.84 | 84.97 | 41.92 | 84.80 | 92.73 | 89.23 | -     |
| qwen2.5-72B-instruct | 87.97 | 87.26 | 84.18 | 36.87 | 83.68 | 92.65 | 89.69 | 82.81 |
| ming1.0              | 90.08 | 89.94 | 85.12 | 37.88 | 85.24 | 94.20 | 91.43 | 78.74 |

## Inference

You can use the Ming model with the standard Hugging Face Transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

dtype = torch.bfloat16
device_map = "auto"

# Path to the downloaded Ming checkpoint
model_path = "/model/path"
tokenizer = AutoTokenizer.from_pretrained(
    model_path, use_fast=True, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=dtype, device_map=device_map, trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"}
]

# Render the conversation with the model's chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=(tokenizer.pad_token_id or tokenizer.eos_token_id),
    )

# Decode only the newly generated tokens, dropping the prompt
gen_ids = output_ids[0, inputs["input_ids"].shape[1]:]
text = tokenizer.decode(gen_ids, skip_special_tokens=True)
print(text)
```
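
The snippet above decodes the completion only after generation finishes. To print tokens as they are produced, Transformers' `TextStreamer` can be passed to `generate`; this minimal variant reuses the `model`, `tokenizer`, and `inputs` defined above:

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated;
# skip_prompt=True avoids echoing the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.1,
        streamer=streamer,
    )
```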

## Bias, Risks, and Limitations

- Like any base or fine-tuned language model without safety filtering, these models can be prompted to generate harmful or sensitive content.
- Such content may also be produced unintentionally, especially where bias is involved, so we recommend that users weigh these risks when applying this technology.
- Additionally, statements from the Ming model, as from any LLM, can be inaccurate, so facts should be verified.

## License and Use

- Ming1.0 is built with Qwen2.5-72B. Qwen2.5-72B is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
- Subject to the Qwen LICENSE AGREEMENT, Ming1.0 is released under the MIT license.