---
language:
- zh
- en
base_model:
- Qwen/Qwen2.5-72B
pipeline_tag: text-generation
library_name: transformers
---
## Introduction

The Ming large language model (Ming-LLM) is a domain-specialized LLM for the energy sector.
- We release both the base model and a supervised fine-tuned (SFT) variant.
- The Ming base model is initialized from the Qwen2.5-72B base model and then adapted via continued pretraining on a high-quality energy-domain corpus.
- The SFT variant is initialized from the Ming base model and trained on instruction-tuning datasets covering conversational QA, sentiment analysis, information extraction, and other tasks (an example record format is sketched below).
- Both models improve over their Qwen2.5-72B counterparts on most of the evaluated benchmarks, including C-Eval, CMMLU, and MMLU (see Evaluation below).
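
For illustration only, here is a minimal sketch of what one chat-format SFT record might look like. The actual Ming training schema is not published, so all field names and contents below are assumptions:
```python
# Hypothetical SFT record in the common chat-messages format.
# The real Ming dataset schema is not published; keys and values are illustrative.
sft_record = {
    "messages": [
        {"role": "system", "content": "You are an assistant specialized in the energy sector."},
        {"role": "user", "content": "Extract the installed capacity from: 'The plant adds 1.2 GW of PV.'"},
        {"role": "assistant", "content": "Installed capacity: 1.2 GW (photovoltaic)."},
    ],
    "task": "information_extraction",  # e.g. conversational_qa, sentiment_analysis
}
```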

## Training Hyperparameters
Base model (continued pretraining):
- sequence_len: 4096
- gradient_accumulation_steps: 128
- learning_rate: 1.0e-5
- lr_scheduler_type: cosine
- warmup_ratio: 0
- num_train_epochs: 1.0

SFT:
- sequence_len: 4096
- gradient_accumulation_steps: 128
- learning_rate: 2.0e-6 (max)
- max_grad_norm: 1.0
- lr_scheduler_type: cosine
- warmup_ratio: 0.03
- num_train_epochs: 1.0
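
For reference, a minimal sketch of how the SFT settings above could be expressed as Hugging Face `TrainingArguments`. This is an assumption about the training stack, not the released training code; `output_dir`, `per_device_train_batch_size`, and `bf16` are illustrative guesses:
```python
from transformers import TrainingArguments

# Hypothetical mapping of the SFT hyperparameters above onto TrainingArguments.
sft_args = TrainingArguments(
    output_dir="./ming-sft",           # illustrative path
    per_device_train_batch_size=1,     # assumption; not stated in the card
    gradient_accumulation_steps=128,   # as listed above
    learning_rate=2.0e-6,              # max LR, decayed by the cosine schedule
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    num_train_epochs=1.0,
    bf16=True,                         # assumption: bfloat16, matching the inference dtype
)
```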

## Evaluation
| Model                | C-Eval (5-shot) | CMMLU (5-shot) | MMLU (5-shot) | GPQA (0-shot) | BBH (0-shot) | HellaSwag (10-shot) | GSM8K | IFEval |
|----------------------|-----------------|----------------|---------------|---------------|--------------|---------------------|-------|--------|
| Qwen2.5-72B-base     | 89.72           | 89.75          | 84.79         | 37.88         | 85.81        | 94.93               | 89.99 | -      |
| Ming1.0-base         | 90.11           | 89.84          | 84.97         | 41.92         | 84.80        | 92.73               | 89.23 | -      |
| Qwen2.5-72B-instruct | 87.97           | 87.26          | 84.18         | 36.87         | 83.68        | 92.65               | 89.69 | 82.81  |
| Ming1.0              | 90.08           | 89.94          | 85.12         | 37.88         | 85.24        | 94.20               | 91.43 | 78.74  |

## Inference

You can use the Ming models with the standard Hugging Face `transformers` library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

dtype = torch.bfloat16
device_map = "auto"

model_path = "/model/path"  # replace with the local path or Hub ID of the Ming checkpoint
tokenizer = AutoTokenizer.from_pretrained(
    model_path, use_fast=True, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=dtype, device_map=device_map, trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": "Who are you?"},
]

# Render the chat messages into the model's prompt format.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=(tokenizer.pad_token_id or tokenizer.eos_token_id),
    )

# Strip the prompt tokens and decode only the newly generated text.
gen_ids = output_ids[0, inputs["input_ids"].shape[1]:]
text = tokenizer.decode(gen_ids, skip_special_tokens=True)
print(text)
```
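
If you want tokens printed to stdout as they are generated, `transformers` provides `TextStreamer`. A minimal sketch reusing the `model`, `tokenizer`, and `inputs` from above:
```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated,
# skipping the prompt and any special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        streamer=streamer,
    )
```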
## Bias, Risks, and Limitations
- Like any base or fine-tuned language model without safety filtering, these models can be prompted to generate harmful or sensitive content.
- Such content may also be produced unintentionally, especially where bias is involved, so we recommend that users weigh these risks when applying this technology.
- Additionally, statements from the Ming models, as from any LLM, can be inaccurate, so facts should be verified independently.

## License and use
- Ming1.0 is built with Qwen2.5-72B, which is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
- Subject to the Qwen LICENSE AGREEMENT, Ming1.0 is released under the MIT license.