---
language:
- zh
- en
base_model:
- Qwen/Qwen2.5-72B
pipeline_tag: text-generation
library_name: transformers
---
## Introduction

The Ming large language model (Ming-LLM) is a domain-specialized LLM for the energy sector.
- We release both the base model and the supervised fine-tuned (SFT) variant.
- The Ming base model is initialized from the Qwen2.5-72B base model and then adapted via continued pretraining on a high-quality energy-domain corpus.
- The SFT variant is initialized from the Ming base model and trained on instruction-tuning datasets covering conversational QA, sentiment analysis, and information extraction, among other tasks.
- Despite the domain specialization, both models remain strong on general benchmarks such as C-Eval, CMMLU, MMLU, GSM8K, and IFEval, improving over their Qwen2.5-72B counterparts on several of them (see Evaluation below).

## Training Hyperparameters
Base model (continued pretraining):
- sequence_len: 4096
- gradient_accumulation_steps: 128
- learning_rate: 1.0e-5
- lr_scheduler_type: cosine
- warmup_ratio: 0
- num_train_epochs: 1.0

SFT:
- sequence_len: 4096
- gradient_accumulation_steps: 128
- learning_rate (peak): 2.0e-6
- max_grad_norm: 1.0
- lr_scheduler_type: cosine
- warmup_ratio: 0.03
- num_train_epochs: 1.0

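For reference, here is a minimal sketch of how the SFT settings above could be expressed as a Hugging Face `TrainingArguments` config. Use of the HF `Trainer`, the `output_dir`, the per-device batch size, and the `bf16` flag are assumptions of this example, not details stated in this card:

```python
from transformers import TrainingArguments

# Sketch only: hyperparameter values come from the list above; paths, batch
# size, and precision are illustrative assumptions.
sft_args = TrainingArguments(
    output_dir="./ming-sft",           # hypothetical output path
    per_device_train_batch_size=1,     # assumption; the card lists only accumulation steps
    gradient_accumulation_steps=128,
    learning_rate=2e-6,                # peak LR, decayed by the cosine schedule
    max_grad_norm=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1.0,
    bf16=True,                         # assumption, matching the bfloat16 inference dtype
)
# Note: sequence_len (4096) is applied when tokenizing/packing the training
# data, not via TrainingArguments.
```
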
## Evaluation
| Model | C-Eval (5-shot) | CMMLU (5-shot) | MMLU (5-shot) | GPQA (0-shot) | BBH (0-shot) | HellaSwag (10-shot) | GSM8K | IFEval |
|------|-----------------|----------------|---------------|---------------|--------------|---------------------|-------|--------|
| Qwen2.5-72B-base | 89.72 | 89.75 | 84.79 | 37.88 | 85.81 | 94.93 | 89.99 | - |
| Ming1.0-base | 90.11 | 89.84 | 84.97 | 41.92 | 84.80 | 92.73 | 89.23 | - |
| Qwen2.5-72B-instruct | 87.97 | 87.26 | 84.18 | 36.87 | 83.68 | 92.65 | 89.69 | 82.81 |
| Ming1.0 | 90.08 | 89.94 | 85.12 | 37.88 | 85.24 | 94.20 | 91.43 | 78.74 |

## Inference

You can use the Ming model with the standard Hugging Face `transformers` library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

dtype = torch.bfloat16
device_map = "auto"

# Path to the downloaded Ming checkpoint
model_path = "/model/path"
tokenizer = AutoTokenizer.from_pretrained(
    model_path, use_fast=True, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=dtype, device_map=device_map, trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# Render the conversation into a single prompt string via the chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=(tokenizer.pad_token_id or tokenizer.eos_token_id),
    )

# Keep only the newly generated tokens, then decode them
gen_ids = output_ids[0, inputs["input_ids"].shape[1]:]
text = tokenizer.decode(gen_ids, skip_special_tokens=True)
print(text)
```
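To stream tokens to stdout as they are generated instead of decoding at the end, you can attach a `TextStreamer` to `generate`. A minimal sketch reusing `model`, `tokenizer`, and `inputs` from the snippet above:

```python
from transformers import TextStreamer

# Prints decoded tokens as they are generated; skip_prompt avoids echoing
# the input prompt back to the console.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.1,
        streamer=streamer,
    )
```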
## Bias, Risks, and Limitations
- Like any base or fine-tuned language model without safety filtering, these models can be prompted to generate harmful or sensitive content.
- Such content may also be produced unintentionally, especially where bias is involved, so we recommend that users weigh these risks when applying this technology.
- Additionally, statements from the Ming models, as from any LLM, can be inaccurate, so facts should be verified.

## License and Use
- Ming1.0 is built with Qwen2.5-72B, which is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
- Subject to the Qwen LICENSE AGREEMENT, Ming1.0 is released under the MIT license.