|
|
--- |
|
|
language: |
|
|
- zh |
|
|
- en |
|
|
base_model: |
|
|
- Qwen/Qwen2.5-72B |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
--- |
|
|
## Introduction |
|
|
|
|
|
The Ming large language model (Ming‑LLM) is a domain‑specialized LLM for the energy sector. |
|
|
- We release both the base model and the supervised fine‑tuned (SFT) variant. |
|
|
- The Ming base model is initialized from the Qwen2.5‑72B base model and is subsequently adapted via continued pretraining on a high‑quality energy‑domain corpus. |
|
|
- The SFT variant is initialized from the Ming base model and trained on instruction‑tuning datasets covering conversational QA, sentiment analysis, information extraction, and other tasks.
|
|
- Both models demonstrate competitive or improved performance across the C‑Eval, CMMLU, MMLU, GSM8K, and IFEval benchmarks (see the Evaluation section below).
|
|
|
|
|
## Training Hyperparameters
|
|
Base model (continued pretraining):
|
|
- sequence_len: 4096 |
|
|
- gradient_accumulation_steps: 128 |
|
|
- learning_rate: 1.0e-5 |
|
|
- lr_scheduler_type: cosine |
|
|
- warmup_ratio: 0 |
|
|
- num_train_epochs: 1.0 |
|
|
|
|
|
SFT: |
|
|
- sequence_len: 4096 |
|
|
- gradient_accumulation_steps: 128 |
|
|
- learning_rate: 2.0e-6 (peak)
|
|
- max_grad_norm: 1.0 |
|
|
- lr_scheduler_type: cosine |
|
|
- warmup_ratio: 0.03 |
|
|
- num_train_epochs: 1.0 |
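For orientation, here is a minimal sketch of how the SFT hyperparameters above could be expressed with the HuggingFace `TrainingArguments` API. The output path, per-device batch size, and bf16 flag are illustrative assumptions, not the original training configuration, and the 4096-token sequence length is normally enforced during tokenization/packing rather than in `TrainingArguments`:

```python
from transformers import TrainingArguments

# Illustrative mapping of the SFT hyperparameters listed above.
# output_dir, per_device_train_batch_size, and bf16 are assumptions; the
# 4096-token sequence length is handled when tokenizing/packing the data.
sft_args = TrainingArguments(
    output_dir="./ming-sft",            # hypothetical output path
    per_device_train_batch_size=1,      # assumed; not specified above
    gradient_accumulation_steps=128,
    learning_rate=2e-6,                 # peak learning rate under the cosine schedule
    max_grad_norm=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1.0,
    bf16=True,                          # assumed, matching the bfloat16 inference example
)
```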
|
|
|
|
|
## Evaluation |
|
|
| Model | C-Eval 5-shot | CMMLU 5-shot | MMLU 5-shot | GPQA 0-shot | BBH 0-shot | HellaSwag 10-shot | GSM8K | IFEval |
|
|
|------------------------|---------------|--------------|-------------|-------------|------------|-------------------|-------|--------| |
|
|
| qwen2.5-72B-base | 89.72 | 89.75 | 84.79 | 37.88 | 85.81 | 94.93 | 89.99 | - | |
|
|
| ming1.0-base | 90.11 | 89.84 | 84.97 | 41.92 | 84.80 | 92.73 | 89.23 | - | |
|
|
| qwen2.5-72B-instruct | 87.97 | 87.26 | 84.18 | 36.87 | 83.68 | 92.65 | 89.69 | 82.81 | |
|
|
| ming1.0 | 90.08 | 89.94 | 85.12 | 37.88 | 85.24 | 94.20 | 91.43 | 78.74 | |
|
|
|
|
|
## Inference |
|
|
|
|
|
You can run the Ming model with the standard Hugging Face Transformers library:
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

dtype = torch.bfloat16      # load weights in bfloat16
device_map = "auto"         # shard the 72B model across available GPUs

model_path = "/model/path"  # path to the downloaded Ming checkpoint
tokenizer = AutoTokenizer.from_pretrained(
    model_path, use_fast=True, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=dtype, device_map=device_map, trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "who are you?"}
]

# Build the chat prompt with the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=(tokenizer.pad_token_id or tokenizer.eos_token_id),
        streamer=None  # optionally pass TextStreamer(tokenizer, skip_prompt=True) for streaming output
    )

# Keep only the newly generated tokens (drop the prompt) and decode them.
gen_ids = output_ids[0, inputs["input_ids"].shape[1]:]
text = tokenizer.decode(gen_ids, skip_special_tokens=True)
print(text)
|
|
``` |
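To continue the conversation for multiple turns, one approach is to append the generated reply and the next user message before re-applying the chat template. This is a minimal sketch that reuses the `model`, `tokenizer`, `messages`, and `text` objects from the snippet above; the follow-up question is only an illustration:

```python
# Append the assistant reply from the previous turn and a new user message.
messages.append({"role": "assistant", "content": text})
messages.append({"role": "user", "content": "What can you help with in the energy domain?"})  # illustrative question

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.3, top_p=0.9)

reply = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```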
|
|
## Bias, Risks, and Limitations |
|
|
- Like any base or fine-tuned language model without safety filtering, these models can be prompted to generate harmful or sensitive content.
|
|
- Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. |
|
|
- Additionally, statements produced by the Ming model, as with any LLM, can be inaccurate, so facts should be verified.
|
|
|
|
|
## License and Use
|
|
- Ming1.0 is built with Qwen2.5-72B. Qwen2.5-72B is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
|
|
- Subject to the Qwen LICENSE AGREEMENT, Ming1.0 is released under the MIT License.
|
|
|
|
|
|