---
license: apache-2.0
datasets:
- TigerResearch/pretrain_zh
language:
- zh
base_model:
- Qwen/Qwen2.5-3B
tags:
- qwen2.5
- text-generation-inference
- Text Generation
- Character
---

**Qwen2.5-3B-Character**


**Introduction:**


**Qwen2.5-3B-Character** is the character-level version of the [Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B) model. It is developed from the Qwen2.5-3B base model and is specifically designed for character-to-character transformation and generation tasks.
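
A quick way to see what "character-level" means here is to compare tokenizations. The sketch below is illustrative and not part of the original card; the expectation that this model's tokenizer yields one token per Chinese character follows from the description above rather than from verified output.

```python
# Illustrative comparison of the base tokenizer and the character-level tokenizer.
# The behavior noted in the comments is an assumption based on the model description.
from transformers import AutoTokenizer

base_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B")
char_tokenizer = AutoTokenizer.from_pretrained("Henry94/Qwen2.5-3B-Character")

text = "大型语言模型"  # "large language model"

print(base_tokenizer.tokenize(text))  # the base vocabulary may merge multi-character units
print(char_tokenizer.tokenize(text))  # expected: one token per character
```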


**Core Contributions:**


1. **Modified Token Vocabulary:** The original model's token vocabulary has been revised to remove tokens that represent phrases or multi-character sequences, which sharpens the model's focus on individual character processing (see the sketch after this list).


2. **Continued Pre-training:** Based on the modified vocabulary, the model has undergone further pre-training to optimize its performance and adaptability for character-level tasks.

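
As a rough illustration of the vocabulary modification described in point 1, the sketch below shows one way multi-character tokens could be identified in the base vocabulary before removal. This is an assumed procedure, not the authors' actual preprocessing code.

```python
# Assumed sketch (not the authors' code): find vocabulary entries of the base
# Qwen2.5-3B tokenizer that decode to more than one character, i.e. the kind of
# phrase / multi-character tokens the card says were removed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B")

multi_char_ids = []
for token, token_id in tokenizer.get_vocab().items():
    decoded = tokenizer.convert_tokens_to_string([token])
    if len(decoded.strip()) > 1:  # decodes to a phrase or multi-character sequence
        multi_char_ids.append(token_id)

print(f"{len(multi_char_ids)} of {len(tokenizer)} tokens decode to multiple characters")
```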


**Training Dataset:**


The model has been trained using the `TigerResearch/pretrain_zh` dataset, a comprehensive Chinese pre-training dataset provided by **TigerResearch**. For more information about the dataset, please visit: [TigerResearch/pretrain_zh](https://huggingface.co/datasets/TigerResearch/pretrain_zh).
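
For a quick look at the corpus, the dataset can be streamed with the `datasets` library; the sketch below is illustrative, and the `train` split name is an assumption.

```python
# Preview one record from the pre-training corpus (streaming avoids a full download).
from datasets import load_dataset

dataset = load_dataset("TigerResearch/pretrain_zh", split="train", streaming=True)
print(next(iter(dataset)))  # prints the first raw record
```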



**Training Code:**


The training process for this model was carried out with **LLaMA-Factory**, an open-source project that provides tools and frameworks for training language models. The LLaMA-Factory codebase is available at: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).



**Results**


To assess the efficacy of Qwen2.5-3B-Character, we evaluated its performance on three widely used benchmarks: C-Eval, CMMLU, and MMLU. The results are tabulated as follows:


| Model | C-Eval | CMMLU | MMLU |
| :--- | :---: | :---: | :---: |
| Qwen2.5-3B | 74.37 | 74.94 | 65.87 |
| Qwen2.5-3B-filter | 70.43 | 69.69 | 65.53 |
| Qwen2.5-3B-Character | 71.97 | 71.94 | 65.18 |


To make the comparison clearer, the table also reports results for the original Qwen2.5-3B and for the token-modified Qwen2.5-3B (Qwen2.5-3B-filter).



**Quickstart**


The latest version of `transformers` is recommended (at least 4.37.0). The following code snippet shows how to use the model with a chat template via `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Henry94/Qwen2.5-3B-Character"

# Load the tokenizer and model; device_map="auto" places the model on GPU if available.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "请简单介绍一下大型语言模型."  # "Please give a brief introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated continuation is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```