---
license: apache-2.0
datasets:
- TigerResearch/pretrain_zh
language:
- zh
base_model:
- Qwen/Qwen2.5-3B
tags:
- qwen2.5
- text-generation-inference
- Text Generation
- Character
---

**Qwen2.5-3B-Character**


**Introduction:**


**Qwen2.5-3B-Character** is the character-level version of the [Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B) model. It is developed from the Qwen2.5-3B base model and is specifically designed for character-to-character transformation and generation tasks.
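
A quick way to see what "character-level" means here is to compare tokenizations. The sketch below is illustrative and not part of the original card; the expectation that this model's tokenizer yields one token per Chinese character follows from the description above rather than from verified output.

```python
# Illustrative comparison of the base tokenizer and the character-level tokenizer.
# The behavior noted in the comments is an assumption based on the model description.
from transformers import AutoTokenizer

base_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B")
char_tokenizer = AutoTokenizer.from_pretrained("Henry94/Qwen2.5-3B-Character")

text = "大型语言模型"  # "large language model"

print(base_tokenizer.tokenize(text))  # the base vocabulary may merge multi-character units
print(char_tokenizer.tokenize(text))  # expected: one token per character
```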


**Core Contributions:**


1. **Modified Token Vocabulary:** The original model's token vocabulary has been revised to remove tokens that represent phrases or multi-character sequences, which sharpens the model's focus on individual character processing (see the sketch after this list).


2. **Continued Pre-training:** Based on the modified vocabulary, the model has undergone further pre-training to optimize its performance and adaptability for character-level tasks.

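
As a rough illustration of the vocabulary modification described in point 1, the sketch below shows one way multi-character tokens could be identified in the base vocabulary before removal. This is an assumed procedure, not the authors' actual preprocessing code.

```python
# Assumed sketch (not the authors' code): find vocabulary entries of the base
# Qwen2.5-3B tokenizer that decode to more than one character, i.e. the kind of
# phrase / multi-character tokens the card says were removed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B")

multi_char_ids = []
for token, token_id in tokenizer.get_vocab().items():
    decoded = tokenizer.convert_tokens_to_string([token])
    if len(decoded.strip()) > 1:  # decodes to a phrase or multi-character sequence
        multi_char_ids.append(token_id)

print(f"{len(multi_char_ids)} of {len(tokenizer)} tokens decode to multiple characters")
```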


**Training Dataset:**


The model has been trained using the `TigerResearch/pretrain_zh` dataset, a comprehensive Chinese pre-training dataset provided by **TigerResearch**. For more information about the dataset, please visit: [TigerResearch/pretrain_zh](https://huggingface.co/datasets/TigerResearch/pretrain_zh).
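
For a quick look at the corpus, the dataset can be streamed with the `datasets` library; the sketch below is illustrative, and the `train` split name is an assumption.

```python
# Preview one record from the pre-training corpus (streaming avoids a full download).
from datasets import load_dataset

dataset = load_dataset("TigerResearch/pretrain_zh", split="train", streaming=True)
print(next(iter(dataset)))  # prints the first raw record
```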



**Training Code:**


The training process for this model was carried out with **LLaMA-Factory**, an open-source project that provides tools and frameworks for training language models. The LLaMA-Factory codebase is available at: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).



**Results**


To assess the efficacy of Qwen2.5-3B-Character, we evaluated its performance on three widely used benchmarks: C-Eval, CMMLU, and MMLU. The results are tabulated as follows:


| Model | C-Eval | CMMLU | MMLU |
| :--- | :---: | :---: | :---: |
| Qwen2.5-3B | 74.37 | 74.94 | 65.87 |
| Qwen2.5-3B-filter | 70.43 | 69.69 | 65.53 |
| Qwen2.5-3B-Character | 71.97 | 71.94 | 65.18 |


To make the comparison clearer, the table also reports results for the original Qwen2.5-3B and for the token-modified Qwen2.5-3B (Qwen2.5-3B-filter).



**Quickstart**


The latest version of `transformers` is recommended (at least 4.37.0). The following code snippet shows how to use the model with a chat template via `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Henry94/Qwen2.5-3B-Character"

# Load the tokenizer and model; device_map="auto" places the model on GPU if available.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "请简单介绍一下大型语言模型."  # "Please give a brief introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated continuation is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```