---
library_name: transformers
datasets:
- shareAI/ShareGPT-Chinese-English-90k
- FreedomIntelligence/ShareGPT-CN
language:
- zh
pipeline_tag: question-answering
tags:
- chat
- llm
- llama2
- chatgpt
---
- Github: https://github.com/CrazyBoyM/llama2-Chinese-chat

Updates:
- 2023-07-19: first llama2 13b Chinese chat version released.
- 2023-07-23: second training epoch finished and released; testing shows a better chat experience.
- 2023-08-03: branch version bimoGPT released, with self-identity awareness and solid code Q&A ability. Download: https://huggingface.co/shareAI/bimoGPT-llama2-13b
- 2023-08-21: updated world-model leaderboard ranking, surpassing the paid model of a community that bills itself as the "official Chinese Llama2" by more than ten places.

Full merged weights download: https://www.codewithgpu.com/m/file/llama2-13b-Chinese-chat
- Training dataset: https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k
- llama2 training QQ group: 443064756

This is llama2 Chinese chat 13b, obtained by training llama2 on a Chinese sharegpt dataset. To keep the download size down, only the adapter weights are released here. Pull https://huggingface.co/TheBloke/Llama-2-13B-fp16 as the base weights and run the following script to merge them into a full set of working weights:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name_or_path = '/data/TheBloke/Llama-2-13B-fp16'
adapter_name_or_path = '/data/llama2-13b-Chinese-chat'
save_path = '/data/llama2-13b-Chinese-chat_v1'

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto'
)
print("load model success")
model = PeftModel.from_pretrained(model, adapter_name_or_path)
print("load adapter success")
model = model.merge_and_unload()
print("merge success")

tokenizer.save_pretrained(save_path)
model.save_pretrained(save_path)
print("save done.")
```
After merging, try chatting with the model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


def main():
    model_name = '/data/llama2-13b-Chinese-chat_v1'

    device = 'cuda'
    max_new_tokens = 500      # max tokens generated per turn
    history_max_len = 2000    # max token length of history the model remembers
    top_p = 0.9
    temperature = 0.35        # higher values make the model more adventurous
    repetition_penalty = 1.2  # raise this if the model starts repeating itself

    # load the model; device_map='auto' already places it on the GPU,
    # so no extra .to(device) call is needed (it would error on a dispatched model)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto'
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        trust_remote_code=True,
        # llama does not support the fast tokenizer
        use_fast=False if model.config.model_type == 'llama' else True
    )
    # token ids of the full conversation history
    history_token_ids = tokenizer('<s>', return_tensors="pt").input_ids

    # start chatting
    user_input = input('User:')
    while True:
        user_input = '{}</s>'.format(user_input)
        user_input_ids = tokenizer(user_input, return_tensors="pt", add_special_tokens=False).input_ids
        history_token_ids = torch.concat((history_token_ids, user_input_ids), dim=1)
        model_input_ids = history_token_ids[:, -history_max_len:].to(device)
        with torch.no_grad():
            outputs = model.generate(
                input_ids=model_input_ids, max_new_tokens=max_new_tokens, do_sample=True, top_p=top_p,
                temperature=temperature, repetition_penalty=repetition_penalty, eos_token_id=tokenizer.eos_token_id
            )
        model_input_ids_len = model_input_ids.size(1)
        response_ids = outputs[:, model_input_ids_len:]
        history_token_ids = torch.concat((history_token_ids, response_ids.cpu()), dim=1)
        response = tokenizer.batch_decode(response_ids)
        print("Bot:" + response[0].strip().replace('</s>', ""))
        user_input = input('User:')


if __name__ == '__main__':
    main()
```
Further fine-tuning on top of this model is recommended to tune the chat behavior for your own use case.
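As a starting point for such continued training, a minimal LoRA configuration with PEFT might look like the following; the hyperparameters here are illustrative assumptions, not the values used to train this model:

```python
from peft import LoraConfig

# Illustrative hyperparameters -- tune r / lora_alpha / dropout for your data.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Wrapping the merged model with `get_peft_model(model, lora_config)` and training on your own chat data then yields a new adapter that can be merged back the same way as above.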
## Training procedure
The following `bitsandbytes` quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
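For reference, the list above corresponds roughly to the following `BitsAndBytesConfig`; this is a reconstruction from the listed values, not the original training script:

```python
import torch
from transformers import BitsAndBytesConfig

# Reconstructed from the quantization values listed above (QLoRA-style nf4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_threshold=6.0,
)
```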
### Framework versions

- PEFT 0.4.0.dev0

Trained for 1 epoch to a loss of 0.9. In our (purely subjective) testing, the Chinese chat experience is better than baichuan13b. The model still has plenty of headroom; we recommend pulling the weights and continuing to fine-tune on top of them.
Thanks to:
- LLaMA2
- the Firefly project
- the builders of the shareGPT Chinese datasets