---
license: apache-2.0
language:
- en
- zh
library_name: transformers
pipeline_tag: text-generation
tags:
- mistral
- qwen2
---
This is the Mistral-architecture version of the [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) model by Alibaba Cloud.
The original conversion script can be found at https://github.com/hiyouga/LLaMA-Factory/blob/main/tests/llamafy_qwen.py.
I have modified it to make it compatible with Qwen2.
This model was converted with https://github.com/Minami-su/character_AI_open/blob/main/mistral_qwen2.py.

## Special

1. Before using this model, you need to modify `modeling_mistral.py` in the transformers library.

2. Open the file, for example: `vim /root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/models/mistral/modeling_mistral.py`

3. Find `MistralAttention`.

4. Change the q, k, v, and o projections from `bias=False` to `bias=config.attention_bias` (see the sketch below for the before/after).

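For reference, a minimal sketch of the edit, assuming the stock projection definitions in `MistralAttention.__init__` (the exact lines may differ between transformers versions):

```python
# Before (stock transformers MistralAttention):
#   self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
#   self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
#   self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
#   self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)

# After: read the bias flag from the converted checkpoint's config instead of hard-coding False.
self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
```
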
## Differences between qwen2 mistral and qwen2 llamafy

Compared to qwen2 llamafy, qwen2 mistral can use sliding-window attention, runs faster, and handles long contexts better.

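If you want to inspect or adjust the sliding-window setting, here is a minimal sketch (the window size of 4096 is only an illustrative value, not a recommendation from this card):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Inspect the sliding-window size the converted checkpoint ships with.
config = AutoConfig.from_pretrained("Minami-su/Qwen2-7B-Instruct-mistral")
print(config.sliding_window)

# Optionally override it before loading (4096 is just an example value).
config.sliding_window = 4096
model = AutoModelForCausalLM.from_pretrained(
    "Minami-su/Qwen2-7B-Instruct-mistral",
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```
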
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("Minami-su/Qwen2-7B-Instruct-mistral")
model = AutoModelForCausalLM.from_pretrained(
    "Minami-su/Qwen2-7B-Instruct-mistral", torch_dtype="auto", device_map="auto"
)
# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "user", "content": "Who are you?"}
]
# Build the chat-formatted prompt and move it to the GPU.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
inputs = inputs.to("cuda")
generate_ids = model.generate(inputs, max_length=2048, streamer=streamer)
```
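If you prefer the full generated string rather than the streamed output, one way to recover it (an illustrative addition, reusing the variables from the snippet above):

```python
# Strip the prompt tokens and decode only the newly generated part.
response = tokenizer.batch_decode(generate_ids[:, inputs.shape[1]:], skip_special_tokens=True)[0]
print(response)
```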