<!--Copyright 2023 Mistral AI and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

β οΈ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# Mistral[[mistral]]
## Overview[[overview]]

Mistral was introduced in [this blog post](https://mistral.ai/news/announcing-mistral-7b/) by Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, LΓ©lio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, TimothΓ©e Lacroix, William El Sayed.

The introduction of the blog post says:

*Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.*

Mistral-7B is the first large language model (LLM) released by [mistral.ai](https://mistral.ai/).
### Architectural details[[architectural-details]]

Mistral-7B is a decoder-only Transformer with the following architectural choices:

- Sliding Window Attention: trained with an 8k context length and a fixed cache size, with a theoretical attention span of 128K tokens.
- GQA (Grouped Query Attention): allows faster inference and a smaller cache size.
- Byte-fallback BPE tokenizer: ensures that characters are never mapped to out-of-vocabulary tokens.

For more details refer to the [release blog post](https://mistral.ai/news/announcing-mistral-7b/).
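The first two choices can be read off the checkpoint's configuration. A small sketch; the values shown are those of the published `mistralai/Mistral-7B-v0.1` config at the time of writing:

```python
>>> from transformers import AutoConfig

>>> config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
>>> # 32 query heads share 8 key/value heads -> grouped query attention
>>> config.num_attention_heads, config.num_key_value_heads
(32, 8)
>>> # per-layer attention window, in tokens
>>> config.sliding_window
4096
```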
### License[[license]]

`Mistral-7B` is released under the Apache 2.0 license.
## Usage tips[[usage-tips]]

The Mistral AI team has released 3 checkpoints:

- A base model, [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), which has been pre-trained to predict the next token on internet-scale data.
- An instruction tuned model, [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), which is the base model optimized for chat purposes using supervised fine-tuning (SFT) and direct preference optimization (DPO).
- An improved instruction tuned model, [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), which improves upon v1.
The base model can be used as follows:
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

>>> prompt = "My favourite condiment is"

>>> model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"My favourite condiment is to ..."
```
The instruction tuned model can be used as follows:
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

>>> messages = [
...     {"role": "user", "content": "What is your favourite condiment?"},
...     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
...     {"role": "user", "content": "Do you have mayonnaise recipes?"}
... ]

>>> model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

>>> generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"Mayonnaise can be made as follows: (...)"
```
The instruction tuned model requires a [chat template](../chat_templating) to be applied, to make sure the inputs are prepared in the right format.
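To inspect the exact prompt the template renders, pass `tokenize=False` to get the string rather than token ids. A minimal sketch; the `[INST]`-style rendering shown in the comment is an approximation and may differ slightly between tokenizer versions:

```python
>>> tokenizer.apply_chat_template(messages, tokenize=False)
>>> # roughly: "<s>[INST] What is your favourite condiment? [/INST]Well, I'm quite partial to ...</s>[INST] Do you have mayonnaise recipes? [/INST]"
```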
## Speeding up Mistral by using Flash Attention[[speeding-up-mistral-by-using-flash-attention]]
The code snippets above showcase inference without any optimization tricks. However, one can drastically speed up the model by leveraging [Flash Attention 2](../perf_train_gpu_one.md#flash-attention-2), a faster implementation of the attention mechanism used inside the model.

First, make sure to install the latest version of Flash Attention 2, which includes the sliding window attention feature.
```bash
pip install -U flash-attn --no-build-isolation
```
Also make sure that your hardware is compatible with Flash Attention 2; you can find more details in the official documentation of the [flash attention repository](https://github.com/Dao-AILab/flash-attention). In addition, make sure to load your model in half precision (e.g. `torch.float16`).
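As a quick sanity check on NVIDIA GPUs (the printed value is device-dependent; the output below is what an A100 reports):

```python
>>> import torch

>>> # Flash Attention 2 needs compute capability (8, 0) or higher, i.e. Ampere or newer
>>> torch.cuda.get_device_capability()
(8, 0)
```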
To load and run a model using Flash Attention 2, refer to the snippet below:
```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, attn_implementation="flash_attention_2", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

>>> prompt = "My favourite condiment is"

>>> model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"My favourite condiment is to (...)"
```
### Expected speedups[[expected-speedups]]

Below is an expected speedup diagram that compares pure inference time between the native implementation in transformers using the `mistralai/Mistral-7B-v0.1` checkpoint and the Flash Attention 2 version of the model.
<div style="text-align: center">
<img src="https://huggingface.co/datasets/ybelkada/documentation-images/resolve/main/mistral-7b-inference-large-seqlen.png">
</div>
### Sliding window attention[[sliding-window-attention]]

The current implementation supports the sliding window attention mechanism and memory efficient cache management. To enable sliding window attention, just make sure to have a `flash-attn` version that is compatible with sliding window attention (`>=2.3.0`).

The Flash Attention 2 model also uses a more memory efficient cache slicing mechanism. Following the rolling cache mechanism recommended by the official implementation of the Mistral model, the cache size is kept fixed (`self.config.sliding_window`), batched generation is supported only with `padding_side="left"`, and the absolute position of the current token is used to compute the positional embedding.
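Since batched generation only works with left padding here, a tokenizer for batched prompts can be set up as follows (a minimal sketch; reusing the EOS token as padding token is an assumption, as the Mistral tokenizer does not define a dedicated pad token):

```python
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
>>> tokenizer.pad_token = tokenizer.eos_token  # assumption: no dedicated pad token is defined
>>> model_inputs = tokenizer(["My favourite condiment is", "Hi"], padding=True, return_tensors="pt")
```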
## Shrinking down Mistral using quantization[[shrinking-down-mistral-using-quantization]]

As the Mistral model has 7 billion parameters, it requires about 14GB of GPU RAM in half precision (float16), since each parameter is stored in 2 bytes. However, one can shrink down the size of the model using [quantization](../quantization.md). If the model is quantized to 4 bits (i.e. half a byte per parameter), about 3.5GB of RAM is enough.
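A quick back-of-the-envelope check of those numbers (weights only; activations and the KV cache come on top):

```python
>>> n_params = 7e9        # approximate parameter count
>>> n_params * 2 / 1e9    # float16: 2 bytes per parameter -> GB
14.0
>>> n_params * 0.5 / 1e9  # 4-bit: half a byte per parameter -> GB
3.5
```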
Quantizing a model is as simple as passing a `quantization_config` to the model. Below, we'll leverage BitsAndBytes quantization, but refer to [this page](../quantization.md) for other quantization methods:
```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

>>> # specify how to quantize the model
>>> quantization_config = BitsAndBytesConfig(
...     load_in_4bit=True,
...     bnb_4bit_quant_type="nf4",
...     bnb_4bit_compute_dtype=torch.float16,
... )

>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", quantization_config=quantization_config, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

>>> messages = [
...     {"role": "user", "content": "What is your favourite condiment?"},
...     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
...     {"role": "user", "content": "Do you have mayonnaise recipes?"}
... ]

>>> model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

>>> generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"The expected output"
```
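To verify the savings on the loaded model, its weight footprint can be inspected (a sketch; `get_memory_footprint` counts parameters and buffers only, and the exact number depends on the quantization settings):

```python
>>> model.get_memory_footprint() / 1e9  # weights and buffers in GB, roughly 4 for the 4-bit model
```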
This model was contributed by [Younes Belkada](https://huggingface.co/ybelkada) and [Arthur Zucker](https://huggingface.co/ArthurZ).
The original code can be found [here](https://github.com/mistralai/mistral-src).
| ## 리μμ€[[resources]] | |
| λ―Έμ€νΈλμ μμνλ λ° λμμ΄ λλ Hugging Faceμ community μλ£ λͺ©λ‘(πλ‘ νμλ¨) μ λλ€. μ¬κΈ°μ ν¬ν¨λ μλ£λ₯Ό μ μΆνκ³ μΆμΌμλ€λ©΄ PR(Pull Request)λ₯Ό μ΄μ΄μ£ΌμΈμ. λ¦¬λ·°ν΄ λλ¦¬κ² μ΅λλ€! μλ£λ κΈ°μ‘΄ μλ£λ₯Ό 볡μ νλ λμ μλ‘μ΄ λ΄μ©μ λ΄κ³ μμ΄μΌ ν©λλ€. | |
| <PipelineTag pipeline="text-generation"/> | |
| - λ―Έμ€νΈλ-7Bμ μ§λν λ―ΈμΈμ‘°μ (SFT)μ μννλ λ°λͺ¨ λ ΈνΈλΆμ [μ΄κ³³](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Mistral/Supervised_fine_tuning_(SFT)_of_an_LLM_using_Hugging_Face_tooling.ipynb)μμ νμΈν μ μμ΅λλ€. π | |
| - 2024λ μ Hugging Face λꡬλ₯Ό μ¬μ©ν΄ LLMμ λ―ΈμΈ μ‘°μ νλ λ°©λ²μ λν [λΈλ‘κ·Έ ν¬μ€νΈ](https://www.philschmid.de/fine-tune-llms-in-2024-with-trl). π | |
| - Hugging Faceμ [μ λ ¬(Alignment) νΈλλΆ](https://github.com/huggingface/alignment-handbook)μλ λ―Έμ€νΈλ-7Bλ₯Ό μ¬μ©ν μ§λν λ―ΈμΈ μ‘°μ (SFT) λ° μ§μ μ νΈ μ΅μ ν(DPO)λ₯Ό μννκΈ° μν μ€ν¬λ¦½νΈμ λ μνΌκ° ν¬ν¨λμ΄ μμ΅λλ€. μ¬κΈ°μλ λ¨μΌ GPUμμ QLoRa λ° λ€μ€ GPUλ₯Ό μ¬μ©ν μ 체 λ―ΈμΈ μ‘°μ μ μν μ€ν¬λ¦½νΈκ° ν¬ν¨λμ΄ μμ΅λλ€. | |
| - [μΈκ³Όμ μΈμ΄ λͺ¨λΈλ§ μμ κ°μ΄λ](../tasks/language_modeling) | |
## MistralConfig[[transformers.MistralConfig]]

[[autodoc]] MistralConfig

## MistralModel[[transformers.MistralModel]]

[[autodoc]] MistralModel
    - forward

## MistralForCausalLM[[transformers.MistralForCausalLM]]

[[autodoc]] MistralForCausalLM
    - forward

## MistralForSequenceClassification[[transformers.MistralForSequenceClassification]]

[[autodoc]] MistralForSequenceClassification
    - forward

## MistralForTokenClassification[[transformers.MistralForTokenClassification]]

[[autodoc]] MistralForTokenClassification
    - forward

## FlaxMistralModel[[transformers.FlaxMistralModel]]

[[autodoc]] FlaxMistralModel
    - __call__

## FlaxMistralForCausalLM[[transformers.FlaxMistralForCausalLM]]

[[autodoc]] FlaxMistralForCausalLM
    - __call__

## TFMistralModel[[transformers.TFMistralModel]]

[[autodoc]] TFMistralModel
    - call

## TFMistralForCausalLM[[transformers.TFMistralForCausalLM]]

[[autodoc]] TFMistralForCausalLM
    - call

## TFMistralForSequenceClassification[[transformers.TFMistralForSequenceClassification]]

[[autodoc]] TFMistralForSequenceClassification
    - call