<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# Generation with LLMs [[generation-with-llms]]

[[open-in-colab]]

LLMs, or Large Language Models, are the key component behind text generation. In a nutshell, they consist of large pretrained transformer models trained to predict the next word (or, more precisely, token) given some input text. Since they predict one token at a time, generating new sentences takes something more elaborate than a single model call -- you need to do autoregressive generation.

Autoregressive generation is the inference-time procedure of iteratively calling a model with its own generated outputs, given a few initial inputs. In 🤗 Transformers, this is handled by the [`~generation.GenerationMixin.generate`] method, which is available to all models with generative capabilities.
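The autoregressive loop described above can be sketched in plain Python. Note that `toy_next_token` and its lookup table are purely illustrative stand-ins for a real model's forward pass; they are not part of the Transformers API.

```python
def toy_next_token(tokens):
    """Pretend forward pass: returns the next token for a given context."""
    table = {
        ("A",): "B",
        ("A", "B"): "C",
        ("A", "B", "C"): "<eos>",
    }
    return table.get(tuple(tokens), "<eos>")

def generate(prompt, max_new_tokens=10, eos="<eos>"):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = toy_next_token(tokens)  # one forward pass per new token
        if nxt == eos:                # stop when the model emits EOS...
            break
        tokens.append(nxt)            # ...otherwise feed the output back in
    return tokens

print(generate(["A"]))  # ['A', 'B', 'C']
```

The key property is that each new token requires a fresh call to the model with the full context so far, which is why generation cost grows with output length.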
This tutorial will show you how to:

* Generate text with an LLM
* Avoid common pitfalls
* Take the next steps to make the most of your LLM

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install transformers bitsandbytes>=0.39.0 -q
```
## Generate text [[generate-text]]

A language model trained for [causal language modeling](tasks/language_modeling) takes a sequence of text tokens as input and returns the probability distribution for the next token.
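To make "probability distribution for the next token" concrete, here is a minimal sketch of how a model's raw scores (logits) become probabilities via softmax. The token names and scores are made up for illustration; a real model produces one score per vocabulary entry.

```python
import math

# Hypothetical raw scores a model might assign to candidate next tokens.
logits = {"green": 2.0, "yellow": 1.0, "plaid": -1.0}

def softmax(scores):
    """Convert raw scores into a probability distribution that sums to 1."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)
print(max(probs, key=probs.get))  # 'green' -- the highest-scoring token
```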
<!-- [GIF 1 -- FWD PASS] -->
<figure class="image table text-center m-0 w-full">
    <video
        style="max-width: 90%; margin: auto;"
        autoplay loop muted playsinline
        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/assisted-generation/gif_1_1080p.mov"
    ></video>
    <figcaption>"Forward pass of an LLM"</figcaption>
</figure>

A critical aspect of autoregressive generation with LLMs is how to select the next token from this probability distribution. Any method works, as long as you end up with a token for the next iteration: it can be as simple as picking the most likely token from the distribution, or as complex as applying a dozen transformations before sampling from the resulting distribution.
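As a sketch of the two extremes just mentioned (using a toy distribution, not real model output), greedy selection takes the argmax while sampling draws a token at random according to the probabilities:

```python
import random

# A toy next-token distribution over a four-token vocabulary.
vocab = ["green", "yellow", "purple", "plaid"]
probs = [0.5, 0.3, 0.15, 0.05]

# Greedy decoding: always take the most likely token -- deterministic.
greedy = vocab[max(range(len(vocab)), key=lambda i: probs[i])]
print(greedy)  # 'green'

# Sampling: draw from the distribution -- stochastic, more varied output.
random.seed(0)  # seeded only so this example is reproducible
sampled = random.choices(vocab, weights=probs, k=1)[0]
print(sampled)
```

Greedy decoding will return `'green'` on every call, while sampling can return any token, with low-probability tokens appearing only occasionally.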
<!-- [GIF 2 -- TEXT GENERATION] -->
<figure class="image table text-center m-0 w-full">
    <video
        style="max-width: 90%; margin: auto;"
        autoplay loop muted playsinline
        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/assisted-generation/gif_2_1080p.mov"
    ></video>
    <figcaption>"Autoregressive generation iteratively selects the next token from a probability distribution to generate text"</figcaption>
</figure>

The process depicted above is repeated iteratively until some stopping condition is reached. Ideally, the stopping condition is dictated by the model, which should learn when to output an end-of-sequence (EOS) token. If this is not the case, generation stops when some predefined maximum length is reached.

Properly setting the token selection step and the stopping condition is essential to make your model behave as you'd expect on your task. That is why each model ships with a [`~generation.GenerationConfig`] file, which contains a good default generative parameterization.
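You can inspect the generation defaults without loading any model weights. The values below reflect the library-wide defaults of [`~generation.GenerationConfig`] at the time of writing (greedy decoding and a 20-token maximum length); individual model repositories may override them via their own `generation_config.json`.

```python
from transformers import GenerationConfig

# Library-wide defaults, used when a model repo ships no overrides.
config = GenerationConfig()
print(config.max_length)  # 20 -- counts prompt + new tokens
print(config.do_sample)   # False -- greedy decoding
```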
Let's talk code!

<Tip>

If you're interested in basic LLM usage, our high-level [`Pipeline`](pipeline_tutorial) interface is a great starting point. However, LLMs often require advanced features like quantization and fine control of the token selection step, which is best done through [`~generation.GenerationMixin.generate`]. Autoregressive generation with LLMs is also resource-intensive, so it should be executed on a GPU for adequate throughput.

</Tip>

First, load the model.

```python
>>> from transformers import AutoModelForCausalLM

>>> model = AutoModelForCausalLM.from_pretrained(
...     "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
... )
```
Note the two flags in the `from_pretrained` call:

- `device_map` ensures the model is moved to your GPU(s)
- `load_in_4bit` applies [4-bit dynamic quantization](main_classes/quantization) to massively reduce the resource requirements

There are other ways to initialize a model, but this is a good baseline when starting out with an LLM.

Next, preprocess your text input with a [tokenizer](tokenizer_summary).

```python
>>> from transformers import AutoTokenizer
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to(device)
```
The `model_inputs` variable holds the tokenized text input, as well as the attention mask. While [`~generation.GenerationMixin.generate`] does its best effort to infer the attention mask when it is not passed, we recommend passing it whenever possible for optimal results.

Finally, call the [`~generation.GenerationMixin.generate`] method to get the generated tokens, then convert them back to text before printing.

```python
>>> generated_ids = model.generate(**model_inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A list of colors: red, blue, green, yellow, black, white, and brown'
```
And that's it! In a few lines of code, you can harness the power of an LLM.

## Common pitfalls [[common-pitfalls]]

There are many [generation strategies](generation_strategies), and sometimes the default values may not be appropriate for your use case. If your outputs aren't aligned with what you're expecting, here is a list of the most common pitfalls and how to avoid them.

```py
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
>>> tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
>>> model = AutoModelForCausalLM.from_pretrained(
...     "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
... )
```
### Generated output is too short/long [[generated-output-is-too-shortlong]]

If not specified in the [`~generation.GenerationConfig`] file, `generate` returns up to 20 tokens by default. We highly recommend manually setting `max_new_tokens` in your `generate` call to control the maximum number of new tokens it can return. Keep in mind LLMs (more precisely, [decoder-only models](https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt)) also return the input prompt as part of the output.

```py
>>> model_inputs = tokenizer(["A sequence of numbers: 1, 2"], return_tensors="pt").to("cuda")

>>> # By default, the output will contain up to 20 tokens
>>> generated_ids = model.generate(**model_inputs, pad_token_id=tokenizer.eos_token_id)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A sequence of numbers: 1, 2, 3, 4, 5'

>>> # Setting `max_new_tokens` allows you to control the maximum length
>>> generated_ids = model.generate(**model_inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=50)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'A sequence of numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,'
```
### Incorrect generation mode [[incorrect-generation-mode]]

By default, and unless specified in the [`~generation.GenerationConfig`] file, `generate` selects the most likely token at each iteration (greedy decoding). Depending on your task, this may be undesirable: creative tasks like chatbots or writing an essay benefit from sampling, while input-grounded tasks like audio transcription or translation benefit from greedy decoding. Enable sampling with `do_sample=True`; you can learn more about this topic in this [blog post](https://huggingface.co/blog/how-to-generate).

```python
>>> # Set seed for reproducibility -- you don't need this unless you want full reproducibility
>>> from transformers import set_seed
>>> set_seed(0)

>>> model_inputs = tokenizer(["I am a cat."], return_tensors="pt").to("cuda")

>>> # LLM + greedy decoding = repetitive, boring output
>>> generated_ids = model.generate(**model_inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'I am a cat. I am a cat. I am a cat. I am a cat'

>>> # With sampling, the output becomes more creative!
>>> generated_ids = model.generate(**model_inputs, do_sample=True)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'I am a cat.\nI just need to be. I am always.\nEvery time'
```
### Wrong padding side [[wrong-padding-side]]

LLMs are [decoder-only](https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt) architectures, meaning they continue to iterate on your input prompt. If your inputs do not have the same length, they need to be padded. Since LLMs are not trained to continue from pad tokens, the padding has to be added on the left side of the input. And make sure you don't forget to pass the attention mask to `generate`!

```python
>>> # The tokenizer initialized above has right-padding active by default: the 1st sequence,
>>> # which is shorter, has padding on the right side. Generation fails.
>>> model_inputs = tokenizer(
...     ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
... ).to("cuda")
>>> generated_ids = model.generate(**model_inputs)
>>> tokenizer.batch_decode(generated_ids[0], skip_special_tokens=True)[0]
''

>>> # With left-padding, it works as expected!
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
>>> tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
>>> model_inputs = tokenizer(
...     ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
... ).to("cuda")
>>> generated_ids = model.generate(**model_inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'1, 2, 3, 4, 5, 6,'
```
| <!-- TODO: when the prompting guide is ready, mention the importance of setting the right prompt in this section --> | |
## Further resources [[further-resources]]

While the autoregressive generation process itself is relatively straightforward, making the most of your LLM can be a challenging endeavor because there are many moving parts. For your next steps in LLM usage and understanding:

<!-- TODO: complete with new guides -->

### Advanced generate usage [[advanced-generate-usage]]

1. [Guide](generation_strategies) on how to control different generation methods, how to set up the generation configuration file, and how to stream the output;
2. API reference on [`~generation.GenerationConfig`], [`~generation.GenerationMixin.generate`], and [generate-related classes](internal/generation_utils).

### LLM leaderboards [[llm-leaderboards]]

1. [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), which focuses on the quality of open-source models;
2. [Open LLM-Perf Leaderboard](https://huggingface.co/spaces/optimum/llm-perf-leaderboard), which focuses on LLM throughput.

### Latency and throughput [[latency-and-throughput]]

1. Guide on [dynamic quantization](main_classes/quantization), which shows you how to drastically reduce your memory requirements.

### Related libraries [[related-libraries]]

1. [`text-generation-inference`](https://github.com/huggingface/text-generation-inference), a production-ready server for LLMs;
2. [`optimum`](https://github.com/huggingface/optimum), an extension of 🤗 Transformers that optimizes for specific hardware devices.