| <!--Copyright 2024 The HuggingFace Team. All rights reserved. | |
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | |
| the License. You may obtain a copy of the License at | |
| http://www.apache.org/licenses/LICENSE-2.0 | |
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | |
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | |
| specific language governing permissions and limitations under the License. | |
| ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be | |
| rendered properly in your Markdown viewer. | |
| --> | |
| # Tools and RAG[[Tools-and-RAG]] | |
| The [`~PreTrainedTokenizerBase.apply_chat_template`] method accepts additional arguments of almost any type, such as strings, lists, and dictionaries, besides the chat messages. This makes it possible to use chat templates in many different scenarios. | |
| This guide shows how to use chat templates with tools and retrieval-augmented generation (RAG). | |
| ## Tools[[Tools]] | |
| Tools are functions that a large language model (LLM) can call to perform specific tasks. They are a powerful way to extend the capabilities of conversational agents with real-time information, computational tools, or access to large databases. | |
| Follow the rules below when creating a tool. | |
| 1. The function should have a name that clearly describes what it does. | |
| 2. The function arguments must have type hints in the function header (don't include them in the `Args` block). | |
| 3. The function must have a [Google-style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) docstring. | |
| 4. The function can have a return type and a `Returns` block, but these are optional because most tool-use models ignore them. | |
| An example of tools that get the current temperature and wind speed at a given location is shown below. | |
| ```py | |
| def get_current_temperature(location: str, unit: str) -> float: | |
|     """ | |
|     Get the current temperature at a location. | |
|     Args: | |
|         location: The location to get the temperature for, in the format "City, Country" | |
|         unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"]) | |
|     Returns: | |
|         The current temperature at the specified location in the specified units, as a float. | |
|     """ | |
|     return 22.  # A real function should probably actually get the temperature! | |
| def get_current_wind_speed(location: str) -> float: | |
|     """ | |
|     Get the current wind speed in km/h at a given location. | |
|     Args: | |
|         location: The location to get the wind speed for, in the format "City, Country" | |
|     Returns: | |
|         The current wind speed at the given location in km/h, as a float. | |
|     """ | |
|     return 6.  # A real function should probably actually get the wind speed! | |
| tools = [get_current_temperature, get_current_wind_speed] | |
| ``` | |
| Load a model and tokenizer that support tool use, such as [NousResearch/Hermes-2-Pro-Llama-3-8B](https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B). If your hardware can handle it, you can also consider a larger model such as [Command-R](./model_doc/cohere) or [Mixtral-8x22B](./model_doc/mixtral). | |
| ```py | |
| import torch | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| tokenizer = AutoTokenizer.from_pretrained( "NousResearch/Hermes-2-Pro-Llama-3-8B") | |
| model = AutoModelForCausalLM.from_pretrained( "NousResearch/Hermes-2-Pro-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto") | |
| ``` | |
| Create the chat messages. | |
| ```py | |
| messages = [ | |
| {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."}, | |
| {"role": "user", "content": "Hey, what's the temperature in Paris right now?"} | |
| ] | |
| ``` | |
| Pass `messages` and the list of tools, `tools`, to [`~PreTrainedTokenizerBase.apply_chat_template`]. You can then use the result as the model input to generate text. | |
| ```py | |
| inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt") | |
| inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move the input tensors to the model's device | |
| outputs = model.generate(**inputs, max_new_tokens=128) | |
| print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):])) | |
| ``` | |
| ```txt | |
| <tool_call> | |
| {"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"} | |
| </tool_call><|im_end|> | |
| ``` | |
| The chat model called the `get_current_temperature` function with the correct parameters, following the format defined in the docstring. It inferred France as the location from Paris, and decided that the temperature should be reported in Celsius. | |
| Now put the `get_current_temperature` function and these arguments in a `tool_call` dictionary and append it to the chat messages. The `tool_call` dictionary must be provided under the `assistant` role, not `system` or `user`. | |
| > [!WARNING] | |
| > The OpenAI API uses a JSON string as its `tool_call` format. Transformers expects a dictionary, so passing a JSON string may cause errors or strange model behavior. | |
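The generated text has to be turned into a dictionary before it can be appended to the chat. The following is a minimal sketch, assuming the Hermes-style `<tool_call>...</tool_call>` tags shown in the output above; other models may use a different format. `parse_tool_call` is a hypothetical helper name, not part of Transformers. It also converts OpenAI-style string arguments into a dictionary.

```python
import json
import re


def parse_tool_call(generated_text: str) -> dict:
    """Extract the first <tool_call>...</tool_call> block and return it as a dict."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", generated_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <tool_call> block found in the generated text")
    call = json.loads(match.group(1))
    # OpenAI-style payloads carry the arguments as a JSON string; convert them to a dict.
    if isinstance(call.get("arguments"), str):
        call["arguments"] = json.loads(call["arguments"])
    return call


# Decoded model output, copied from the example above.
generated_text = (
    "<tool_call>\n"
    '{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}\n'
    "</tool_call><|im_end|>"
)

tool_call = parse_tool_call(generated_text)
print(tool_call["name"], tool_call["arguments"])
```

The resulting `tool_call` dictionary has the `name` and `arguments` keys used in the snippets that follow.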
| <hfoptions id="tool-call"> | |
| <hfoption id="Llama"> | |
| ```py | |
| tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}} | |
| messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]}) | |
| ``` | |
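The appended `assistant` message records the call but not what the tool returned. Below is a minimal sketch of running the tool and adding its result to the chat. It assumes the model's chat template accepts a `tool` role message with `name` and `content` keys; check this against the template you use. The stub function and messages are repeated so the snippet runs on its own.

```python
def get_current_temperature(location: str, unit: str) -> float:
    """Stub tool, as defined earlier in this guide."""
    return 22.0


messages = [
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]

tool_call = {
    "name": "get_current_temperature",
    "arguments": {"location": "Paris, France", "unit": "celsius"},
}
messages.append(
    {"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]}
)

# Run the tool with the arguments the model chose.
result = get_current_temperature(**tool_call["arguments"])

# Assumption: the template reads a "tool" role message with "name" and "content" keys.
messages.append({"role": "tool", "name": tool_call["name"], "content": str(result)})
print(messages[-1])
```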
| Let the assistant read the function output and chat with the user. | |
| ```py | |
| inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt") | |
| inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move the input tensors to the model's device | |
| out = model.generate(**inputs, max_new_tokens=128) | |
| print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):])) | |
| ``` | |
| ```txt | |
| The temperature in Paris, France right now is approximately 12°C (53.6°F).<|im_end|> | |
| ``` | |
| </hfoption> | |
| <hfoption id="Mistral/Mixtral"> | |
| [Mistral](./model_doc/mistral) and [Mixtral](./model_doc/mixtral) models additionally require a `tool_call_id`. The `tool_call_id` is a 9-character alphanumeric string assigned to the `id` key of the `tool_call` dictionary. | |
| ```py | |
| tool_call_id = "9Ae3bDc2F" | |
| tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}} | |
| messages.append({"role": "assistant", "tool_calls": [{"type": "function", "id": tool_call_id, "function": tool_call}]}) | |
| ``` | |
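Instead of hard-coding the ID, you can generate one. This is a minimal sketch using only the Python standard library; `make_tool_call_id` is a hypothetical helper name, and the only requirement taken from the text above is a 9-character alphanumeric string.

```python
import secrets
import string


def make_tool_call_id(length: int = 9) -> str:
    """Return a random string of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


tool_call_id = make_tool_call_id()
print(tool_call_id)
```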
| ```py | |
| inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt") | |
| inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move the input tensors to the model's device | |
| out = model.generate(**inputs, max_new_tokens=128) | |
| print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):])) | |
| ``` | |
| </hfoption> | |
| </hfoptions> | |
| ## Schema[[Schema]] | |
| [`~PreTrainedTokenizerBase.apply_chat_template`] converts functions into a [JSON schema](https://json-schema.org/learn/getting-started-step-by-step) that is passed to the chat template. An LLM never sees the code inside a function. In other words, an LLM doesn't care how the function works technically; it only looks at the function **definition** and **arguments**. | |
| The JSON schema is generated automatically behind the scenes as long as your function follows the rules listed above. For better visibility or for debugging, you can use [get_json_schema](https://github.com/huggingface/transformers/blob/14561209291255e51c55260306c7d00c159381a5/src/transformers/utils/chat_template_utils.py#L205) to convert a function to a schema manually. | |
| ```py | |
| from transformers.utils import get_json_schema | |
| def multiply(a: float, b: float): | |
|     """ | |
|     A function that multiplies two numbers | |
|     Args: | |
|         a: The first number to multiply | |
|         b: The second number to multiply | |
|     """ | |
|     return a * b | |
| schema = get_json_schema(multiply) | |
| print(schema) | |
| ``` | |
| ```json | |
| { | |
| "type": "function", | |
| "function": { | |
| "name": "multiply", | |
| "description": "A function that multiplies two numbers", | |
| "parameters": { | |
| "type": "object", | |
| "properties": { | |
| "a": { | |
| "type": "number", | |
| "description": "The first number to multiply" | |
| }, | |
| "b": { | |
| "type": "number", | |
| "description": "The second number to multiply" | |
| } | |
| }, | |
| "required": ["a", "b"] | |
| } | |
| } | |
| } | |
| ``` | |
| You can edit the schema or write one entirely from scratch. This gives you the flexibility to define precise schemas for more complex functions. | |
| > [!WARNING] | |
| > Keep function signatures simple and the number of arguments to a minimum. Such functions are easier for a model to understand and use than complex functions with nested arguments. | |
| The example below shows how to write a schema manually and then pass it to [`~PreTrainedTokenizerBase.apply_chat_template`]. | |
| ```py | |
| # A simple function that takes no arguments | |
| current_time = { | |
| "type": "function", | |
| "function": { | |
| "name": "current_time", | |
| "description": "Get the current local time as a string.", | |
| "parameters": { | |
| 'type': 'object', | |
| 'properties': {} | |
| } | |
| } | |
| } | |
| # A more complete function that takes two numerical arguments | |
| multiply = { | |
| 'type': 'function', | |
| 'function': { | |
| 'name': 'multiply', | |
| 'description': 'A function that multiplies two numbers', | |
| 'parameters': { | |
| 'type': 'object', | |
| 'properties': { | |
| 'a': { | |
| 'type': 'number', | |
| 'description': 'The first number to multiply' | |
| }, | |
| 'b': { | |
| 'type': 'number', 'description': 'The second number to multiply' | |
| } | |
| }, | |
| 'required': ['a', 'b'] | |
| } | |
| } | |
| } | |
| model_input = tokenizer.apply_chat_template( | |
| messages, | |
| tools = [current_time, multiply] | |
| ) | |
| ``` | |
| ## RAG[[RAG]] | |
| Retrieval-augmented generation (RAG) models extend a model's existing knowledge by searching documents for additional information before answering a query. For RAG models, add a `documents` parameter to [`~PreTrainedTokenizerBase.apply_chat_template`]. The `documents` parameter should be a list of documents, and each document should be a single dictionary with `title` and `text` keys. | |
| > [!TIP] | |
| > The `documents` parameter for RAG isn't widely supported, and many models have chat templates that ignore `documents`. To verify whether a model supports `documents`, read its model card or run `print(tokenizer.chat_template)` and check whether the `documents` key is present. [Command-R](https://hf.co/CohereForAI/c4ai-command-r-08-2024) and [Command-R+](https://hf.co/CohereForAI/c4ai-command-r-plus-08-2024) both support `documents` in their RAG chat templates. | |
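The check described in the tip can be scripted. The sketch below is a rough heuristic, not a Transformers API: `template_mentions_documents` is a hypothetical helper that only looks for the substring `documents`. It assumes `tokenizer.chat_template` is either a single template string or a dictionary of named templates. The toy templates stand in for a real tokenizer so the snippet runs without downloading a model.

```python
def template_mentions_documents(chat_template) -> bool:
    """Heuristic: report whether any template string refers to `documents`."""
    if chat_template is None:
        return False
    if isinstance(chat_template, dict):
        # Assumption: several named templates, e.g. {"default": ..., "rag": ...}
        templates = list(chat_template.values())
    else:
        templates = [chat_template]
    return any("documents" in template for template in templates)


# Toy templates used only for illustration.
plain_template = "{% for message in messages %}{{ message['content'] }}{% endfor %}"
rag_templates = {
    "default": plain_template,
    "rag": "{% for doc in documents %}{{ doc['title'] }}: {{ doc['text'] }}{% endfor %}",
}

print(template_mentions_documents(plain_template))  # False
print(template_mentions_documents(rag_templates))   # True
```

With a real tokenizer, you would pass `tokenizer.chat_template` to the helper. A match only shows that the template mentions `documents`, so reading the template is still the reliable check.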
| Create a list of documents to pass to the model. | |
| ```py | |
| documents = [ | |
| { | |
| "title": "The Moon: Our Age-Old Foe", | |
| "text": "Man has always dreamed of destroying the moon. In this essay, I shall..." | |
| }, | |
| { | |
| "title": "The Sun: Our Age-Old Friend", | |
| "text": "Although often underappreciated, the sun provides several notable benefits..." | |
| } | |
| ] | |
| ``` | |
| Set `chat_template="rag"` in [`~PreTrainedTokenizerBase.apply_chat_template`] and generate a response. | |
| ```py | |
| from transformers import AutoTokenizer, AutoModelForCausalLM | |
| # Load the model and tokenizer | |
| tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01-4bit") | |
| model = AutoModelForCausalLM.from_pretrained("CohereForAI/c4ai-command-r-v01-4bit", device_map="auto") | |
| device = model.device # Get the device the model is loaded on | |
| # Define the conversation input | |
| conversation = [ | |
| {"role": "user", "content": "What has Man always dreamed of?"} | |
| ] | |
| input_ids = tokenizer.apply_chat_template( | |
| conversation=conversation, | |
| documents=documents, | |
| chat_template="rag", | |
| tokenize=True, | |
| add_generation_prompt=True, | |
| return_tensors="pt").to(device) | |
| # Generate a response | |
| generated_tokens = model.generate( | |
| input_ids, | |
| max_new_tokens=100, | |
| do_sample=True, | |
| temperature=0.3, | |
| ) | |
| # Decode the generated tokens and print them together with the generation prompt | |
| generated_text = tokenizer.decode(generated_tokens[0]) | |
| print(generated_text) | |
| ``` | |