
Tools and RAG[[Tools-and-RAG]]

The [~PreTrainedTokenizerBase.apply_chat_template] method accepts almost any type of additional argument besides the chat messages, including strings, lists, and dictionaries. This makes chat templates usable in a wide range of situations.

This guide shows how to use chat templates with tools and retrieval-augmented generation (RAG).

Tools[[Tools]]

Tools are functions that a large language model (LLM) can call to perform specific tasks. They are a powerful way to extend the capabilities of conversational agents with real-time information, computational tools, or access to large databases.

Follow the rules below when creating a tool.

  1. The function should have a name that describes what it does.
  2. The function arguments must have type hints in the function header (do not include them in the Args block).
  3. The function must have a Google-style docstring.
  4. The function can include a return type and a Returns block, but these are optional because most tool-use models ignore them.
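
The first three rules can be checked mechanically before a function is handed to a model. The snippet below is a minimal sketch using only the standard library; `check_tool`, `good_tool`, and `bad_tool` are hypothetical names introduced for illustration and are not part of Transformers. It does not verify that the docstring is Google-style, only that one exists.

```python
import inspect


def check_tool(func):
    """Return a list of problems that would make `func` a poor tool definition."""
    problems = []
    # Rule 3: a docstring must be present.
    if not inspect.getdoc(func):
        problems.append("missing docstring")
    # Rule 2: every argument needs a type hint in the function header.
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            problems.append(f"argument '{name}' has no type hint")
    return problems


def good_tool(location: str) -> float:
    """
    Gets the current wind speed at a given location.

    Args:
        location: The location, in the format "City, Country"
    """
    return 6.0


def bad_tool(location):
    return 6.0


print(check_tool(good_tool))  # []
print(check_tool(bad_tool))   # ['missing docstring', "argument 'location' has no type hint"]
```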

Example tools that get the current temperature and wind speed for a given location are shown below.

def get_current_temperature(location: str, unit: str) -> float:
    """
    Gets the current temperature at a given location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    Returns:
        The current temperature at the given location in the specified unit, as a float.
    """
    return 22.  # A real function should probably actually get the temperature!

def get_current_wind_speed(location: str) -> float:
    """
    Gets the current wind speed in km/h at a given location.

    Args:
        location: The location to get the wind speed for, in the format "City, Country"
    Returns:
        The current wind speed at the given location in km/h, as a float.
    """
    return 6.  # A real function should probably actually get the wind speed!

tools = [get_current_temperature, get_current_wind_speed]

Load a model and tokenizer that support tool use, such as NousResearch/Hermes-2-Pro-Llama-3-8B. If your hardware supports it, you can also consider larger models such as Command-R or Mixtral-8x22B.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto")

Create the chat messages.

messages = [
  {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
  {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

Pass messages and the list of tools to [~PreTrainedTokenizerBase.apply_chat_template], then use the result as the model input to generate text.

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move the tensors to the model's device
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):]))
<tool_call>
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
</tool_call><|im_end|>

The chat model called the get_current_temperature function with the correct parameters, following the format defined in the docstring. It inferred France as the location from Paris and decided that the temperature should be reported in Celsius.
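
To act on a response like this one, the generated text has to be turned back into a function call. The snippet below is a minimal sketch of that step, assuming the `<tool_call>` format shown in the output above (other models use different formats). `parse_tool_calls` and `available_tools` are hypothetical names, and `get_current_temperature` is redefined here only so the snippet runs on its own.

```python
import json
import re


def get_current_temperature(location: str, unit: str) -> float:
    """Stand-in for the tool defined earlier; returns a fixed value."""
    return 22.0


# Map tool names to callables so a parsed call can be dispatched by name.
available_tools = {"get_current_temperature": get_current_temperature}

# Text in the format shown in the model output above.
generated = (
    "<tool_call>\n"
    '{"arguments": {"location": "Paris, France", "unit": "celsius"}, '
    '"name": "get_current_temperature"}\n'
    "</tool_call><|im_end|>"
)


def parse_tool_calls(text):
    """Extract and decode every JSON payload found between <tool_call> tags."""
    payloads = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, flags=re.DOTALL)
    return [json.loads(payload) for payload in payloads]


calls = parse_tool_calls(generated)
# Call each requested tool with the arguments the model supplied.
results = [available_tools[call["name"]](**call["arguments"]) for call in calls]
print(calls[0]["name"], results[0])  # get_current_temperature 22.0
```

Looking tools up in a dictionary keeps the model from invoking anything that was not explicitly offered to it.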

Now put the get_current_temperature function and its arguments in a tool_call dictionary and add it to the chat messages. The tool_call dictionary must be provided with the assistant role, not system or user.

The OpenAI API uses a JSON string as its tool_call format. Transformers expects a dictionary, so passing a JSON string may cause errors or strange model behavior.
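
If tool calls arrive in the JSON-string style, they can be converted before being added to the messages. The snippet below is a minimal sketch; `normalize_tool_call` and `openai_style_call` are hypothetical names used for illustration only.

```python
import json

# Hypothetical tool call in which "arguments" is a JSON string rather than a dictionary.
openai_style_call = {
    "name": "get_current_temperature",
    "arguments": '{"location": "Paris, France", "unit": "celsius"}',
}


def normalize_tool_call(call):
    """Return a copy of `call` whose "arguments" value is a dictionary."""
    arguments = call["arguments"]
    if isinstance(arguments, str):
        # Decode the JSON string into a dictionary.
        arguments = json.loads(arguments)
    return {"name": call["name"], "arguments": arguments}


tool_call = normalize_tool_call(openai_style_call)
print(tool_call["arguments"]["unit"])  # celsius
```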

tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
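
For the assistant to report what the tool actually returned, the return value has to be part of the conversation. The snippet below is a minimal sketch of running the tool and appending its result, assuming the chat template accepts a message with the tool role that carries name and content keys. The keys a template reads vary between models, so check the template of the model you use. `example_messages` is a stand-in for the running `messages` list so that the snippet is self-contained.

```python
def get_current_temperature(location: str, unit: str) -> float:
    """Stand-in for the tool defined earlier; returns a fixed value."""
    return 22.0


example_messages = [
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]
tool_call = {
    "name": "get_current_temperature",
    "arguments": {"location": "Paris, France", "unit": "celsius"},
}
example_messages.append(
    {"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]}
)

# Run the tool with the arguments the model asked for.
result = get_current_temperature(**tool_call["arguments"])

# Append the result as a "tool" message; the content is passed as a string.
example_messages.append({"role": "tool", "name": tool_call["name"], "content": str(result)})
print(example_messages[-1])
```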

Let the assistant read the function output and chat with the user.

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move the tensors to the model's device
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
The temperature in Paris, France right now is approximately 12°C (53.6°F).<|im_end|>

Mistral and Mixtral models additionally require a tool_call_id. The tool_call_id is a 9-character alphanumeric string that is assigned to the id key of the tool_call dictionary.

tool_call_id = "9Ae3bDc2F"
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "id": tool_call_id, "function": tool_call}]})
inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move the tensors to the model's device
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
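
The id in the example above is hard-coded. One way to produce a fresh 9-character alphanumeric id for each call is sketched below; `make_tool_call_id` is a hypothetical helper, and using the `secrets` module is simply one reasonable choice, not something the models require.

```python
import secrets
import string


def make_tool_call_id(length: int = 9) -> str:
    """Return a random string of ASCII letters and digits of the given length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


tool_call_id = make_tool_call_id()
print(tool_call_id)
```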

Schema[[Schema]]

[~PreTrainedTokenizerBase.apply_chat_template] converts functions into a JSON schema and passes it to the chat template. The LLM never sees the code inside the function. In other words, the LLM does not care how the function works technically; it only refers to the function's definition and arguments.

If a function follows the rules listed earlier, the JSON schema is generated automatically behind the scenes. For better readability or for debugging, you can also use get_json_schema to convert a function to a schema manually.

from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    A function that multiplies two numbers

    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

schema = get_json_schema(multiply)
print(schema)
{
  "type": "function", 
  "function": {
    "name": "multiply", 
    "description": "A function that multiplies two numbers", 
    "parameters": {
      "type": "object", 
      "properties": {
        "a": {
          "type": "number", 
          "description": "The first number to multiply"
        }, 
        "b": {
          "type": "number",
          "description": "The second number to multiply"
        }
      }, 
      "required": ["a", "b"]
    }
  }
}

You can edit the schema or write one entirely from scratch. This gives you the flexibility to define precise schemas for more complex functions.
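
As an illustration of editing a schema, the sketch below restricts an argument to a fixed set of values with the JSON Schema `enum` keyword. `temperature_schema` is a hand-written dictionary that follows the layout printed above, not output captured from get_json_schema, and whether a model honors `enum` depends on its chat template and training.

```python
import copy
import json

# Hand-written schema for a temperature tool, following the layout printed above.
temperature_schema = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the current temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": 'The location, in the format "City, Country"',
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to return the temperature in.",
                },
            },
            "required": ["location", "unit"],
        },
    },
}

# Edit a deep copy so the original schema stays untouched.
edited = copy.deepcopy(temperature_schema)
edited["function"]["parameters"]["properties"]["unit"]["enum"] = ["celsius", "fahrenheit"]

print(json.dumps(edited["function"]["parameters"]["properties"]["unit"], indent=2))
```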

Keep function signatures simple and the number of arguments to a minimum. Such functions are easier for a model to understand and use than complex functions with nested arguments.

The example below shows how to write schemas manually and then pass them to [~PreTrainedTokenizerBase.apply_chat_template].

# A simple function that takes no arguments
current_time = {
  "type": "function", 
  "function": {
    "name": "current_time",
    "description": "Get the current local time as a string.",
    "parameters": {
      'type': 'object',
      'properties': {}
    }
  }
}

# A more complete function that takes two numerical arguments
multiply = {
  'type': 'function',
  'function': {
    'name': 'multiply',
    'description': 'A function that multiplies two numbers', 
    'parameters': {
      'type': 'object', 
      'properties': {
        'a': {
          'type': 'number',
          'description': 'The first number to multiply'
        }, 
        'b': {
          'type': 'number', 'description': 'The second number to multiply'
        }
      }, 
      'required': ['a', 'b']
    }
  }
}

model_input = tokenizer.apply_chat_template(
    messages,
    tools=[current_time, multiply]
)

RAG[[RAG]]

Retrieval-augmented generation (RAG) models extend the knowledge a model already has by searching documents for additional information before responding to a query. For RAG models, add a documents parameter to [~PreTrainedTokenizerBase.apply_chat_template]. The documents parameter should be a list of documents, and each document should be a single dictionary with title and content keys.

The documents parameter for RAG is not widely supported, and many models have chat templates that ignore documents. To check whether a model supports documents, read its model card or run print(tokenizer.chat_template) and look for the documents key. Command-R and Command-R+ both support documents in their RAG chat templates.
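
The manual check described above can also be scripted. The snippet below is a rough sketch that looks for the word documents in a template; `template_mentions_documents` is a hypothetical helper, and the two template strings are made up to exercise it. It assumes the template is either a single string or a dictionary of named template strings, and a plain substring match can produce false positives, so treat the result as a hint only.

```python
def template_mentions_documents(chat_template):
    """Return True if any template string refers to `documents`."""
    if chat_template is None:
        return False
    # Assumption: some tokenizers store several named templates in a dictionary.
    if isinstance(chat_template, dict):
        templates = list(chat_template.values())
    else:
        templates = [chat_template]
    return any("documents" in template for template in templates)


# Made-up template snippets used only to exercise the helper.
plain_template = "{% for message in messages %}{{ message['content'] }}{% endfor %}"
rag_template = "{% for doc in documents %}{{ doc['title'] }}{% endfor %}"

print(template_mentions_documents(plain_template))  # False
print(template_mentions_documents({"default": plain_template, "rag": rag_template}))  # True
```

With a loaded tokenizer, the same helper could be called as `template_mentions_documents(tokenizer.chat_template)`.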

Create a list of documents to pass to the model.

documents = [
    {
        "title": "The Moon: Our Age-Old Foe", 
        "text": "Man has always dreamed of destroying the moon. In this essay, I shall..."
    },
    {
        "title": "The Sun: Our Age-Old Friend",
        "text": "Although often underappreciated, the sun provides several notable benefits..."
    }
]

Set chat_template="rag" in [~PreTrainedTokenizerBase.apply_chat_template] and generate a response.

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01-4bit")
model = AutoModelForCausalLM.from_pretrained("CohereForAI/c4ai-command-r-v01-4bit", device_map="auto")
device = model.device  # Get the device the model is loaded on

# Define the conversation input
conversation = [
    {"role": "user", "content": "What has Man always dreamed of?"}
]

input_ids = tokenizer.apply_chat_template(
    conversation=conversation,
    documents=documents,
    chat_template="rag",
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt").to(device)

# Generate a response
generated_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
    )

# Decode the generated tokens and print them together with the generation prompt
generated_text = tokenizer.decode(generated_tokens[0])
print(generated_text)