
์ด ๋ชจ๋ธ์€ 2023๋…„ 8์›” 24์ผ์— ๊ณต๊ฐœ๋˜์—ˆ์œผ๋ฉฐ, 2023๋…„ 8์›” 25์ผ์— Hugging Face Transformers์— ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค.


CodeLlama[[codellama]]

Code Llama is a family of large language models specialized for coding tasks, built on top of Llama 2. It comes in several variants - general code, Python-specific, and instruction-following - all available in 7B, 13B, 34B, and 70B parameter sizes. Code Llama models can generate and explain code, and can also fill in missing parts of code, which is called infilling. They were trained on 16K token contexts, but can handle long inputs and generate stably with up to 100K tokens.

You can find all the original Code Llama checkpoints in the Code Llama collection.

Click on the Code Llama models in the right sidebar for more examples of how to apply Code Llama to different coding tasks.

The examples below demonstrate how to generate code with [Pipeline], with [AutoModel], and from the command line.

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map=0
)

# Basic code generation
result = pipe("# Function to calculate the factorial of a number\ndef factorial(n):", max_new_tokens=256)
print(result[0]['generated_text'])

# Infilling
infill_result = pipe("def remove_non_ascii(s: str) -> str:\n    \"\"\" <FILL_ME>\n    return result", max_new_tokens=200)
print(infill_result[0]['generated_text'])

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/CodeLlama-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa"
)

# Basic code generation
prompt = "# Function to calculate the factorial of a number\ndef factorial(n):"
input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.generate(
    **input_ids,
    max_new_tokens=256,
    cache_implementation="static"
)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Infilling
infill_prompt = "def remove_non_ascii(s: str) -> str:\n    \"\"\" <FILL_ME>\n    return result"
input_ids = tokenizer(infill_prompt, return_tensors="pt").to(model.device)

filled_output = model.generate(**input_ids, max_new_tokens=200)
filled_text = tokenizer.decode(filled_output[0], skip_special_tokens=True)
print(filled_text)

echo -e "# Function to calculate the factorial of a number\ndef factorial(n):" | transformers run --task text-generation --model meta-llama/CodeLlama-7b-hf --device 0

์–‘์žํ™”๋Š” ๊ฐ€์ค‘์น˜๋ฅผ ๋” ๋‚ฎ์€ ์ •๋ฐ€๋„๋กœ ํ‘œํ˜„ํ•˜์—ฌ ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ์˜ ๋ฉ”๋ชจ๋ฆฌ ๋ถ€๋‹ด์„ ์ค„์ž…๋‹ˆ๋‹ค. ๋” ๋งŽ์€ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์–‘์žํ™” ๋ฐฑ์—”๋“œ๋Š” ์–‘์žํ™” ๊ฐœ์š”๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

The example below uses bitsandbytes to quantize the weights to 4-bits.

# Install bitsandbytes first: pip install bitsandbytes
import torch
from transformers import AutoModelForCausalLM, CodeLlamaTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True)
tokenizer = CodeLlamaTokenizer.from_pretrained("meta-llama/CodeLlama-34b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/CodeLlama-34b-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config
)

prompt = "# Write a Python function to check if a string is a palindrome\ndef is_palindrome(s):"
input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.generate(**input_ids, max_new_tokens=200, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))

The AttentionMaskVisualizer helps you better understand which tokens the model can and cannot attend to.

from transformers.utils.attention_visualizer import AttentionMaskVisualizer

visualizer = AttentionMaskVisualizer("meta-llama/CodeLlama-7b-hf")
visualizer("""def func(a, b):
  return a + b""")

Notes[[notes]]

  • Infilling is only available in the 7B and 13B base models, and not in the Python, Instruct, 34B, or 70B models.
  • Use the <FILL_ME> token where you want your code to be filled. The tokenizer splits this token to create a formatted input string that follows the original training pattern. This is more robust than preparing the pattern yourself.
    from transformers import LlamaForCausalLM, CodeLlamaTokenizer
    
    tokenizer = CodeLlamaTokenizer.from_pretrained("meta-llama/CodeLlama-7b-hf")
    model = LlamaForCausalLM.from_pretrained("meta-llama/CodeLlama-7b-hf")
    PROMPT = '''def remove_non_ascii(s: str) -> str:
        """ <FILL_ME>
        return result
    '''
    input_ids = tokenizer(PROMPT, return_tensors="pt")["input_ids"]
    generated_ids = model.generate(input_ids, max_new_tokens=128)
    
    filling = tokenizer.batch_decode(generated_ids[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
    print(PROMPT.replace("<FILL_ME>", filling))
    
  • Use bfloat16 for further training or fine-tuning and float16 for inference (see the loading sketch after this list).
  • The BOS character is not used for infilling when encoding the prefix or suffix; it is only used at the beginning of each prompt.
  • ํ† ํฌ๋‚˜์ด์ €๋Š” SentencePiece๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” byte-pair ์ธ์ฝ”๋”ฉ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ๋””์ฝ”๋”ฉ ๊ณผ์ •์—์„œ ์ฒซ ๋ฒˆ์งธ ํ† ํฐ์ด ๋‹จ์–ด์˜ ์‹œ์ž‘์ธ ๊ฒฝ์šฐ(์˜ˆ๋ฅผ ๋“ค์–ด "Banana"), ํ† ํฌ๋‚˜์ด์ €๋Š” ๋ฌธ์ž์—ด์— ์ ‘๋‘์‚ฌ ๊ณต๋ฐฑ์„ ์ถ”๊ฐ€ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

CodeLlamaTokenizer

[[autodoc]] CodeLlamaTokenizer
    - get_special_tokens_mask
    - save_vocabulary

CodeLlamaTokenizerFast

[[autodoc]] CodeLlamaTokenizerFast
    - get_special_tokens_mask
    - update_post_processor
    - save_vocabulary