PyTorch TensorFlow FlashAttention SDPA

GPT-2[[gpt-2]]

GPT-2๋Š” GPT์˜ ํ™•์žฅ ๋ฒ„์ „์œผ๋กœ, ์ธ๊ณผ์  ํŠธ๋žœ์Šคํฌ๋จธ ์–ธ์–ด ๋ชจ๋ธ์ด๋ฉฐ, 10๋ฐฐ ๋” ๋งŽ์€ ๋งค๊ฐœ๋ณ€์ˆ˜์™€ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ ์ด์ „์˜ ๋ชจ๋“  ๋‹จ์–ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋‹ค์Œ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋„๋ก 40GB ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ์‚ฌ์ „ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์„ ํ†ตํ•ด ์ด ๋ชจ๋ธ์€ ์ œ๋กœ์ƒท ์„ค์ •์—์„œ ๋งŽ์€ ๋‹ค์šด์ŠคํŠธ๋ฆผ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜๋Š” ๊ฐ ํ† ํฐ์ด ์ด์ „ ํ† ํฐ์—๋งŒ ์ฃผ์˜๋ฅผ ๊ธฐ์šธ์ผ ์ˆ˜ ์žˆ๋Š” ๋‹จ๋ฐฉํ–ฅ(์ธ๊ณผ์ ) ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ, ํ…์ŠคํŠธ ์ƒ์„ฑ ์ž‘์—…์— ํŠนํžˆ ํšจ๊ณผ์ ์ž…๋‹ˆ๋‹ค.

๋ชจ๋“  ์›๋ณธ GPT-2 ์ฒดํฌํฌ์ธํŠธ๋Š” OpenAI community ์กฐ์ง์—์„œ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์˜ค๋ฅธ์ชฝ ์‚ฌ์ด๋“œ๋ฐ”์˜ GPT-2 ๋ชจ๋ธ์„ ํด๋ฆญํ•˜์—ฌ GPT-2๋ฅผ ๋‹ค์–‘ํ•œ ์–ธ์–ด ์ž‘์—…์— ์ ์šฉํ•˜๋Š” ๋” ๋งŽ์€ ์˜ˆ์‹œ๋ฅผ ํ™•์ธํ•˜์„ธ์š”.

์•„๋ž˜ ์˜ˆ์‹œ๋Š” [Pipeline] ๋˜๋Š” [AutoModel], ๊ทธ๋ฆฌ๊ณ  ๋ช…๋ น์ค„์—์„œ GPT-2๋กœ ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

```py
import torch
from transformers import pipeline

# create a pipeline for text generation
pipeline = pipeline(task="text-generation", model="openai-community/gpt2", dtype=torch.float16, device=0)
pipeline("Hello, I'm a language model")
```

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# load the pretrained model and tokenizer
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2", dtype=torch.float16, device_map="auto", attn_implementation="sdpa")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

# tokenize the input text and move it to the GPU
input_ids = tokenizer("Hello, I'm a language model", return_tensors="pt").to("cuda")

# generate text
output = model.generate(**input_ids, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

```bash
echo -e "Hello, I'm a language model" | transformers run --task text-generation --model openai-community/gpt2 --device 0
```

You can also serve the model with vLLM using the transformers backend.

```bash
vllm serve openai-community/gpt2 --model-impl transformers
```

์–‘์žํ™”๋Š” ๊ฐ€์ค‘์น˜๋ฅผ ๋” ๋‚ฎ์€ ์ •๋ฐ€๋„๋กœ ํ‘œํ˜„ํ•˜์—ฌ ๋Œ€ํ˜• ๋ชจ๋ธ์˜ ๋ฉ”๋ชจ๋ฆฌ ๋ถ€๋‹ด์„ ์ค„์ž…๋‹ˆ๋‹ค. ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋” ๋งŽ์€ ์–‘์žํ™” ๋ฐฑ์—”๋“œ์— ๋Œ€ํ•ด์„œ๋Š” Quantization ๊ฐœ์š”๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.

์•„๋ž˜ ์˜ˆ์‹œ๋Š” bitsandbytes๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์ค‘์น˜๋งŒ 4๋น„ํŠธ๋กœ ์–‘์žํ™”ํ•ฉ๋‹ˆ๋‹ค.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

# configure the quantization settings
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True
)

# load the quantized model
model = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2-xl",
    quantization_config=quantization_config,
    device_map="auto"
)

# load the tokenizer and generate text
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-xl")
inputs = tokenizer("Once upon a time, there was a magical forest", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Notes[[notes]]

  • Pad inputs on the right because GPT-2 uses absolute position embeddings.
  • GPT-2 can reuse previously computed key-value attention pairs. Access this feature with the past_key_values parameter in [GPT2Model.forward].
  • Enable the scale_attn_by_inverse_layer_idx and reorder_and_upcast_attn parameters to apply the training stability improvements from Mistral.

GPT2Config

[[autodoc]] GPT2Config

GPT2Tokenizer

[[autodoc]] GPT2Tokenizer
    - save_vocabulary

GPT2TokenizerFast

[[autodoc]] GPT2TokenizerFast

GPT2 specific outputs[[gpt2-specific-outputs]]

[[autodoc]] models.gpt2.modeling_gpt2.GPT2DoubleHeadsModelOutput

GPT2Model

[[autodoc]] GPT2Model
    - forward

GPT2LMHeadModel

[[autodoc]] GPT2LMHeadModel
    - forward

GPT2DoubleHeadsModel

[[autodoc]] GPT2DoubleHeadsModel
    - forward

GPT2ForQuestionAnswering

[[autodoc]] GPT2ForQuestionAnswering
    - forward

GPT2ForSequenceClassification

[[autodoc]] GPT2ForSequenceClassification
    - forward

GPT2ForTokenClassification

[[autodoc]] GPT2ForTokenClassification
    - forward