PyTorch TensorFlow Flax SDPA

ALBERT[[albert]]

ALBERT is a model designed to address BERT's scaling and training-time memory limitations. It introduces two parameter-reduction techniques. The first, factorized embedding parametrization, splits the large vocabulary embedding matrix into two smaller matrices so that the hidden size can grow without a large increase in parameter count. The second, cross-layer parameter sharing, lets multiple layers share parameters, which reduces the number of parameters that must be learned.

ALBERT was created to address problems that arise with BERT: GPU/TPU memory limitations, long training times, and unexpected model degradation. It uses two parameter-reduction techniques to lower memory consumption and speed up training relative to BERT:

  • Factorized embedding parametrization: the large vocabulary embedding matrix is decomposed into two smaller matrices, which reduces memory usage.
  • Cross-layer parameter sharing: instead of learning a separate set of parameters for each transformer layer, multiple layers share parameters, which further reduces the number of weights to learn.
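The effect of the two techniques can be illustrated by counting weights. The sketch below is a minimal illustration, not ALBERT's implementation: the helper names and the sizes (V = 30,000, E = 128, H = 768, 12 layers, and a round 7,000,000 weights per layer) are assumptions chosen for the example, and biases are ignored.

```python
from typing import Optional

# Illustrative sizes only (assumptions, not read from a checkpoint)
VOCAB_SIZE = 30_000           # V
EMBEDDING_SIZE = 128          # E
HIDDEN_SIZE = 768             # H
NUM_LAYERS = 12
PARAMS_PER_LAYER = 7_000_000  # hypothetical round figure for one transformer layer


def embedding_params(vocab_size: int, hidden_size: int,
                     embedding_size: Optional[int] = None) -> int:
    """Count token-embedding weights, optionally factorized through embedding_size."""
    if embedding_size is None:
        return vocab_size * hidden_size  # a single V x H matrix
    # A V x E lookup table followed by an E x H projection
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(params_per_layer: int, num_layers: int, shared: bool) -> int:
    """Count encoder weights; with sharing, one layer's weights are reused at every depth."""
    return params_per_layer if shared else params_per_layer * num_layers


print(embedding_params(VOCAB_SIZE, HIDDEN_SIZE))                  # 23040000
print(embedding_params(VOCAB_SIZE, HIDDEN_SIZE, EMBEDDING_SIZE))  # 3938304
print(encoder_params(PARAMS_PER_LAYER, NUM_LAYERS, shared=False)) # 84000000
print(encoder_params(PARAMS_PER_LAYER, NUM_LAYERS, shared=True))  # 7000000
```

Sharing reduces the number of stored weights, not the amount of computation: in this sketch the shared layer would still be applied once per depth.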

Like BERT, ALBERT uses absolute position embeddings, so inputs should be padded on the right. The embedding size is 128, smaller than BERT's 768. ALBERT can process at most 512 tokens at a time.
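Right padding keeps every real token at the position index it would have had without padding. The following is a minimal pure-Python sketch of the idea; the function name `pad_right`, the token IDs, and the pad ID of 0 are assumptions for illustration. In practice the tokenizer does this for you (for example with `padding=True`).

```python
from typing import Dict, List

MAX_POSITIONS = 512  # maximum sequence length stated above
PAD_TOKEN_ID = 0     # assumption: check tokenizer.pad_token_id for your checkpoint


def pad_right(batch: List[List[int]], pad_id: int = PAD_TOKEN_ID,
              max_len: int = MAX_POSITIONS) -> Dict[str, List[List[int]]]:
    """Truncate each sequence to max_len, then pad on the right to the longest one."""
    # Simplified truncation: a real tokenizer also preserves the final special token
    truncated = [ids[:max_len] for ids in batch]
    target = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)             # real tokens keep positions 0..len-1
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token IDs, not produced by a real tokenizer
batch = pad_right([[2, 10, 11, 3], [2, 12, 3]])
print(batch["input_ids"])       # [[2, 10, 11, 3], [2, 12, 3, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```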

All official ALBERT checkpoints are available under the ALBERT community organization.

Click on the ALBERT models in the right sidebar for more examples of how to apply ALBERT to different language tasks.

The examples below show how to predict the [MASK] token with [Pipeline], with [AutoModel], and from the command line.

# Fill-mask with Pipeline
import torch
from transformers import pipeline

# Named fill_mask so the imported pipeline() function is not shadowed
fill_mask = pipeline(
    task="fill-mask",
    model="albert-base-v2",
    dtype=torch.float16,
    device=0
)
fill_mask("Plants create [MASK] through a process known as photosynthesis.", top_k=5)

# Fill-mask with AutoModel
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
model = AutoModelForMaskedLM.from_pretrained(
    "albert/albert-base-v2",
    dtype=torch.float16,
    attn_implementation="sdpa",
    device_map="auto"
)

prompt = "Plants create energy through a process known as [MASK]."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    # Find the position of the [MASK] token in the sequence
    mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
    predictions = outputs.logits[0, mask_token_index]

# Decode the five highest-scoring candidate tokens
top_k = torch.topk(predictions, k=5).indices.tolist()
for token_id in top_k[0]:
    print(f"Prediction: {tokenizer.decode([token_id])}")

# Fill-mask from the command line (shell)
echo -e "Plants create [MASK] through a process known as photosynthesis." | transformers run --task fill-mask --model albert-base-v2 --device 0

Notes[[notes]]

  • ALBERT, like BERT, uses absolute position embeddings, so inputs should be padded on the right.
  • The embedding size E differs from the hidden size H. Embeddings are context independent (one embedding vector per token), whereas hidden states are context dependent (one hidden state per sequence of tokens). Because the embedding matrix has shape V x E, where V is the vocabulary size, it is generally more logical to have H >> E. When E < H, the model has fewer parameters.
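The last point can be checked with a short count. This is a sketch restricted to the token-embedding weights; it assumes the factorized form is a V x E table followed by an E x H projection and ignores biases.

```latex
\begin{align*}
\text{unfactorized:} \quad & V \cdot H \\
\text{factorized:} \quad & V \cdot E + E \cdot H \\
V E + E H < V H \quad & \Longleftrightarrow \quad E < \frac{V H}{V + H}
\end{align*}
```

When V is much larger than H, the bound VH / (V + H) sits just below H. For the illustrative values V = 30,000 and H = 768 it is about 749, so E = 128 is well inside the range where factorization saves parameters.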

Resources[[resources]]

The resources in the sections below are official Hugging Face and community (indicated by 🌎) resources to help you get started with ALBERT. If you have a resource to add here, please open a Pull Request! A resource should ideally offer something new rather than duplicate an existing one.

Multiple choice

AlbertConfig[[albertconfig]]

[[autodoc]] AlbertConfig

AlbertTokenizer[[alberttokenizer]]

[[autodoc]] AlbertTokenizer - get_special_tokens_mask - save_vocabulary

AlbertTokenizerFast[[alberttokenizerfast]]

[[autodoc]] AlbertTokenizerFast

Albert-specific outputs[[albert-specific-outputs]]

[[autodoc]] models.albert.modeling_albert.AlbertForPreTrainingOutput

AlbertModel[[albertmodel]]

[[autodoc]] AlbertModel - forward

AlbertForPreTraining[[albertforpretraining]]

[[autodoc]] AlbertForPreTraining - forward

AlbertForMaskedLM[[albertformaskedlm]]

[[autodoc]] AlbertForMaskedLM - forward

AlbertForSequenceClassification[[albertforsequenceclassification]]

[[autodoc]] AlbertForSequenceClassification - forward

AlbertForMultipleChoice[[albertformultiplechoice]]

[[autodoc]] AlbertForMultipleChoice

AlbertForTokenClassification[[albertfortokenclassification]]

[[autodoc]] AlbertForTokenClassification - forward

AlbertForQuestionAnswering[[albertforquestionanswering]]

[[autodoc]] AlbertForQuestionAnswering - forward