title: Transformer Demo
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit

Transformer: Paper Reproduction Demo

Paper: Attention Is All You Need (Vaswani et al., NIPS 2017)

This Space reimplements from scratch the Transformer paper, which drops RNNs and CNNs entirely and builds the encoder-decoder out of attention alone, and lets you try the trained model yourself.


๋ฌด์—‡์„ ํ•  ์ˆ˜ ์žˆ๋‚˜์š”?

์ˆซ์ž ์‹œํ€€์Šค๋ฅผ ์ž…๋ ฅํ•˜๋ฉด Transformer๊ฐ€ ๋’ค์ง‘์–ด ์ค๋‹ˆ๋‹ค.

์ž…๋ ฅ :  1 2 3 4 5
์ถœ๋ ฅ :  5 4 3 2 1

๊ทธ๋ฆฌ๊ณ  ๋” ํฅ๋ฏธ๋กœ์šด ๊ฑด โ€” ๋””์ฝ”๋”์˜ cross-attention ๊ฐ€์ค‘์น˜๋ฅผ ์‹œ๊ฐํ™”ํ•ด์„œ ๋ชจ๋ธ์ด "์ถœ๋ ฅ i๋ฒˆ์งธ ์œ„์น˜๋ฅผ ๋งŒ๋“ค ๋•Œ ์ž…๋ ฅ ์–ด๋””๋ฅผ ๋ดค๋Š”์ง€"๋ฅผ ์ง์ ‘ ๋ณผ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฑฐ์˜ˆ์š”. ๋’ค์ง‘๊ธฐ ํƒœ์Šคํฌ์—์„œ๋Š” ๋ฐ˜๋Œ€๊ฐ์„ (anti-diagonal) ํŒจํ„ด์ด ๋˜๋ ท์ด ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค.


์™œ ๋ฒˆ์—ญ์ด ์•„๋‹ˆ๋ผ ์ˆซ์ž ๋’ค์ง‘๊ธฐ์ธ๊ฐ€์š”?

๋…ผ๋ฌธ์€ ์˜์–ดโ†’๋…์ผ์–ด ๋ฒˆ์—ญ์œผ๋กœ ๊ฒ€์ฆํ–ˆ์ง€๋งŒ, ๊ทธ๊ฑด 8ร— P100 GPU๋กœ 12์‹œ๊ฐ„ ํ•™์Šต์ด ํ•„์š”ํ•ด์š”. ๋ฌด๋ฃŒ Space์—์„œ ๊ทธ๊ฒŒ ์•ˆ ๋˜๋‹ˆ๊นŒ, ๋ถ€ํŒ… ์‹œ 30์ดˆ ์•ˆ์— ํ•™์Šต ๋๋‚˜๋Š” toy task๋ฅผ ๊ณจ๋ž์Šต๋‹ˆ๋‹ค.

์ˆซ์ž ๋’ค์ง‘๊ธฐ์˜ ์žฅ์ :

  • ์–ดํœ˜๊ฐ€ ์ž‘์Œ (0~9 + ํŠน์ˆ˜ ํ† ํฐ = 13๊ฐœ)
  • ์ž…์ถœ๋ ฅ ๊ธธ์ด๊ฐ€ ๊ฐ™๊ณ  ์ •๋‹ต์ด ๋ช…ํ™•
  • ์žฅ๊ฑฐ๋ฆฌ ์˜์กด์„ฑ์„ ๊ฐ•์ œ โ€” ์ถœ๋ ฅ 1๋ฒˆ์งธ๋Š” ์ž…๋ ฅ ๋งˆ์ง€๋ง‰์„ ๋ด์•ผ ํ•จ
  • ์‹œ๊ฐํ™”๊ฐ€ ๊ทน์  (๋ฐ˜๋Œ€๊ฐ์„  ํŒจํ„ด)

ํ”„๋กœ์ ํŠธ ๊ตฌ์กฐ

โ”œโ”€โ”€ app.py             # Gradio ๋ฐ๋ชจ (ํ•™์Šต + ์ถ”๋ก  + ์‹œ๊ฐํ™”)
โ”œโ”€โ”€ transformer.py     # ๋…ผ๋ฌธ์„ ๊ทธ๋Œ€๋กœ ์žฌํ˜„ํ•œ Transformer ๋ณธ์ฒด
โ”œโ”€โ”€ requirements.txt   # ํŒจํ‚ค์ง€ ๋ชฉ๋ก
โ””โ”€โ”€ README.md          # ์ด ํŒŒ์ผ

๋ชจ๋ธ ๊ตฌ์„ฑ

์ด ๋ฐ๋ชจ๋Š” ๋…ผ๋ฌธ base ๋ชจ๋ธ์˜ 1/8 ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค. ๊ตฌ์กฐ๋Š” ์™„์ „ํžˆ ๋™์ผํ•˜๊ณ  ํฌ๊ธฐ๋งŒ ์ค„์˜€์–ด์š”.

ํ•ญ๋ชฉ ๋…ผ๋ฌธ base ์ด ๋ฐ๋ชจ
d_model 512 64
์ธต ์ˆ˜ N 6 2
ํ—ค๋“œ ์ˆ˜ h 8 4
d_ff 2048 128
์–ดํœ˜ ํฌ๊ธฐ 37K (BPE) 13
ํŒŒ๋ผ๋ฏธํ„ฐ 65M ~80K

Training setup

optimizer = Adam(lr=5e-4, betas=(0.9, 0.98), eps=1e-9)   # betas and eps follow paper §5.3
loss     = CrossEntropy(ignore_index=PAD, label_smoothing=0.1)
steps    = 2000
batch    = 128
  • A fresh batch of random digit sequences of length 3~10 is generated at every step (saves memory)
  • Gradient clipping = 1.0
  • Inference uses greedy decoding

Training runs automatically at boot, and the finished model is cached as model.pt.
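As a rough sketch of the on-the-fly sampling in the first bullet above, a batch could be drawn like this. The token ids, the `EOS` token, the padding-to-`max_len` choice, and the function name `sample_batch` are assumptions, not necessarily what app.py does.

```python
import random

# Hypothetical ids (assumption): PAD=0, BOS=1, EOS=2, digit d -> d + 3.
PAD, BOS, EOS, DIGIT_OFFSET = 0, 1, 2, 3


def sample_batch(batch_size=128, min_len=3, max_len=10, rng=random):
    """Sample a fresh batch of reversal examples, padded to equal length.

    Returns (src, tgt_in, tgt_out) as lists of id lists:
    tgt_in is what the decoder reads, tgt_out is what it should predict.
    """
    src, tgt_in, tgt_out = [], [], []
    for _ in range(batch_size):
        n = rng.randint(min_len, max_len)          # inclusive on both ends
        digits = [rng.randrange(10) + DIGIT_OFFSET for _ in range(n)]
        rev = digits[::-1]
        pad = [PAD] * (max_len - n)
        src.append(digits + pad)
        tgt_in.append([BOS] + rev + pad)           # shifted right by one
        tgt_out.append(rev + [EOS] + pad)          # next-token targets
    return src, tgt_in, tgt_out


src, tgt_in, tgt_out = sample_batch(batch_size=2, rng=random.Random(0))
print(src[0], tgt_in[0], tgt_out[0], sep="\n")
```

Because nothing is stored between steps, memory use stays constant no matter how many steps are run; the trade-off is that the model never sees a fixed validation set unless one is sampled separately.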


Mapping from key parts of the paper to the code

| Location in paper | Location in code |
|---|---|
| Eq. (1) softmax(QKᵀ/√d_k)V | transformer.py :: scaled_dot_product_attention |
| §3.2.2 Multi-Head | MultiHeadAttention |
| §3.5 Positional Encoding | PositionalEncoding |
| Eq. (2) FFN | FeedForward |
| §3.1 One encoder layer | EncoderLayer (Post-LN) |
| §3.1 One decoder layer | DecoderLayer (Post-LN) |
| §3.4 Embedding × √d_model | inside Transformer.encode |
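For readers who want to see Eq. (1) without any framework, here is a plain-Python sketch on nested lists. It is not the code in transformer.py; the name `sdpa_reference` and the boolean `mask` convention (False means blocked) are assumptions chosen for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_reference(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V on nested lists.

    Q: [n_q][d_k], K: [n_k][d_k], V: [n_k][d_v].
    mask[i][j] is False where query i must not attend to key j.
    Returns (output, weights); weights are what the heatmap would show.
    """
    d_k = len(K[0])
    weights = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if mask is not None:
            scores = [s if mask[i][j] else -1e9 for j, s in enumerate(scores)]
        weights.append(softmax(scores))
    d_v = len(V[0])
    out = [[sum(w * v[c] for w, v in zip(row, V)) for c in range(d_v)]
           for row in weights]
    return out, weights


out, w = sdpa_reference([[1.0, 0.0]],
                        [[1.0, 0.0], [0.0, 1.0]],
                        [[1.0, 2.0], [3.0, 4.0]])
print(w[0])   # two weights summing to 1, the first one larger
```

Dividing by √d_k keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.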

How should you read it? (Interpreting the visualization)

Cross-attention heatmap:

  • Horizontal axis: encoder positions (input tokens; left is the start of the sequence)
  • Vertical axis: decoder positions (output tokens; top is generated first)
  • The brighter the color, the stronger the attention

On the reversal task, a well-trained model behaves like this:

Output position 0 (right after BOS, the first output token) → attends to the last input token
Output position 1                                            → attends to the second-to-last input token
...

So instead of the top-left → bottom-right diagonal, you should see the opposite direction: an anti-diagonal running from top-right to bottom-left. If it shows up, training succeeded.


Notes on deploying to Hugging Face Spaces

The problems I ran into when deploying the ResNet demo can occur here as well:

1. YAML front matter is required

Without the --- ... --- block at the very top of this README.md, the Space will not build.

2. colorFrom/colorTo accept only the 8 predefined colors

Allowed colors: red, yellow, green, blue, indigo, purple, pink, gray

3. Avoid Python 3.13

The audioop standard-library module was removed in 3.13, so some packages fail to build. 3.10 is recommended.

4. PyTorch CPU build

Free Spaces provide CPU only by default. If the CUDA build of torch gets installed, it can exceed the disk quota, so when needed, request the CPU build explicitly with torch --index-url https://download.pytorch.org/whl/cpu.


Running locally

# 1) Install dependencies
pip install -r requirements.txt

# 2) Run the demo (trains automatically on the first run)
python app.py

By default it opens at http://127.0.0.1:7860.


ํ•™์Šต์ด ์ž˜ ์•ˆ ๋˜๋ฉด

์ฒดํฌ๋ฆฌ์ŠคํŠธ:

  • PyTorch ๋ฒ„์ „์ด 2.0 ์ด์ƒ์ธ๊ฐ€
  • ํ•™์Šต step์ด 2000๋ฒˆ ์ด์ƒ ๋„๋Š”๊ฐ€ (์ฝ˜์†”์— step 200, 400, ... ๋กœ๊ทธ ํ™•์ธ)
  • step 1000์ฏค ๋˜๋ฉด token_acc๊ฐ€ 0.95 ์ด์ƒ์ธ๊ฐ€
  • ์ถœ๋ ฅ์ด ํ•ญ์ƒ ๊ฐ™์€ ํ† ํฐ๋งŒ ๋ฐ˜๋ณตํ•œ๋‹ค๋ฉด โ†’ ํ•™์Šต์ด ๊ฑฐ์˜ ์•ˆ ๋œ ๊ฒƒ. step ๋Š˜๋ฆฌ๊ฑฐ๋‚˜ lr ์กฐ์ •
  • cross-attention์ด ๊ท ์ผ(uniform)ํ•˜๋‹ค๋ฉด โ†’ ๋” ํ•™์Šต ํ•„์š”

์ฐธ๊ณ 

@inproceedings{vaswani2017attention,
  title     = {Attention Is All You Need},
  author    = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki
               and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N
               and Kaiser, {\L}ukasz and Polosukhin, Illia},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2017}
}