---
title: Transformer Demo
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---
# Transformer: Paper Reproduction Demo
**Paper**: [Attention Is All You Need](https://arxiv.org/abs/1706.03762) (Vaswani et al., NIPS 2017)
> This Space reproduces from scratch the Transformer paper, which drops both RNNs and CNNs
> and builds the encoder-decoder **from attention alone**, and lets you try the trained model yourself.
---
## What can you do?
Enter a sequence of digits and the **Transformer reverses it**.
```
Input  : 1 2 3 4 5
Output : 5 4 3 2 1
```
More interesting still, the demo visualizes the decoder's **cross-attention weights**,
so you can see directly which input positions the model looked at when it produced the i-th output position.
On the reversal task, a clear **anti-diagonal pattern** appears.
---
## Why digit reversal instead of translation?
The paper validated the model on English→German translation, but that takes 12 hours of training on 8× P100 GPUs for the base model.
That is not feasible on a free Space, so I picked a **toy task that finishes training within 30 seconds at startup**.
Advantages of digit reversal:
- Small vocabulary (digits 0-9 + special tokens = 13 entries)
- Input and output have the same length, and the correct answer is unambiguous
- It forces a **long-range dependency**: the first output token has to look at the last input token
- The visualization is striking (anti-diagonal pattern)
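
To make the task concrete, here is a minimal sketch of how reversal examples could be generated. The token ids (`PAD`, `BOS`, `EOS`, `DIGIT_OFFSET`) and the helper names are assumptions for illustration, not necessarily what `app.py` uses; the only facts taken from this README are the 13-entry vocabulary and sequence lengths of 3 to 10.

```python
import random

# Assumed token ids (hypothetical; the real ids live in app.py).
PAD, BOS, EOS = 0, 1, 2
DIGIT_OFFSET = 3                 # digit d is encoded as d + 3
VOCAB_SIZE = DIGIT_OFFSET + 10   # 3 special tokens + 10 digits = 13


def make_pair(rng, min_len=3, max_len=10):
    """Return (source, target) token-id lists for one reversal example."""
    length = rng.randint(min_len, max_len)
    src = [rng.randint(0, 9) + DIGIT_OFFSET for _ in range(length)]
    # The target is the reversed source, wrapped in BOS/EOS for the decoder.
    tgt = [BOS] + src[::-1] + [EOS]
    return src, tgt


def pad_batch(seqs, pad_id=PAD):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]


rng = random.Random(0)
pairs = [make_pair(rng) for _ in range(4)]
src_batch = pad_batch([s for s, _ in pairs])
tgt_batch = pad_batch([t for _, t in pairs])
```

Because every example is sampled on the fly, no dataset has to be stored, which is what keeps the memory footprint small.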
---
## Project structure
```
├── app.py            # Gradio demo (training + inference + visualization)
├── transformer.py    # Transformer implementation that follows the paper
├── requirements.txt  # package list
└── README.md         # this file
```
---
## Model configuration
This demo is **1/8 the size** of the paper's base model in `d_model` (64 vs. 512). The architecture is identical; only the sizes are reduced.

| Item | Paper base | This demo |
|------|-----------|---------|
| d_model | 512 | **64** |
| Number of layers N | 6 | **2** |
| Number of heads h | 8 | **4** |
| d_ff | 2048 | **128** |
| Vocabulary size | 37K (BPE) | **13** |
| Parameters | 65M | **~80K** |
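
The table can also be written down as a small configuration object, which makes the per-head dimension `d_k = d_model / h` explicit. This is only a sketch: `TransformerConfig` and its field names are hypothetical and not taken from `transformer.py`, and the paper's vocabulary size is rounded to 37,000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    d_model: int
    n_layers: int
    n_heads: int
    d_ff: int
    vocab_size: int

    @property
    def d_k(self) -> int:
        """Per-head dimension; d_model must split evenly across heads."""
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads


# Values copied from the table above (paper vocabulary is approximate).
paper_base = TransformerConfig(d_model=512, n_layers=6, n_heads=8, d_ff=2048, vocab_size=37_000)
demo = TransformerConfig(d_model=64, n_layers=2, n_heads=4, d_ff=128, vocab_size=13)

print(paper_base.d_k, demo.d_k)  # 64 16
```

Each head in the demo therefore works with 16-dimensional queries and keys, compared with 64 in the paper's base model.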
---
## Training setup
```python
# Shorthand summary; in PyTorch these correspond to torch.optim.Adam and nn.CrossEntropyLoss
optimizer = Adam(lr=5e-4, betas=(0.9, 0.98), eps=1e-9)  # betas and eps as in paper §5.3
loss = CrossEntropy(ignore_index=PAD, label_smoothing=0.1)
steps = 2000
batch = 128
```
- Every step generates a fresh batch of random digit sequences of length 3-10 (saves memory)
- Gradient clipping = 1.0
- Inference uses greedy decoding
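
As a rough illustration of the greedy decoding mentioned in the list above: at each step the decoder is asked for scores over the vocabulary and the single highest-scoring token is appended, until EOS appears or a length limit is reached. In this sketch `next_scores` is a hypothetical stand-in for the model's decoder forward pass, and the `BOS`/`EOS` ids are assumed.

```python
from typing import Callable, List, Sequence

BOS, EOS = 1, 2  # assumed special-token ids (hypothetical)


def greedy_decode(
    next_scores: Callable[[Sequence[int]], Sequence[float]],
    max_len: int,
    bos: int = BOS,
    eos: int = EOS,
) -> List[int]:
    """Generate tokens one at a time, always taking the highest-scoring one.

    `next_scores(prefix)` returns one score per vocabulary id for the token
    that should follow `prefix`.
    """
    prefix = [bos]
    for _ in range(max_len):
        scores = next_scores(prefix)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == eos:
            break
        prefix.append(best)
    return prefix[1:]  # drop BOS


# Toy stand-in for the model: it "knows" the answer and scores it highest.
answer = [7, 6, 5, EOS]


def fake_scores(prefix):
    scores = [0.0] * 13
    scores[answer[len(prefix) - 1]] = 1.0
    return scores


print(greedy_decode(fake_scores, max_len=10))  # [7, 6, 5]
```

Greedy decoding is a reasonable fit for this task because each output position has exactly one correct token, so there is little to gain from keeping several candidates as beam search would.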
Training runs automatically at startup, and the trained model is cached as `model.pt`.
---
## Mapping the paper's key parts to code
| Paper | Code |
|-----------|-----------|
| Eq. (1) `softmax(QKᵀ/√d_k)V` | `transformer.py :: scaled_dot_product_attention` |
| §3.2.2 Multi-Head | `MultiHeadAttention` |
| §3.5 Positional Encoding | `PositionalEncoding` |
| Eq. (2) FFN | `FeedForward` |
| §3.1 Encoder layer | `EncoderLayer` (Post-LN) |
| §3.1 Decoder layer | `DecoderLayer` (Post-LN) |
| §3.4 Embedding × √d_model | inside `Transformer.encode` |
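
For readers who want to see the first and third rows of the table in code, here is a NumPy sketch of Eq. (1) and of the sinusoidal positional encoding from §3.5. It is written independently of `transformer.py` (which presumably uses PyTorch), so the signatures, the mask convention and the `-1e9` fill value are assumptions made for illustration.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """Eq. (1): softmax(Q K^T / sqrt(d_k)) V.

    q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v).
    mask: optional boolean array broadcastable to (..., len_q, len_k);
          False marks positions that must not be attended to.
    """
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def positional_encoding(max_len, d_model):
    """Section 3.5: sin on even dimensions, cos on odd ones (d_model even)."""
    pos = np.arange(max_len)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


rng = np.random.default_rng(0)
q = rng.normal(size=(5, 16))   # 5 query positions, d_k = 16
k = rng.normal(size=(7, 16))   # 7 key positions
v = rng.normal(size=(7, 16))
out, weights = scaled_dot_product_attention(q, k, v)
print(out.shape, weights.shape)  # (5, 16) (5, 7)
```

The returned `weights` array is the quantity a cross-attention heatmap displays: one row per query (decoder) position, one column per key (encoder) position, with each row summing to 1.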
---
## How do I read it? (Interpreting the visualization)
**Cross-attention heatmap**:
- Horizontal axis: encoder positions (input tokens; left is the start of the sequence)
- Vertical axis: decoder positions (output tokens; top is generated first)
- Brighter color means stronger attention
On the reversal task, a well-trained model behaves like this:
```
output position 0 (right after BOS, the first output token) → looks at the last input token
output position 1 → looks at the second-to-last input token
...
```
So instead of the usual **top-left → bottom-right diagonal**, you should see the opposite:
an **anti-diagonal running from top-right → bottom-left**. If you do, training succeeded.
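
The visual check can also be expressed as a number. The sketch below scores how anti-diagonal a cross-attention matrix is by testing, row by row, whether the strongest column is the mirrored input position. `anti_diagonal_score` is a hypothetical helper, and it assumes the matrix has already been cropped to the digit tokens (no BOS/EOS/PAD rows or columns) and reduced to a single 2-D map, for example by averaging over heads.

```python
import numpy as np


def anti_diagonal_score(attn):
    """Fraction of output rows whose strongest input column is the mirrored one.

    attn: (n_out, n_in) cross-attention weights, rows = decoder positions,
          columns = encoder positions.
    """
    attn = np.asarray(attn)
    n_out, n_in = attn.shape
    expected = n_in - 1 - np.arange(n_out)  # row i should peak at column n_in-1-i
    return float(np.mean(attn.argmax(axis=1) == expected))


ideal = np.fliplr(np.eye(5))         # a perfect anti-diagonal
uniform = np.full((5, 5), 1 / 5)     # an untrained, uniform map
print(anti_diagonal_score(ideal))    # 1.0
print(anti_diagonal_score(uniform))  # 0.2
```

A score close to 1.0 corresponds to the clean anti-diagonal described above. The uniform map scores 0.2 only because `argmax` breaks ties by picking column 0, which happens to be correct for the last row.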
---
## Notes on deploying to Hugging Face Spaces
The problems I ran into when deploying the ResNet demo can occur here as well:
### 1. YAML front matter is required
Without the `--- ... ---` block at the very top of this README.md, the Space will not build.
### 2. `colorFrom`/`colorTo` accept only 8 fixed colors
Allowed colors: `red, yellow, green, blue, indigo, purple, pink, gray`
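
Both of the points above can be checked locally before pushing. The following is a small sketch, not an official validator: `check_front_matter` is a hypothetical helper that uses naive `key: value` parsing rather than a real YAML parser, and the allowed-color set is simply the list given above.

```python
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}


def check_front_matter(readme_text):
    """Return a list of problems found in a Space README's front matter."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["README.md does not start with a '---' front matter block"]
    try:
        end = lines.index("---", 1)
    except ValueError:
        return ["front matter block is never closed with '---'"]

    # Naive `key: value` parsing; enough for flat metadata, not full YAML.
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")

    problems = []
    for key in ("colorFrom", "colorTo"):
        if key in meta and meta[key] not in ALLOWED_COLORS:
            problems.append(f"{key}: '{meta[key]}' is not an allowed color")
    return problems


sample = """---
title: Transformer Demo
colorFrom: blue
colorTo: teal
sdk: gradio
---
# Transformer
"""
print(check_front_matter(sample))  # ["colorTo: 'teal' is not an allowed color"]
```

An empty list means neither of the two problems was detected; it does not guarantee that the Space will build.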
### 3. Avoid Python 3.13
The `audioop` standard-library module was removed in 3.13, which breaks the build of some packages. **3.10** is recommended.
### 4. PyTorch CPU build
Free Spaces provide CPU only by default. If installing `torch` pulls in the CUDA build,
it can exceed the disk quota, so when needed specify `torch --index-url https://download.pytorch.org/whl/cpu`
explicitly.
---
## Running locally
```bash
# 1) Install dependencies
pip install -r requirements.txt
# 2) Run the demo (trains automatically on first run)
python app.py
```
By default it opens at `http://127.0.0.1:7860`.
---
## If training does not go well
Checklist:
- [ ] Is the PyTorch version 2.0 or later?
- [ ] Does training run for at least 2000 steps? (check the console for the step 200, 400, ... logs)
- [ ] Is `token_acc` at 0.95 or above by around step 1000?
- [ ] If the output keeps repeating the same token → the model has barely trained. Increase the steps or adjust the lr
- [ ] If cross-attention is uniform → more training is needed
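
For reference, a token-level accuracy like `token_acc` is commonly computed over non-padding positions only. The sketch below shows one such definition; whether `app.py` computes it exactly this way is an assumption, as is the `PAD` id.

```python
PAD = 0  # assumed padding id (hypothetical)


def token_accuracy(predicted, target, pad_id=PAD):
    """Share of non-padding target positions where the prediction matches."""
    correct = total = 0
    for pred_row, tgt_row in zip(predicted, target):
        for p, t in zip(pred_row, tgt_row):
            if t == pad_id:
                continue  # padding does not count toward the score
            total += 1
            correct += int(p == t)
    return correct / total if total else 0.0


target = [[8, 7, 6, 2, 0, 0],
          [5, 4, 3, 9, 8, 2]]
predicted = [[8, 7, 6, 2, 4, 4],   # mismatches fall on padding, so they are ignored
             [5, 4, 3, 9, 9, 2]]   # one wrong token
print(token_accuracy(predicted, target))  # 0.9
```

Excluding padding matters here because sequence lengths vary from 3 to 10, so short examples would otherwise inflate or deflate the score depending on what the model emits after EOS.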
---
## References
```bibtex
@inproceedings{vaswani2017attention,
title = {Attention Is All You Need},
author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki
and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N
and Kaiser, {\L}ukasz and Polosukhin, Illia},
booktitle = {Advances in Neural Information Processing Systems},
year = {2017}
}
```
- 📄 Paper: [arXiv:1706.03762](https://arxiv.org/abs/1706.03762)
- 📝 The Annotated Transformer: <http://nlp.seas.harvard.edu/annotated-transformer/>
- 🎥 The Illustrated Transformer (Jay Alammar): <https://jalammar.github.io/illustrated-transformer/>