---
license: apache-2.0
---
Hello, this is Oneclick AI!!
Today we're going to take a look at the LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) models, which overcome the limitations of the RNN.
RNNs were a breakthrough for handling sequential data, but they ran into a wall with the 'long-term dependency problem': on long sequences they fail to properly remember earlier information.
LSTM and GRU are advanced recurrent neural networks designed to solve this problem. They introduce a 'gate' mechanism that, much like human long-term memory, selectively keeps important information and forgets the rest.
Let's see how these two models work to make up for the RNN's weaknesses, and how they can handle more complex sentences and time-series data with greater precision.
---
## Table of Contents
1. Understanding the core principles of LSTM/GRU
   - Why use LSTM/GRU?
   - The heart of LSTM: the cell state and the three-gate mechanism
   - GRU: a simplified LSTM with two gates
   - Unrolling LSTM and GRU through time
   - The main components of LSTM/GRU in detail
2. Looking inside the code through the architecture
   - An LSTM/GRU model architecture implemented in Keras
   - Checking the structure with model.summary()
3. Implementing LSTM/GRU yourself
   - Step 1: Load and preprocess the data
   - Step 2: Compile the model
   - Step 3: Train and evaluate the model
   - Step 4: Save and reuse the trained model
   - Step 5: Test the model with your own sentence
4. Upgrading your own LSTM/GRU model
   - Basic conditioning: hyperparameter tuning
   - Stacking layers: multiple LSTM/GRU layers
   - Past and future at once: bidirectional LSTM/GRU
   - Maximizing performance with transfer learning
5. Conclusion
---
## 1. Understanding the core principles of LSTM/GRU
First, let's look at the fundamental reason LSTM and GRU emerged as alternatives to the RNN.
**Why use LSTM/GRU? The limits of RNNs**
A basic RNN passes past information along through its hidden state, but as sequences get longer it runs into the vanishing gradient or exploding gradient problem.
During training the gradient shrinks toward 0 or grows without bound. When it vanishes, the network effectively forgets important information from the beginning of a sentence, which is known as the 'long-term dependency problem'.
For example, in the sentence "Because I grew up in France as a child... (long passage)... so I speak French fluently.", an RNN easily forgets the early piece of information, 'France'.
LSTM and GRU solve this problem by introducing structures called 'gates' that control the flow of information.
They keep the basic structure of the RNN while being designed to selectively remember important information and forget what is unnecessary.
**The heart of LSTM: the cell state and the three-gate mechanism**
The core of LSTM is the 'cell state' ($C_t$) and the three gates that control it.
- Cell state ($C_t$): a 'conveyor belt' for long-term memory, along which information travels almost unchanged.
- Gates: use the sigmoid function to output values between 0 and 1, which determine how much information is allowed through.
1. Forget gate ($f_t$): decides which information from the previous cell state $C_{t-1}$ to forget.
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
(where $\sigma$ is the sigmoid function, $h_{t-1}$ is the previous hidden state, and $x_t$ is the current input)
2. Input gate ($i_t$) and candidate cell state ($\tilde{C_t}$): decide how much new information to add.
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
$\tilde{C_t} = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
3. Output gate ($o_t$): decides which information from the cell state to output.
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
Final cell state: $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C_t}$ ($\odot$ denotes element-wise multiplication)
Hidden state: $h_t = o_t \odot \tanh(C_t)$
Because $C_t$ is updated additively and $f_t$ can stay close to 1, gradients flowing along the cell state shrink far more slowly than in a plain RNN. Thanks to this structure, LSTM learns long-term dependencies effectively.
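The equations above can be sketched as a single LSTM timestep in plain Python. This is a minimal illustration under stated assumptions, not how Keras implements the layer: the helper names (`sigmoid`, `affine`, `lstm_step`), the dictionary layout of the parameters, and the tiny sizes are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def affine(W, v, b):
    """Compute W . v + b for a matrix stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM timestep following the equations for f_t, i_t, C~_t, o_t, C_t, h_t."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    f_t = sigmoid(affine(p["W_f"], concat, p["b_f"]))  # forget gate
    i_t = sigmoid(affine(p["W_i"], concat, p["b_i"]))  # input gate
    c_tilde = [math.tanh(v) for v in affine(p["W_C"], concat, p["b_C"])]  # candidate
    o_t = sigmoid(affine(p["W_o"], concat, p["b_o"]))  # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical sizes and random (untrained) weights, for illustration only
random.seed(0)
hidden_size, input_size = 3, 2
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = [
        [random.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
        for _ in range(hidden_size)
    ]
    params[f"b_{gate}"] = [0.0] * hidden_size

h, c = [0.0] * hidden_size, [0.0] * hidden_size
for x_t in ([1.0, 0.5], [-0.3, 0.8]):
    h, c = lstm_step(x_t, h, c, params)

print("h_t:", h)
print("C_t:", c)
```

Setting the forget gate close to 1 and the input gate close to 0 makes `lstm_step` copy the previous cell state through unchanged, which is the 'conveyor belt' behavior described above.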
**GRU: a simplified LSTM with two gates**
GRU is a variant of LSTM that improves computational efficiency by reducing the number of parameters.
The hidden state $h_t$ also takes on the role of the cell state, and only two gates are used.
- Reset gate ($r_t$): decides how much of the previous hidden state to ignore.
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$
- Update gate ($z_t$): decides how to blend the previous state with the new candidate state. (It plays the role of LSTM's forget and input gates combined.)
$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$
Candidate hidden state: $\tilde{h_t} = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h)$
Final hidden state: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h_t}$
On many tasks GRU performs comparably to LSTM, and because it has fewer parameters it usually trains faster.
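As a companion to the GRU equations above, here is a minimal plain-Python sketch of one GRU timestep. It follows the convention used in this document, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h_t}$; other references and libraries may swap the roles of $z_t$ and $1 - z_t$. The helper names, parameter layout, and sizes are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def affine(W, v, b):
    """Compute W . v + b for a matrix stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def gru_step(x_t, h_prev, p):
    """One GRU timestep following the equations for r_t, z_t, h~_t, h_t."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    r_t = sigmoid(affine(p["W_r"], concat, p["b_r"]))  # reset gate
    z_t = sigmoid(affine(p["W_z"], concat, p["b_z"]))  # update gate
    gated = [r * h for r, h in zip(r_t, h_prev)] + x_t  # [r_t * h_{t-1}, x_t]
    h_tilde = [math.tanh(v) for v in affine(p["W_h"], gated, p["b_h"])]
    return [(1 - z) * h + z * g for z, h, g in zip(z_t, h_prev, h_tilde)]


# Hypothetical sizes and random (untrained) weights, for illustration only
random.seed(0)
hidden_size, input_size = 2, 2
params = {}
for gate in ("r", "z", "h"):
    params[f"W_{gate}"] = [
        [random.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
        for _ in range(hidden_size)
    ]
    params[f"b_{gate}"] = [0.0] * hidden_size

h = [0.0] * hidden_size
for x_t in ([1.0, 0.5], [-0.3, 0.8]):
    h = gru_step(x_t, h, params)

print("h_t:", h)
```

Note that there are three weight sets here instead of the four in an LSTM, and no separate cell state to carry along.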
**Unrolling LSTM/GRU through time**
The network is easier to understand when it is unrolled along the time axis, as in the diagram below.
```markdown
Time ───▶
Inputs:      x₁       x₂       x₃     ...     xₜ
             ↓        ↓        ↓              ↓
           ┌────┐   ┌────┐   ┌────┐   ...   ┌────┐
h₀, C₀ ──▶ │LSTM│ ▶ │LSTM│ ▶ │LSTM│ ▶ ... ▶ │LSTM│  (or GRU)
           └────┘   └────┘   └────┘         └────┘
             │        │        │              │
             ▼        ▼        ▼              ▼
             h₁       h₂       h₃             hₜ
```
At each timestep the gates control the information, and the cell state (or hidden state) is carried forward over the long term.
**The main components of LSTM/GRU**
- Gate mechanism: selects and discards information.
- Hidden/cell state: acts as memory.
- Parameter sharing: the same weights are used at every timestep.
---
## 2. Looking inside the code through the architecture
Now, building on the theory, let's implement LSTM and GRU ourselves with TensorFlow Keras.
**An LSTM/GRU model architecture in Keras, in depth**
The following is a simple LSTM model for sentiment analysis of IMDB movie reviews. (The GRU version is similar.)
```python
import tensorflow as tf
from tensorflow import keras
# Define the model architecture
model = keras.Sequential([
    # Variable-length sequences of word indices (lets summary() report shapes and parameter counts)
    keras.Input(shape=(None,), dtype="int32"),
    # 1. Word embedding layer
    keras.layers.Embedding(input_dim=10000, output_dim=32),
    # 2. LSTM layer (to use a GRU instead, swap in keras.layers.GRU(32))
    keras.layers.LSTM(32),
    # 3. Final classifier
    keras.layers.Dense(1, activation="sigmoid"),
])
# Print a summary of the model structure
model.summary()
```
๋ ˆ์ด์–ด๋ฅผ ์ž์„ธํžˆ ๋“ค์–ด๋‹ค ๋ด…์‹œ๋‹ค.
- **์ž„๋ฒ ๋”ฉ ์ธต(Embedding)**
```python
keras.layers.Embedding(input_dim=10000, output_dim=32)
```
๋‹จ์–ด๋ฅผ ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜, RNN ๋ฌธ์„œ์™€ ๋™์ผ.
- **์ˆœํ™˜ ๊ณ„์ธต(LSTM ๋˜๋Š” GRU)**
```python
keras.layers.LSTM(32),
```
๋˜๋Š”
```python
keras.layers.GRU(32),
```
๋‚ด๋ถ€์ ์œผ๋กœ ๊ฒŒ์ดํŠธ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋ฉฐ, ์žฅ๊ธฐ ์˜์กด์„ฑ์„ ํ•™์Šต. ๊ธฐ๋ณธ์ ์œผ๋กœ ์ตœ์ข… ์€๋‹‰ ์ƒํƒœ๋งŒ ์ถœ๋ ฅ.
- **์™„์ „ ์—ฐ๊ฒฐ ๊ณ„์ธต(Dense)**
```python
keras.layers.Dense(1, activation="sigmoid")
```
์ตœ์ข… ํŒ๋‹จ.
model.summary()๋กœ ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜ ๊ณ„์‚ฐ ์›๋ฆฌ ์ดํ•ดํ•˜๊ธฐ์œ„ ์ฝ”๋“œ์—์„œ model.summary()๋ฅผ ์‹คํ–‰ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ต๋‹ˆ๋‹ค.
```bash
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, None, 32)          320000
 lstm (LSTM)                 (None, 32)                8320
 dense (Dense)               (None, 1)                 33
=================================================================
Total params: 328,353
Trainable params: 328,353
Non-trainable params: 0
_________________________________________________________________
```
๊ฐ ์ธต์˜ ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๋Š” ์–ด๋–ป๊ฒŒ ๊ณ„์‚ฐ๋˜๋Š”์ง€ ์•Œ์•„๋ณด์ž๋ฉด,
1. Embedding: 10,000 * 32 = 320,000 ๊ฐœ.
2. LSTM: ์ž…๋ ฅ(32)๊ณผ ์€๋‹‰(32)์„ ๊ณ ๋ คํ•œ 4๊ฐœ์˜ ๊ฒŒ์ดํŠธ(์ž…๋ ฅ, ๋ง๊ฐ, ์ถœ๋ ฅ, ํ›„๋ณด)๋กœ, (32+32+1)*32*4 = 8,320 ๊ฐœ. (GRU๋Š” 3๋ฐฐ: ์•ฝ 6,240)
3. Dense: 32 * 1 + 1 = 33 ๊ฐœ.
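The arithmetic above can be checked with a few small helper functions. This is a sketch with hypothetical function names; the `reset_after` flag mirrors the Keras GRU option of the same name, under the assumption that `reset_after=True` (the Keras default) keeps separate input and recurrent biases, so each of the three weight sets has two bias vectors.

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim: int, units: int) -> int:
    """Four weight sets (forget, input, candidate, output), each with one bias."""
    return 4 * (input_dim + units + 1) * units


def gru_params(input_dim: int, units: int, reset_after: bool = True) -> int:
    """Three weight sets (reset, update, candidate); two biases each if reset_after."""
    biases = 2 if reset_after else 1
    return 3 * (input_dim + units + biases) * units


def dense_params(input_dim: int, units: int) -> int:
    """One weight per input plus one bias, per output unit."""
    return (input_dim + 1) * units


total = embedding_params(10000, 32) + lstm_params(32, 32) + dense_params(32, 1)
print("Embedding:", embedding_params(10000, 32))
print("LSTM:", lstm_params(32, 32))
print("GRU (reset_after=False):", gru_params(32, 32, reset_after=False))
print("GRU (reset_after=True):", gru_params(32, 32))
print("Dense:", dense_params(32, 1))
print("Total for the LSTM model:", total)
```

The total matches the `Total params` line reported by `model.summary()` for the LSTM model.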
---
## 3. Implementing LSTM/GRU yourself
Now let's run the full code step by step and train the model ourselves. (This is similar to the RNN document and uses the IMDB dataset.)
**Step 1. Load and preprocess the data**
```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Load the IMDB reviews, keeping only the 10,000 most frequent words
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)
# Pad or truncate every review to exactly 256 tokens
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=256)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=256)
```
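To show what the padding step does to a single review, here is a plain-Python sketch that mimics the default behavior of `pad_sequences` (`padding="pre"` and `truncating="pre"`): long sequences keep their last `maxlen` tokens, and short ones are left-padded with 0. The helper name `pad_pre` is hypothetical, and it assumes `maxlen` is at least 1.

```python
def pad_pre(sequence, maxlen, value=0):
    """Keep the last maxlen tokens, then left-pad with value up to maxlen."""
    truncated = sequence[-maxlen:]
    return [value] * (maxlen - len(truncated)) + truncated


short_review = pad_pre([11, 22, 33], maxlen=5)
long_review = pad_pre([1, 2, 3, 4, 5, 6], maxlen=4)
print(short_review)
print(long_review)
```

Padding at the front means the real tokens sit at the end of the sequence, right before the recurrent layer produces its final hidden state.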
**Step 2. Compile the model**
```python
model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=32),
    layers.LSTM(32),  # or layers.GRU(32)
    layers.Dense(1, activation="sigmoid")
])
# Binary classification: positive vs. negative review
model.compile(
    loss="binary_crossentropy",
    optimizer="adam",
    metrics=["accuracy"]
)
```
**Step 3. Train and evaluate the model**
```python
batch_size = 128
epochs = 10
# Note: the test set doubles as the validation data in this setup
history = model.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_test, y_test)
)
# evaluate() returns [loss, accuracy], matching the metrics passed to compile()
score = model.evaluate(x_test, y_test, verbose=0)
print(f"\nTest loss: {score[0]:.4f}")
print(f"Test accuracy: {score[1]:.4f}")
```
**Step 4. Save and reuse the trained model**
```python
# Save the architecture, weights, and optimizer state to a single file
model.save("my_lstm_model_imdb.keras")
# Load it back for later use
loaded_model = keras.models.load_model("my_lstm_model_imdb.keras")
```
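**Step 5. Test the model with your own sentence**
```python
word_index = keras.datasets.imdb.get_word_index()
review = "This movie was fantastic and wonderful"
# load_data() reserves 0 (padding), 1 (start) and 2 (unknown), so raw word indices are shifted by 3
tokens = [1]  # start-of-sequence marker
for word in review.lower().split():
    index = word_index.get(word)
    # Map unknown words, and words outside the 10,000-word vocabulary, to 2
    tokens.append(index + 3 if index is not None and index + 3 < 10000 else 2)
padded_tokens = keras.preprocessing.sequence.pad_sequences([tokens], maxlen=256)
prediction = loaded_model.predict(padded_tokens)
print(f"Review: '{review}'")
print(f"Positive probability: {prediction[0][0] * 100:.2f}%")
```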
## 4. Upgrading your own LSTM/GRU model
Let's apply a few techniques to make the basic model more powerful.
- **Basic conditioning: hyperparameter tuning**
Adjust the learning rate, batch size, number of units, and so on.
```python
optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
```
- **Stacking layers: multiple LSTM/GRU layers**
```python
model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=64),
    # return_sequences=True passes the hidden state of every timestep to the next LSTM
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(32),
    layers.Dense(1, activation='sigmoid')
])
```
- **Past and future at once: bidirectional LSTM/GRU**
```python
model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=64),
    # Reads the sequence forward and backward and concatenates both results
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
```
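The idea behind the bidirectional wrapper can be sketched in plain Python: one recurrent pass reads the sequence from start to end, a second pass with its own weights reads it from end to start, and the two final states are concatenated. The toy scalar recurrence below stands in for a real LSTM; `run_toy_rnn`, `bidirectional`, and the weight values are hypothetical.

```python
import math


def run_toy_rnn(sequence, w_x, w_h):
    """Return the final state of a toy scalar recurrence (stand-in for an LSTM)."""
    h = 0.0
    for x_t in sequence:
        h = math.tanh(w_x * x_t + w_h * h)
    return h


def bidirectional(sequence, fwd_weights=(0.8, 0.5), bwd_weights=(-0.3, 0.9)):
    """Run a forward and a backward pass with separate weights, then concatenate."""
    forward = run_toy_rnn(sequence, *fwd_weights)         # reads x_1 ... x_T
    backward = run_toy_rnn(sequence[::-1], *bwd_weights)  # reads x_T ... x_1
    return [forward, backward]


features = bidirectional([0.5, -1.0, 0.25, 2.0])
print(features)
```

Concatenation doubles the feature size: with the default `merge_mode="concat"`, `Bidirectional(LSTM(64))` hands 128 features to the next layer.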
- **Maximizing performance with transfer learning**
Use pretrained components (for example, GloVe embeddings), or freeze the LSTM layers of a larger model.
```python
# Example: an embedding layer for pretrained vectors (requires a separate vector file)
embedding_layer = layers.Embedding(input_dim=10000, output_dim=100, trainable=False)
# Initialize its weights from GloVe or similar vectors
```
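The initialization step mentioned in the comment can be sketched as follows. GloVe text files store one word per line followed by its vector values, so the sketch parses lines in that layout and builds a matrix whose row `i` holds the vector for the word with index `i`. The function names, the in-memory sample lines, and the sample vocabulary are all hypothetical stand-ins for a real GloVe file and a real word index.

```python
def parse_glove_lines(lines):
    """Parse 'word v1 v2 ...' lines (the GloVe text layout) into a dict."""
    vectors = {}
    for line in lines:
        word, *values = line.split()
        vectors[word] = [float(v) for v in values]
    return vectors


def build_embedding_matrix(word_index, vectors, vocab_size, dim):
    """Row i holds the pretrained vector for index i; missing words stay all-zero."""
    matrix = [[0.0] * dim for _ in range(vocab_size)]
    for word, idx in word_index.items():
        if idx < vocab_size and word in vectors:
            matrix[idx] = vectors[word]
    return matrix


# Hypothetical stand-ins for a real GloVe file and a real vocabulary
sample_lines = ["movie 0.1 0.2 0.3", "great 0.4 0.5 0.6"]
sample_word_index = {"movie": 1, "great": 2, "unseen": 3}

matrix = build_embedding_matrix(
    sample_word_index, parse_glove_lines(sample_lines), vocab_size=5, dim=3
)
for row in matrix:
    print(row)
```

One possible way to use such a matrix is to convert it to a NumPy array and load it into the built embedding layer with `set_weights`, keeping `trainable=False` so the pretrained vectors are not updated during training. The word indices must follow the same scheme as the training data, including any index offset applied when the dataset was loaded.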
## 5. Conclusion
Today we covered LSTM and GRU, which go beyond the limits of the RNN, from their core principles to a working implementation and ways to upgrade it.
These two models still play an important role not only in natural language processing but also in time-series forecasting, speech recognition, and more.
In particular, the gating idea in LSTM/GRU, learning what to keep and what to ignore, helped set the stage for the attention mechanism and the Transformer models that followed.
Next time, we'll be back with the Transformer model!!
Have a great day!!