English->Korean translation model built with a from-scratch implementation of the Transformer and RoPE
- Trained from scratch on about 130,000 English-Korean pairs.
num_epochs = 5
batch_size = 64
config.intermediate_size = 768*4
config.num_attention_heads = 6
config.num_hidden_layers = 8
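
As a rough illustration of how these settings fit together, the sketch below collects them in a dataclass and shows one common form of the RoPE rotation. It is not the project's actual code. `TranslatorConfig`, `rope_rotate`, `hidden_size = 768` (inferred from `768*4`), and `rope_base = 10000.0` are assumptions introduced for this example, and the interleaved pairing of dimensions is only one of several conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class TranslatorConfig:
    # Values listed in this README.
    num_epochs: int = 5
    batch_size: int = 64
    intermediate_size: int = 768 * 4
    num_attention_heads: int = 6
    num_hidden_layers: int = 8
    # Assumption: not stated in the README, inferred from "768*4".
    hidden_size: int = 768
    # Assumption: a commonly used RoPE base, not confirmed by the README.
    rope_base: float = 10000.0

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the hidden dimension.
        assert self.hidden_size % self.num_attention_heads == 0
        return self.hidden_size // self.num_attention_heads


def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d)."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        # i is the index of the pair's first element, so -i/d equals -2k/d
        # for the k-th pair.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out
```

Under these assumptions each head would have 128 dimensions (768 / 6). Because the rotation is applied to queries and keys before the dot product, the attention score between two positions depends only on their relative offset.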
