devngho/llama-ablation-large-korean-corpus

A model pretrained with the Llama architecture. It was trained on about 20.7B tokens for about 2.8 epochs, using MaxText.

Checkpoints are available every 500 steps.

This research was carried out with Cloud TPUs provided by Google's TPU Research Cloud (TRC). ⚡

Examples

The bold part is the input.

  • max_new_tokens: 500

Example 1 <s> 인공지능은 '인간은 자신의 능력을 최대한 발휘한다'는 것을 목표로 한다. '인간은 자신의 능력을 최대한 발휘한다'는 것은 '인간은 자신의 능력을 최대한 발휘한다'는 것을 의미한다. '인간은 자신의 능력을 최대한 발휘한다'는 것은 '인간은 자신의 능력을 최대한 발휘한다'는 것을 의미한다. '인간은 자신의 능력을 최대한 발휘한다'는 것은 '인간은 자신의 능력을 최대한 발휘한다'는 것을 의미한다</s>

์˜ˆ์‹œ 2 <s> ํ•œ๊ธ€์˜ ํŠน์ง•์€ 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„ 'ํ•œ๊ธ€'๋กœ, 'ํ•œ๊ธ€'์„

์˜ˆ์‹œ 3 <s> ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง์ฒ˜๋Ÿผ '์ปคํ”ผ'๋ผ๋Š” ๋ง์ฒ˜๋Ÿผ '์ปคํ”ผ'๋ผ๋Š” ๋ง์€ '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋ง์„ ๋ถ™์—ฌ๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง์ฒ˜๋Ÿผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค. ์ปคํ”ผ๋Š” '์ปคํ”ผ'๋ผ๋Š” ๋ง๊ณผ ํ•จ๊ป˜ '์ปคํ”ผ'๋ผ๋Š” ๋‹จ์–ด๋ฅผ '์ปคํ”ผ'๋ผ๋Š” ๋ง๋กœ ๋ฐ”๊พธ์–ด๋†“์•˜๋‹ค

The outputs show considerable hallucination, awkwardness, and repetition.
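Samples like the ones above could be produced roughly as follows. This is a minimal sketch, assuming the repository loads with the Hugging Face `transformers` Auto classes and that the samples were decoded greedily; the card only states `max_new_tokens: 500`, so `do_sample=False` and the helper name `generate` are assumptions. Access to the files may first require accepting the repository's conditions on Hugging Face.

```python
MODEL_ID = "devngho/llama-ablation-large-korean-corpus"


def generate(prompt: str, max_new_tokens: int = 500) -> str:
    """Continue `prompt` with the pretrained model (hypothetical helper)."""
    # Imported lazily so that defining the helper does not need the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,  # value stated in the card
            do_sample=False,                # assumption: greedy decoding
        )
    # Keep special tokens so <s> and </s> stay visible, as in the samples.
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)
```

Calling `generate("인공지능은")` would download the weights and return a continuation; the exact prompt boundaries of the samples above are not recoverable from this page, so that prompt is only illustrative.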

Details

  • Made by: devngho
  • Language: ko
  • License: mit

Training details

  • learning_rate: 6e-4 (cosine, initial/end 6e-5)
  • warmup_ratio: 0.05
  • batch_size: 1024(fsdp 16 * per device 8 * ga 8)
  • optimizer: adamw(b1=0.9, b2=0.95, eps=1e-5, weight_decay=0.01)
  • duration: about 29h 17m
  • steps: 10000
  • The full configuration and results are available on wandb.
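The batch size entry above is a product of three factors, which can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes `batch_size` counts sequences, and the implied tokens per sequence is only an estimate, since the card does not state the sequence length.

```python
# Values taken from the training details above.
FSDP_DEVICES = 16        # "fsdp 16"
PER_DEVICE_BATCH = 8     # "per device 8"
GRAD_ACCUM_STEPS = 8     # "ga 8"
STEPS = 10_000           # "steps: 10000"
TOTAL_TOKENS = 20.7e9    # "about 20.7B tokens"

# Global batch = devices * per-device batch * gradient accumulation steps.
global_batch = FSDP_DEVICES * PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
# Assumption: one batch element is one sequence.
sequences_seen = global_batch * STEPS
# Implied average tokens per sequence (not stated in the card).
tokens_per_sequence = TOTAL_TOKENS / sequences_seen

print(global_batch)                # 1024
print(sequences_seen)              # 10240000
print(round(tokens_per_sequence))  # 2021
```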

Training devices

TPU v4-32

Training datasets

The AI Hub and Modu Corpus data were deduplicated and length-filtered (about 16,056,320 rows).

The dataset cannot be released because of the AI Hub and Modu Corpus terms, but if you obtain the raw data, you can preprocess it identically by following the procedure in devngho/dataset-preprocess.

Software

jax==0.4.35

devngho/MaxText, a fork of MaxText

Training results

  • learning/loss: 2.6237056255340576
  • eval/avg_loss: 2.6179106279033793

Benchmark results are provided below.

devngho/llama-ablation-large-korean-corpus

A model pretrained with the Llama architecture. It was trained on about 20.7B tokens (approximately 2.8 epochs) using MaxText.

Checkpoints are available every 500 steps.

This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). ⚡

Details

  • Made by: devngho
  • Language: ko
  • License: mit

Training details

  • learning_rate: 6e-4 (cosine, initial/end 6e-5)
  • warmup_ratio: 0.05
  • batch_size: 1024(fsdp 16 * per device 8 * ga 8)
  • optimizer: adamw(b1=0.9, b2=0.95, eps=1e-5, weight_decay=0.01)
  • duration: about 27h 50m
  • steps: 10000
  • The full configs and training results are available on wandb.
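The learning-rate entries above describe a warmup followed by a cosine decay. The following is a minimal sketch of one such schedule in plain Python, assuming a linear warmup from 6e-5 to 6e-4 over the first 5% of steps and a cosine decay back to 6e-5; the exact shape used by the MaxText fork is not stated here, so treat the function as illustrative.

```python
import math

PEAK_LR = 6e-4
INIT_LR = 6e-5   # "initial/end 6e-5"
END_LR = 6e-5
TOTAL_STEPS = 10_000
WARMUP_STEPS = round(0.05 * TOTAL_STEPS)  # warmup_ratio 0.05 -> 500 steps


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to END_LR (assumed shape)."""
    if step < WARMUP_STEPS:
        return INIT_LR + (PEAK_LR - INIT_LR) * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return END_LR + (PEAK_LR - END_LR) * cosine


for step in (0, 500, 5_250, 10_000):
    print(step, f"{learning_rate(step):.2e}")
```

Under these assumptions the rate peaks at step 500 and reaches the midpoint between 6e-4 and 6e-5 halfway through the decay phase.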

Training devices

TPU v4-32

Training datasets

I applied deduplication and length filtering to a corpus from AI Hub and Modu Corpus (16,056,320 rows).

I couldn't make the training dataset public because of the terms of AI Hub and Modu Corpus. If you obtain the raw data, you can still preprocess it the same way as the data used to train this model, using devngho/dataset-preprocess.
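For readers unfamiliar with the two preprocessing steps named above, here is a minimal sketch of exact-match deduplication combined with length filtering. It is not the procedure from devngho/dataset-preprocess: the function name, the hash-based duplicate check, and the `MIN_CHARS`/`MAX_CHARS` thresholds are all hypothetical.

```python
import hashlib
from typing import Iterable, Iterator

# Hypothetical thresholds; the model card does not state the real ones.
MIN_CHARS = 20
MAX_CHARS = 100_000


def dedup_and_filter(rows: Iterable[str]) -> Iterator[str]:
    """Yield rows within the length bounds, dropping exact duplicates."""
    seen = set()
    for row in rows:
        text = row.strip()
        # Length filtering: skip rows that are too short or too long.
        if not (MIN_CHARS <= len(text) <= MAX_CHARS):
            continue
        # Deduplication: compare by content hash to keep memory use small.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        if digest in seen:
            continue
        seen.add(digest)
        yield text


sample = ["짧음", "이 문장은 길이 조건을 충분히 만족하는 예시입니다.", "이 문장은 길이 조건을 충분히 만족하는 예시입니다."]
print(list(dedup_and_filter(sample)))
```

With the sample input, the short row is filtered out and the repeated row is kept once.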

Software

jax==0.4.35

devngho/MaxText, a fork of MaxText

Training results

  • learning/loss: 2.6237056255340576
  • eval/avg_loss: 2.6179106279033793
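The losses above can be turned into perplexities for easier comparison with other models. This sketch assumes the reported values are mean per-token cross-entropy in nats, which is not stated in the card.

```python
import math

train_loss = 2.6237056255340576  # learning/loss
eval_loss = 2.6179106279033793   # eval/avg_loss

# Perplexity is the exponential of the mean cross-entropy (assumed to be in nats).
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

print(f"train perplexity: {train_ppl:.2f}")
print(f"eval perplexity:  {eval_ppl:.2f}")
```

Under that assumption both perplexities come out a little under 14.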

[Figures: three benchmark graphs]

Safetensors: model size 2B params, tensor type BF16