devngho/llama-ablation-large-korean-corpus
Examples
The bold part is the input.
- max_new_tokens: 500
Example 1 <s> Artificial intelligence takes as its goal that 'humans exercise their own abilities to the fullest'. That 'humans exercise their own abilities to the fullest' means that 'humans exercise their own abilities to the fullest'. That 'humans exercise their own abilities to the fullest' means that 'humans exercise their own abilities to the fullest'. That 'humans exercise their own abilities to the fullest' means that 'humans exercise their own abilities to the fullest'</s>
Example 2 <s> The characteristic of Hangul is 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul' into 'Hangul', 'Hangul'
Example 3 <s> Coffee, like the word 'coffee', like the word 'coffee', attached the word 'coffee' to the word 'coffee' along with the word 'coffee'. Coffee, like the word 'coffee', together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'. Coffee, together with the word 'coffee', changed the term 'coffee' into the word 'coffee'
The outputs contain considerable hallucination, awkwardness, and repetition.
Benchmark results are provided below.
This model was pretrained with the Llama architecture on about 20.7B tokens (approximately 2.8 epochs) using MaxText.
Checkpoints are available for every 500 steps.
This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). ⚡
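A minimal sketch of loading the model with the Transformers `pipeline` helper and generating a continuation. The only setting taken from this card is `max_new_tokens: 500`; all other decoding settings are unknown here, so the library defaults are used. The helper names `load_generator` and `generate` and the prompt in the final comment are illustrative assumptions, not part of the model's own tooling.

```python
from functools import lru_cache

MODEL_ID = "devngho/llama-ablation-large-korean-corpus"

# max_new_tokens follows the value listed on this card; every other decoding
# setting is left at the library default because the card does not state it.
GENERATION_KWARGS = {"max_new_tokens": 500}


@lru_cache(maxsize=1)
def load_generator():
    """Build a text-generation pipeline (downloads the weights on first call)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=MODEL_ID)


def generate(prompt: str) -> str:
    """Return the prompt followed by the model's continuation."""
    outputs = load_generator()(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"]


# Example call (needs network access and enough memory for the weights).
# The prompt is a hypothetical Korean input meaning "Coffee is":
# print(generate("커피는"))
```

Since this is a base model pretrained on a Korean corpus, Korean prompts are the expected input, and the output may still be repetitive.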
Details
- Made by: devngho
- Language: ko
- License: mit
Training details
- learning_rate: 6e-4 (cosine, initial/end 6e-5)
- warmup_ratio: 0.05
- batch_size: 1024 (fsdp 16 * per device 8 * gradient accumulation 8)
- optimizer: adamw (b1=0.9, b2=0.95, eps=1e-5, weight_decay=0.01)
- duration: about 27h 50m
- steps: 10000
- All configs and training results are available on wandb.
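The hyperparameters above can be read as a warmup followed by a cosine decay. The sketch below is one plausible reading, not the MaxText implementation: it assumes a linear warmup from 6e-5 to 6e-4 over the first 5% of the 10,000 steps, then a cosine decay back to 6e-5. The linear warmup shape and the function name `learning_rate` are assumptions. It also spells out the batch-size arithmetic.

```python
import math

PEAK_LR = 6e-4            # learning_rate
INIT_LR = END_LR = 6e-5   # "initial/end 6e-5"
TOTAL_STEPS = 10_000      # steps
WARMUP_STEPS = round(0.05 * TOTAL_STEPS)  # warmup_ratio 0.05 -> 500 steps


def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step.

    Assumes linear warmup; the card only states "cosine" and the ratio.
    """
    if step < WARMUP_STEPS:
        return INIT_LR + (PEAK_LR - INIT_LR) * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return END_LR + (PEAK_LR - END_LR) * cosine


# Global batch size: 16-way FSDP * 8 sequences per device * 8 accumulation steps.
GLOBAL_BATCH = 16 * 8 * 8

for step in (0, 250, 500, 5_000, 10_000):
    print(f"step {step:>6}: lr = {learning_rate(step):.2e}")
print("global batch size:", GLOBAL_BATCH)
```

Under these assumptions the rate peaks at step 500 and returns to its starting value at step 10,000. The exact schedule used for training is in the wandb configs.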
Training devices
TPU v4-32
Training datasets
I applied deduplication and length filtering to a corpus from AI Hub and Modu Corpus (16,056,320 rows).
I can't release the training dataset because of the terms of AI Hub and Modu Corpus. If you have the raw data, you can preprocess it with devngho/dataset-preprocess in the same way as the data used to train this model.
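To illustrate what deduplication and length filtering mean in practice, here is a generic sketch. It is not the devngho/dataset-preprocess pipeline: the exact-match hashing strategy, the whitespace normalization, and the `MIN_CHARS` / `MAX_CHARS` thresholds are all hypothetical choices made for the example.

```python
import hashlib

# Hypothetical thresholds; the values used for this model are not stated here.
MIN_CHARS = 20
MAX_CHARS = 10_000


def preprocess(rows):
    """Drop rows that are too short or too long, then drop exact duplicates."""
    seen = set()
    kept = []
    for text in rows:
        # Collapse runs of whitespace so trivially different copies match.
        normalized = " ".join(text.split())
        if not (MIN_CHARS <= len(normalized) <= MAX_CHARS):
            continue
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(normalized)
    return kept


sample = ["too short", "가" * 30, "가" * 30, "나" * 30]
print(len(preprocess(sample)))  # number of rows that survive both filters
```

Real pipelines often use fuzzy methods such as MinHash for near-duplicates. Exact hashing is used here only to keep the example short.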
Software
jax==0.4.35
devngho/MaxText, a fork of MaxText
Training results
- learning/loss: 2.6237056255340576
- eval/avg_loss: 2.6179106279033793
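The losses above are easier to interpret as perplexity. The sketch below assumes they are mean per-token cross-entropy values in nats (natural log), which is the usual convention but is not stated on this card. Under that assumption the perplexity is simply `exp(loss)`.

```python
import math

train_loss = 2.6237056255340576  # learning/loss
eval_loss = 2.6179106279033793   # eval/avg_loss


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


print(f"train perplexity: {perplexity(train_loss):.2f}")
print(f"eval perplexity:  {perplexity(eval_loss):.2f}")
```

With these values both perplexities come out a little under 14, meaning the model is on average about as uncertain as a uniform choice among roughly 14 tokens.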


