Instructions for using devngho/llama-ablation-large-korean-corpus with libraries and local apps.

Libraries

How to use devngho/llama-ablation-large-korean-corpus with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="devngho/llama-ablation-large-korean-corpus")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("devngho/llama-ablation-large-korean-corpus")
model = AutoModelForCausalLM.from_pretrained("devngho/llama-ablation-large-korean-corpus")

vLLM

How to use devngho/llama-ablation-large-korean-corpus with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "devngho/llama-ablation-large-korean-corpus"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "devngho/llama-ablation-large-korean-corpus",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
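
The same request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the URL, model ID, and sampling values are simply copied from the curl call above. It assumes the server is already running and returns the usual OpenAI-style `choices` list.

```python
import json
import urllib.request

# Assumed defaults, copied from the curl example above.
BASE_URL = "http://localhost:8000"
MODEL_ID = "devngho/llama-ablation-large-korean-corpus"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the first completion's text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time,")` only works while the server is up; passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead.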

SGLang

How to use devngho/llama-ablation-large-korean-corpus with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "devngho/llama-ablation-large-korean-corpus" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "devngho/llama-ablation-large-korean-corpus",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "devngho/llama-ablation-large-korean-corpus" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "devngho/llama-ablation-large-korean-corpus",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use devngho/llama-ablation-large-korean-corpus with Docker Model Runner:
```
docker model run hf.co/devngho/llama-ablation-large-korean-corpus
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

devngho/llama-ablation-large-korean-corpus

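A model pretrained with the Llama architecture. It was trained for about 2.8 epochs on about 20.7B tokens, using MaxText.

A checkpoint is provided every 500 steps.

This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). ⚡

Examples

The bold part is the input.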

max_new_tokens: 500

예시 1 <s> 인공지능은 '인간은 자신의 능력을 최대한 발휘한다'는 것을 목표로 한다. '인간은 자신의 능력을 최대한 발휘한다'는 것은 '인간은 자신의 능력을 최대한 발휘한다'는 것을 의미한다. '인간은 자신의 능력을 최대한 발휘한다'는 것은 '인간은 자신의 능력을 최대한 발휘한다'는 것을 의미한다. '인간은 자신의 능력을 최대한 발휘한다'는 것은 '인간은 자신의 능력을 최대한 발휘한다'는 것을 의미한다</s>

예시 2 <s> 한글의 특징은 '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을 '한글'로, '한글'을

예시 3 <s> 커피는 '커피'라는 말처럼 '커피'라는 말처럼 '커피'라는 말은 '커피'라는 말과 함께 '커피'라는 말을 붙여놓았다. 커피는 '커피'라는 말처럼 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다. 커피는 '커피'라는 말과 함께 '커피'라는 단어를 '커피'라는 말로 바꾸어놓았다

The outputs show substantial hallucination, awkward phrasing, and repetition.
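
One rough way to put a number on the repetition in outputs like these is a distinct n-gram ratio. This is only an illustrative sketch and not part of the model's evaluation: the function name `distinct_ngram_ratio` is hypothetical, and whitespace splitting is a crude stand-in for proper Korean tokenization.

```python
def distinct_ngram_ratio(text, n=3):
    """Return unique n-grams / total n-grams over whitespace-separated tokens.

    Values near 1.0 mean little repetition; values near 0.0 mean heavy looping.
    """
    tokens = text.split()
    if len(tokens) < n:
        return 1.0  # too short to contain a repeated n-gram
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


# A looping pattern in the style of Example 2, and a non-repeating sentence.
looping = "'한글'을 '한글'로, " * 20
varied = "the quick brown fox jumps over the lazy dog"

print(distinct_ngram_ratio(looping))
print(distinct_ngram_ratio(varied))
```

A looping generation scores far lower than ordinary text, so the ratio can serve as a quick sanity check when comparing checkpoints or decoding settings.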

Details

Made by: devngho
Language: ko
License: mit

Training details

learning_rate: 6e-4 (cosine, initial/end 6e-5)
warmup_ratio: 0.05
batch_size: 1024(fsdp 16 * per device 8 * ga 8)
optimizer: adamw(b1=0.9, b2=0.95, eps=1e-5, weight_decay=0.01)
duration: about 29h 17m
steps: 10000
The full configuration and results are available on wandb.
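
The learning-rate settings above can be read as a warmup followed by cosine decay. The sketch below is one interpretation, not the training code: it assumes a linear warmup from 6e-5 to 6e-4 over the first 5% of steps and a cosine decay back to 6e-5 by the final step. The actual schedule in the MaxText fork may differ in detail.

```python
import math

PEAK_LR = 6e-4         # learning_rate
FLOOR_LR = 6e-5        # "initial/end" value
TOTAL_STEPS = 10_000   # steps
WARMUP_STEPS = round(0.05 * TOTAL_STEPS)  # warmup_ratio 0.05 -> 500 steps


def learning_rate(step):
    """Assumed schedule: linear warmup from FLOOR_LR, then cosine decay back to it."""
    if step < WARMUP_STEPS:
        return FLOOR_LR + (PEAK_LR - FLOOR_LR) * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # clamp in case step exceeds TOTAL_STEPS
    return FLOOR_LR + 0.5 * (PEAK_LR - FLOOR_LR) * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 5_250, 10_000):
    print(step, learning_rate(step))
```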

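Training devices

TPU v4-32

Training datasets

AI Hub and Modu Corpus data, after deduplication and length filtering (about 16,056,320 rows).

The dataset cannot be released because of the AI Hub and Modu Corpus terms, but if you prepare the raw data, you can preprocess it identically by following devngho/dataset-preprocess.

Software

jax==0.4.35

devngho/MaxText, a fork of MaxText

Training results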

learning/loss: 2.6237056255340576
eval/avg_loss: 2.6179106279033793
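
For a more intuitive reading of the loss, it can be converted to perplexity. This small sketch assumes that `eval/avg_loss` is a mean per-token cross-entropy measured in nats, which is typical for language-model training but is not stated here.

```python
import math

EVAL_AVG_LOSS = 2.6179106279033793  # eval/avg_loss reported above


def perplexity(mean_nll_nats):
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


print(f"eval perplexity: {perplexity(EVAL_AVG_LOSS):.2f}")
```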

Benchmark results are provided below.

devngho/llama-ablation-large-korean-corpus

A model pretrained with the Llama architecture. It was trained on about 20.7B tokens (approximately 2.8 epochs) using MaxText.

Checkpoints are available every 500 steps.

This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). ⚡

Details

Made by: devngho
Language: ko
License: mit

Training details

learning_rate: 6e-4 (cosine, initial/end 6e-5)
warmup_ratio: 0.05
batch_size: 1024(fsdp 16 * per device 8 * ga 8)
optimizer: adamw(b1=0.9, b2=0.95, eps=1e-5, weight_decay=0.01)
duration: about 27h 50m
steps: 10000
You can check all the configs and training results on wandb.
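
The batch-size breakdown and the token budget can be cross-checked with a little arithmetic. This is a back-of-the-envelope sketch: the token count and epoch count are the rounded figures quoted in this card, so the derived per-sequence and per-epoch numbers are estimates only, and the sequence length itself is not stated here.

```python
FSDP_DEVICES = 16        # "fsdp 16"
PER_DEVICE_BATCH = 8     # "per device 8"
GRAD_ACCUM_STEPS = 8     # "ga 8"
STEPS = 10_000           # steps
TOTAL_TOKENS = 20.7e9    # "about 20.7B tokens" (rounded)
EPOCHS = 2.8             # "approximately 2.8 epochs" (rounded)

# Sequences consumed per optimizer step.
global_batch = FSDP_DEVICES * PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
# Sequences consumed over the whole run.
sequences_seen = global_batch * STEPS
# Rough averages implied by the rounded totals.
tokens_per_step = TOTAL_TOKENS / STEPS
tokens_per_sequence = TOTAL_TOKENS / sequences_seen
tokens_per_epoch = TOTAL_TOKENS / EPOCHS

print(f"global batch:        {global_batch}")
print(f"sequences seen:      {sequences_seen:,}")
print(f"tokens per step:     {tokens_per_step:,.0f}")
print(f"tokens per sequence: {tokens_per_sequence:,.0f}")
print(f"tokens per epoch:    {tokens_per_epoch:,.0f}")
```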

Training devices

TPU v4-32

Training datasets

I applied deduplication and length filtering to a corpus from AI Hub and Modu Corpus (16,056,320 rows).

I cannot release the training dataset because of the AI Hub and Modu Corpus terms of use. If you obtain the raw data, you can reproduce the same preprocessing with devngho/dataset-preprocess.
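
To illustrate what deduplication plus length filtering can look like, here is a minimal sketch. It is not the code in devngho/dataset-preprocess: the function name, the exact-match hashing strategy, and the character thresholds are all assumptions chosen for illustration, and the real pipeline may use different rules.

```python
import hashlib

# Hypothetical thresholds, measured in characters; the real values are not stated here.
MIN_CHARS = 20
MAX_CHARS = 10_000


def dedup_and_length_filter(rows, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Yield stripped rows that pass the length bounds and have not been seen before."""
    seen = set()
    for row in rows:
        text = row.strip()
        if not (min_chars <= len(text) <= max_chars):
            continue  # length filtering
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        if digest in seen:
            continue  # exact-duplicate removal
        seen.add(digest)
        yield text
```

Storing digests rather than full strings keeps memory use bounded per row, which matters at the scale of millions of rows; near-duplicate detection would need a different technique.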