davidkim205/Ko-Llama-3-8B-Instruct
Ko-Llama-3-8B-Instruct is one of several models being researched to improve the performance of Korean language models. Its training dataset was built with rejection sampling, and the model was then trained on that dataset with supervised fine-tuning (SFT).
Model Details
- Model Developers : davidkim (changyeon kim)
- Repository : -
- base model : meta-llama/Meta-Llama-3-8B-Instruct
- sft dataset : sft_rs_140k
Requirements
If the undefined symbol error below occurs, reinstall torch and flash-attn at the versions listed after the error message.
...
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/home/david/anaconda3/envs/spaces/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
pip install torch==2.2.0
pip install flash-attn==2.5.9.post1
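Before reinstalling, it can help to confirm which versions are installed and whether flash-attn loads at all. The following is a minimal diagnostic sketch using only the standard library. The helper names `installed_version` and `flash_attn_loads` are hypothetical, and the pinned versions are the ones from the install commands above.

```python
from importlib import metadata

# Versions pinned in the install commands above.
PINNED = {"torch": "2.2.0", "flash-attn": "2.5.9.post1"}


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def flash_attn_loads():
    """Return True if the flash_attn package imports without error."""
    try:
        import flash_attn  # noqa: F401
        return True
    except Exception:
        # A torch/flash-attn binary mismatch typically surfaces here as an
        # ImportError that mentions an undefined symbol.
        return False


if __name__ == "__main__":
    for package, wanted in PINNED.items():
        print(f"{package}: installed={installed_version(package)}, pinned={wanted}")
    print("flash_attn imports cleanly:", flash_attn_loads())
```

If the installed versions differ from the pinned ones and the import fails, the mismatch is a likely cause of the error, though other build issues can produce similar messages.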
How to use
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "davidkim205/Ko-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Interactive loop: read a prompt, generate a reply, print it.
while True:
    prompt = input('>')
    messages = [
        # System prompt (Korean): "You are a chatbot that gives specific answers."
        {"role": "system", "content": "당신은 구체적으로 답변하는 챗봇입니다."},
        {"role": "user", "content": prompt},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    # Stop on either the tokenizer's EOS token or Llama 3's end-of-turn token.
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]
    outputs = model.generate(
        input_ids,
        max_new_tokens=1024,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )
    # Drop the prompt tokens and decode only the newly generated part.
    response = outputs[0][input_ids.shape[-1]:]
    print(tokenizer.decode(response, skip_special_tokens=True))
Example prompt and response (translated from Korean; 사과 can mean both "apple" and "apology"):
Explain the meaning of 사과.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
사과 (apple) is generally known as a tasty and nutritious fruit. Apples are mostly eaten fresh, or blended into drinks such as yogurt or smoothies. There are also many varieties of apple, each with a different color and taste.
Although 사과 is a fruit, the word is often used with other meanings as well. For example, the word "사과" can be used in situations that suggest something is wrong or lacking. For example, giving a "사과" can mean an apology for a wrong action or a lapse in thinking.
In addition, "사과" can mean receiving an apology from another person in some situation. For example, if you do not ask for a "사과", you may not receive an apology from the other person in some situation.
Therefore, "사과" is a word used with various meanings, and its meaning can differ depending on the context.
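The two warnings in the output above appear because `generate` receives only `input_ids`. The following is a minimal sketch of one way to avoid them, assuming a transformers version whose `apply_chat_template` accepts `return_dict=True`. The helper name `build_generate_kwargs` is hypothetical. Using the EOS token as the pad token simply makes explicit what the warning says `generate` does by default.

```python
def build_generate_kwargs(tokenizer, messages, device=None, max_new_tokens=1024):
    """Build keyword arguments for model.generate, including the attention mask."""
    encoded = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,       # return attention_mask alongside input_ids
        return_tensors="pt",
    )
    if device is not None:
        encoded = encoded.to(device)

    # Same stop tokens as the example above.
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ]
    return {
        "input_ids": encoded["input_ids"],
        "attention_mask": encoded["attention_mask"],
        "pad_token_id": tokenizer.eos_token_id,  # assumption: pad with EOS
        "eos_token_id": terminators,
        "max_new_tokens": max_new_tokens,
    }


# Usage with the model and tokenizer loaded as in the example above:
#   kwargs = build_generate_kwargs(tokenizer, messages, device=model.device)
#   outputs = model.generate(**kwargs, do_sample=True, temperature=0.6, top_p=0.9)
#   response = outputs[0][kwargs["input_ids"].shape[-1]:]
```

Returning a plain dictionary keeps the sampling settings (`do_sample`, `temperature`, `top_p`) at the call site, so they can be changed without touching the helper.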
Benchmark
kollm_evaluation
https://github.com/davidkim205/kollm_evaluation
| task | acc |
|---|---|
| average | 0.47 |
| kobest | 0.54 |
| kobest_boolq | 0.57 |
| kobest_copa | 0.62 |
| kobest_hellaswag | 0.42 |
| kobest_sentineg | 0.57 |
| kobest_wic | 0.49 |
| ko_truthfulqa | 0.29 |
| ko_mmlu | 0.34 |
| ko_hellaswag | 0.36 |
| ko_common_gen | 0.76 |
| ko_arc_easy | 0.33 |
Evaluation by KEval
keval is an evaluation model trained on the prompts and datasets used in benchmarks that score Korean language models with ChatGPT as the judge. It is intended to compensate for the shortcomings of the existing lm-evaluation-harness.
https://huggingface.co/davidkim205/keval-7b
| keval | average | kullm | logickor | wandb |
|---|---|---|---|---|
| openai/gpt-4 | 6.79 | 4.66 | 8.51 | 7.21 |
| openai/gpt-3.5-turbo | 6.25 | 4.48 | 7.29 | 6.99 |
| davidkim205/Ko-Llama-3-8B-Instruct | 5.59 | 4.24 | 6.46 | 6.06 |
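The source does not say how the `average` column is computed. The values in the table above are consistent with an unweighted mean of the three benchmark columns, which the short check below reproduces. The scores are copied from the table, and the function name `mean_score` is illustrative only.

```python
# Scores copied from the KEval table: (kullm, logickor, wandb).
keval_scores = {
    "openai/gpt-4": (4.66, 8.51, 7.21),
    "openai/gpt-3.5-turbo": (4.48, 7.29, 6.99),
    "davidkim205/Ko-Llama-3-8B-Instruct": (4.24, 6.46, 6.06),
}


def mean_score(scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


averages = {model: mean_score(scores) for model, scores in keval_scores.items()}

for model, average in averages.items():
    print(f"{model}: {average:.2f}")
```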
Evaluation by ChatGPT
| chatgpt | average | kullm | logickor | wandb |
|---|---|---|---|---|
| openai/gpt-4 | 7.30 | 4.57 | 8.76 | 8.57 |
| openai/gpt-3.5-turbo | 6.53 | 4.26 | 7.50 | 7.82 |
| davidkim205/Ko-Llama-3-8B-Instruct | 5.45 | 4.22 | 6.49 | 5.64 |