Easy-Systems/easy-ko-Llama3-8b-Instruct-v1
Image generated with DALL-E.
- easy-ko-Llama3-8b-Instruct-v1 is the first LLM from Easy Systems Co., Ltd. It is a Korean fine-tune of the English-based model meta-llama/Meta-Llama-3-8B-Instruct.
- The model will continue to be updated.
Data
- AI hub (https://www.aihub.or.kr/) data, processed into various tasks (QA, Summary, Translate, etc.) and used for fine-tuning.
- Data processed in-house, also used for fine-tuning.
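As a rough illustration of what "processing data into tasks" could look like, the sketch below wraps a single record in the `### 명령어:` / `### 응답:` prompt layout that appears in the How to use section. This is only an assumption about the general shape of such data: the function name `to_chat_example`, the `SUPPORTED_TASKS` tuple, and the record fields are hypothetical and do not describe the actual Easy Systems pipeline.

```python
# Hypothetical sketch: names and record layout are assumptions for
# illustration, not the actual preprocessing used for this model.

# Task names taken from the list above (QA, Summary, Translate).
SUPPORTED_TASKS = ("qa", "summary", "translate")

# Same instruction/response markers as the prompt in "How to use".
USER_TEMPLATE = "\n\n### 명령어: {prompt}\n\n### 응답:"


def to_chat_example(task: str, instruction: str, answer: str) -> dict:
    """Wrap one (instruction, answer) pair as a chat-style training example."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    return {
        "task": task,
        "messages": [
            {"role": "user", "content": USER_TEMPLATE.format(prompt=instruction)},
            {"role": "assistant", "content": answer},
        ],
    }


if __name__ == "__main__":
    # Placeholder strings only; no real dataset content is shown here.
    example = to_chat_example("summary", "<instruction text>", "<reference answer>")
    print(example["messages"][0]["content"])
```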
How to use
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Easy-Systems/easy-ko-Llama3-8b-Instruct-v1"

# Load the model in bfloat16; flash_attention_2 requires the flash-attn package.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, add_special_tokens=True)

# Prompt: "How do I forcibly terminate a Linux process?"
prompt = "리눅스 프로세스를 강제로 종료하는 방법은?"

messages = [
    # System: "You are a friendly AI chatbot. Please answer requests
    # concisely, step by step, in Korean."
    {"role": "system", "content": "당신은 친절한 AI chatbot 입니다. 요청에 대해서 step-by-step 으로 간결하게 한국어(Korean)로 답변해주세요."},
    # The user turn wraps the prompt in "### 명령어:" (instruction) and
    # "### 응답:" (response) markers.
    {"role": "user", "content": f"\n\n### 명령어: {prompt}\n\n### 응답:"},
]

# Render the chat template and move the token ids to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the regular EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    eos_token_id=terminators,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.2,
    repetition_penalty=1.3,
    top_p=0.9,
    top_k=10,
)

# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True).strip())
Example Output (translated from Korean)
On Linux, you can use the `kill` or `pkill` command to forcibly terminate a specific process.
Step 1: `ps -ef | grep <process_name>` displays all currently running processes.
Step 2: Entering `kill <process_ID>` terminates that process immediately.
Alternatively, you can specify `-9` (the SIGKILL signal) to force the process to terminate; this kills it right away without giving it a last chance to exit normally through the operating system:
Step 3: Enter `kill -9 <process_ID>`.
Note that if there are files or services required for system stability, you should not delete them directly and should handle them according to the appropriate permissions and instructions. Also, some programs may run into problems such as data loss when forcibly terminated, so please check any saved work before terminating them.
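The prompt layout in the How to use snippet (a fixed system message plus a user turn wrapped in `### 명령어:` / `### 응답:` markers) can be factored into a small helper so that several questions reuse the same format. The following is a minimal sketch rather than part of the original model card: the function name `build_messages` and the constant names are hypothetical, and it assumes only the message structure shown in that snippet.

```python
# Hypothetical helper (not from the model card): builds chat messages in
# the same layout as the "How to use" example.

# "You are a friendly AI chatbot. Please answer requests concisely,
# step by step, in Korean."
SYSTEM_PROMPT = (
    "당신은 친절한 AI chatbot 입니다. "
    "요청에 대해서 step-by-step 으로 간결하게 한국어(Korean)로 답변해주세요."
)

# "### 명령어:" marks the instruction, "### 응답:" marks the response slot.
USER_TEMPLATE = "\n\n### 명령어: {prompt}\n\n### 응답:"


def build_messages(prompt: str, system_prompt: str = SYSTEM_PROMPT) -> list:
    """Return the system and user messages for apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": USER_TEMPLATE.format(prompt=prompt)},
    ]


if __name__ == "__main__":
    # Same question as in the usage example.
    for message in build_messages("리눅스 프로세스를 강제로 종료하는 방법은?"):
        print(message["role"], repr(message["content"]))
```

The returned list can be passed to `tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")` in place of the hand-written `messages` list.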
License
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC-BY-NC-SA-4.0)
- For commercial use, please inquire via the contact below.
Contact
- For commercial use or any other inquiries, please contact the following email address.
- 강환구: hkkang@easy.co.kr