Instructions for using MrBananaHuman/kogpt_6b_fp16 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use MrBananaHuman/kogpt_6b_fp16 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="MrBananaHuman/kogpt_6b_fp16")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MrBananaHuman/kogpt_6b_fp16")
model = AutoModelForCausalLM.from_pretrained("MrBananaHuman/kogpt_6b_fp16")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use MrBananaHuman/kogpt_6b_fp16 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MrBananaHuman/kogpt_6b_fp16"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MrBananaHuman/kogpt_6b_fp16",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/MrBananaHuman/kogpt_6b_fp16
- SGLang
How to use MrBananaHuman/kogpt_6b_fp16 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MrBananaHuman/kogpt_6b_fp16" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MrBananaHuman/kogpt_6b_fp16",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MrBananaHuman/kogpt_6b_fp16" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MrBananaHuman/kogpt_6b_fp16",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Docker Model Runner
How to use MrBananaHuman/kogpt_6b_fp16 with Docker Model Runner:
docker model run hf.co/MrBananaHuman/kogpt_6b_fp16
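The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/completions` endpoint. The sketch below builds the same POST request as the curl commands using only Python's standard library. It also pulls the generated text out of a response. The helper names `build_completion_request` and `completion_text` are hypothetical. The response shape (`choices[0]["text"]`) is assumed from the standard OpenAI completions format, so confirm it against your server version.

```python
import json
import urllib.request

MODEL_ID = "MrBananaHuman/kogpt_6b_fp16"


def build_completion_request(base_url: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        # ensure_ascii=False keeps Korean prompts readable; the body is sent as UTF-8
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def completion_text(response_body: bytes) -> str:
    """Extract the generated text, assuming an OpenAI-style completions response."""
    body = json.loads(response_body.decode("utf-8"))
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # vLLM listens on port 8000 by default; the SGLang examples above use port 30000.
    req = build_completion_request("http://localhost:8000", "이순신은", max_tokens=64)
    # Sending the request requires a running server, e.g.:
    # with urllib.request.urlopen(req) as resp:
    #     print(completion_text(resp.read()))
    print(req.full_url, req.get_method())
```

To target the SGLang server instead, the only change needed is the base URL, for example `http://localhost:30000`.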
This is the KoGPT 6B model ('kakaobrain/kogpt') released by Kakao Brain, saved in fp16.
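Saving the weights in fp16 roughly halves their size compared with fp32, because each parameter takes 2 bytes instead of 4. The back-of-the-envelope sketch below makes that concrete. It assumes a round 6 billion parameters, which is an illustrative figure and not the exact count. It also ignores activations, the KV cache, and framework overhead. The helper `weight_memory_gib` is hypothetical.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(n_params: int, dtype: str) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n_params = 6_000_000_000  # assumption: a round 6B, not the exact parameter count
    for dtype in ("float32", "float16"):
        print(f"{dtype}: ~{weight_memory_gib(n_params, dtype):.1f} GiB")
```

Under these assumptions this prints about 22.4 GiB for fp32 and about 11.2 GiB for fp16. The fp16 weights alone therefore fit within a 16 GB GPU, while the fp32 weights do not.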
How to load the Kakao Brain model in fp16
import torch
from transformers import GPTJForCausalLM
model = GPTJForCausalLM.from_pretrained('kakaobrain/kogpt', cache_dir='./my_dir', revision='KoGPT6B-ryan1.5b', torch_dtype=torch.float16)
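Passing `torch_dtype=torch.float16` stores every weight as an IEEE 754 half-precision number. That format keeps roughly 3 significant decimal digits, and its largest finite value is 65504. Python's `struct` module supports the same 16-bit format (format code `'e'`), so the rounding can be shown without PyTorch. This only illustrates the number format; it does not show how Transformers performs the conversion internally. The helper `round_to_fp16` is hypothetical.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # Pack as a 16-bit float, then unpack back to a Python float
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    for w in (0.1, 1.0 / 3.0, 1234.567):
        print(f"{w!r} -> {round_to_fp16(w)!r}")
```

For example, 0.1 becomes 0.0999755859375. Values between 1024 and 2048 are spaced 1.0 apart in fp16, so 1234.567 rounds to 1235.0.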
Generating text after loading the fp16 model
import torch
from transformers import GPTJForCausalLM, AutoTokenizer
# Request half precision explicitly so the fp16 checkpoint is not upcast on load
model = GPTJForCausalLM.from_pretrained('MrBananaHuman/kogpt_6b_fp16', torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.to('cuda')
tokenizer = AutoTokenizer.from_pretrained('MrBananaHuman/kogpt_6b_fp16')
# Korean prompt: "Yi Sun-sin is ..."
input_text = '이순신은'
input_ids = tokenizer(input_text, return_tensors='pt').input_ids.to('cuda')
# max_length counts the prompt tokens as well as the generated ones
output = model.generate(input_ids, max_length=64)
print(tokenizer.decode(output[0]))
>>> 이순신은 우리에게 무엇인가? 1. 머리말 이글은 임진왜란 당시 이순인이 보여준
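The sample output stops mid-sentence because `max_length=64` caps the total sequence length, prompt tokens included. By contrast, `max_new_tokens` would cap only the newly generated tokens. With no sampling arguments, `generate` decodes greedily, unless the model's generation config specifies otherwise. The toy sketch below mimics that greedy loop. A hypothetical next-token score table stands in for the model, and `greedy_generate` is an illustrative helper, not how Transformers implements `generate`.

```python
from typing import Callable, List, Optional, Sequence


def greedy_generate(prompt_ids: Sequence[int],
                    next_token_scores: Callable[[List[int]], Sequence[float]],
                    max_length: int,
                    eos_id: Optional[int] = None) -> List[int]:
    """Append the highest-scoring token until the total length reaches max_length."""
    ids = list(prompt_ids)
    while len(ids) < max_length:  # the prompt counts toward max_length
        scores = next_token_scores(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # greedy argmax
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break  # stop early at the end-of-sequence token
    return ids


if __name__ == "__main__":
    # Hypothetical bigram score table over a 4-token vocabulary (not KoGPT's).
    table = {0: [0.0, 0.9, 0.1, 0.0], 1: [0.0, 0.0, 0.8, 0.2],
             2: [0.1, 0.7, 0.0, 0.2], 3: [1.0, 0.0, 0.0, 0.0]}
    print(greedy_generate([0], lambda ids: table[ids[-1]], max_length=5))
```

With a one-token prompt and `max_length=5`, only four new tokens are produced, and the script prints `[0, 1, 2, 1, 2]`.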
Reference links