Instructions to use byeongal/Ko-DialoGPT with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use byeongal/Ko-DialoGPT with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="byeongal/Ko-DialoGPT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("byeongal/Ko-DialoGPT")
model = AutoModelForCausalLM.from_pretrained("byeongal/Ko-DialoGPT")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use byeongal/Ko-DialoGPT with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "byeongal/Ko-DialoGPT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "byeongal/Ko-DialoGPT",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```shell
docker model run hf.co/byeongal/Ko-DialoGPT
```
- SGLang
How to use byeongal/Ko-DialoGPT with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "byeongal/Ko-DialoGPT" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "byeongal/Ko-DialoGPT",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "byeongal/Ko-DialoGPT" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "byeongal/Ko-DialoGPT",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use byeongal/Ko-DialoGPT with Docker Model Runner:
```shell
docker model run hf.co/byeongal/Ko-DialoGPT
```
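The curl calls above can also be made from Python using only the standard library. The sketch below is a minimal example, not an official client: it assumes a vLLM server on `localhost:8000` or an SGLang server on `localhost:30000` started as shown, `build_chat_request` and `chat` are hypothetical helper names, and the reply is read from the OpenAI-compatible `choices[0].message.content` field.

```python
import json
import urllib.request

MODEL_ID = "byeongal/Ko-DialoGPT"


def build_chat_request(base_url: str, user_text: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint.

    base_url is an assumption, e.g. "http://localhost:8000" for the vLLM server
    or "http://localhost:30000" for the SGLang server.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, user_text: str, timeout: float = 60.0) -> str:
    """Send one user message to a running server and return the reply text."""
    request = build_chat_request(base_url, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible chat responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Only `build_chat_request` is side-effect free; `chat` needs a live server. For example, `chat("http://localhost:30000", "What is the capital of France?")` would send the same request as the SGLang curl command.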
Ko-DialoGPT
How to use
```python
from transformers import PreTrainedTokenizerFast, GPT2LMHeadModel
import torch

# Run on GPU when available, otherwise fall back to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PreTrainedTokenizerFast.from_pretrained('byeongal/Ko-DialoGPT')
model = GPT2LMHeadModel.from_pretrained('byeongal/Ko-DialoGPT').to(device)

# Conversation history: user turns and the bot replies to them, in order
past_user_inputs = []
generated_responses = []

while True:
    user_input = input(">> User:")
    if user_input == 'bye':  # type "bye" to end the chat
        break
    # Each turn is encoded as its text followed by the EOS token
    text_idx = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt')
    # Prepend at most the last two exchanges (newest first) while the prompt stays under 1000 tokens
    for i in range(len(generated_responses)-1, len(generated_responses)-3, -1):
        if i < 0:
            break
        encoded_vector = tokenizer.encode(generated_responses[i] + tokenizer.eos_token, return_tensors='pt')
        if text_idx.shape[-1] + encoded_vector.shape[-1] < 1000:
            text_idx = torch.cat([encoded_vector, text_idx], dim=-1)
        else:
            break
        encoded_vector = tokenizer.encode(past_user_inputs[i] + tokenizer.eos_token, return_tensors='pt')
        if text_idx.shape[-1] + encoded_vector.shape[-1] < 1000:
            text_idx = torch.cat([encoded_vector, text_idx], dim=-1)
        else:
            break
    text_idx = text_idx.to(device)
    # Beam search decoding; max_length counts prompt and generated tokens together.
    # top_k only takes effect when sampling (do_sample=True), so it is ignored here.
    inference_output = model.generate(
        text_idx,
        max_length=1000,
        num_beams=5,
        top_k=20,
        no_repeat_ngram_size=4,
        length_penalty=0.65,
        repetition_penalty=2.0,
    )
    inference_output = inference_output.tolist()
    # Decode only the newly generated tokens, skipping the prompt
    bot_response = tokenizer.decode(inference_output[0][text_idx.shape[-1]:], skip_special_tokens=True)
    print(f"Bot: {bot_response}")
    past_user_inputs.append(user_input)
    generated_responses.append(bot_response)
```
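The chat loop above keeps at most the last two exchanges of history: it prepends each earlier bot reply and user input (each followed by the EOS token) only while the prompt stays under 1,000 tokens, and stops at the first turn that would not fit. The sketch below factors that logic into a pure function so it can be read and tested without loading the model. It is a minimal illustration under stated assumptions: `build_context` is a hypothetical name, it works on plain lists of token IDs rather than tensors, and `encode` stands in for `tokenizer.encode(text + tokenizer.eos_token)`.

```python
from typing import Callable, List, Sequence

MAX_CONTEXT_TOKENS = 1000  # same limit the chat loop uses
MAX_PAST_TURNS = 2         # range(len - 1, len - 3, -1) visits at most two past exchanges


def build_context(
    user_input: str,
    past_user_inputs: Sequence[str],
    generated_responses: Sequence[str],
    encode: Callable[[str], List[int]],
) -> List[int]:
    """Return token IDs: recent history (oldest first) followed by the new user turn.

    `encode` is assumed to return the IDs for one turn including its EOS token.
    """
    context = encode(user_input)
    # Walk backwards from the most recent exchange.
    for i in range(len(generated_responses) - 1, len(generated_responses) - 1 - MAX_PAST_TURNS, -1):
        if i < 0:
            break
        # Prepend the bot reply first, then the user turn that prompted it,
        # so the final order is user, bot, user, bot, ..., new user turn.
        for text in (generated_responses[i], past_user_inputs[i]):
            turn = encode(text)
            if len(context) + len(turn) >= MAX_CONTEXT_TOKENS:
                return context  # stop as soon as the next turn would not fit
            context = turn + context
    return context
```

In the chat loop, the equivalent call would pass `lambda s: tokenizer.encode(s + tokenizer.eos_token)` as `encode` and wrap the result as `torch.tensor([context])` before calling `model.generate`.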