Phi 2 Persona-Chat
Phi 2 Persona-Chat is a LoRA fine-tuned version of the base Phi 2 model, trained on the nazlicanto/persona-based-chat dataset. This dataset consists of over 64k conversations between Persona A and Persona B, for which a list of persona facts is provided.
The model was trained with the Supervised Fine-tuning Trainer, using the reference responses as target outputs. For the training and inference code and the full list of dependencies, refer to the GitHub repo.
Running the Model
Please note that, at the moment, trust_remote_code=True is required to run the Phi 2 model. For best results, use a prompt that includes the persona facts followed by at least one conversational turn.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# the prompt lists Person B's persona facts, then the instruction and the conversation so far
prompt = """
Person B has the following Persona information.
Persona of Person B: My name is David and I'm a 35 year old math teacher.
Persona of Person B: I like to hike and spend time in the nature.
Persona of Person B: I'm married with two kids.
Instruct: Person A and Person B are now having a conversation. Following the conversation below, write a response that Person B would say base on the above Persona information. Please carefully consider the flow and context of the conversation below, and use the Person B's Persona information appropriately to generate a response that you think are the most appropriate replying for Person B.
Persona A: Morning! I think I saw you at the parent meeting, what's your name?
Output:
"""
# load base LLM model, LoRA params and tokenizer
model = AutoModelForCausalLM.from_pretrained("nazlicanto/phi-2-persona-chat", trust_remote_code=True)
model.to("cuda")
tokenizer = AutoTokenizer.from_pretrained("nazlicanto/phi-2-persona-chat", trust_remote_code=True)

# tokenize input prompt and move it to the GPU
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()

# inference
with torch.inference_mode():
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.1,
        temperature=0.7
    )

# decode only the newly generated tokens, i.e. everything after the prompt
# (slicing by token count is more robust than slicing the decoded string by len(prompt))
new_tokens = outputs[0, input_ids.shape[1]:].cpu()
output = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(output)
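The prompt layout used above (persona facts, then the instruction, then at least one conversational turn, then "Output:") can also be assembled programmatically. The following is a minimal sketch and not part of the original repository: build_prompt and INSTRUCTION are hypothetical names, the instruction text is copied verbatim from the example above so the wording stays identical to it, and the speaker labels for conversations with more than one turn are an assumption extrapolated from the single-turn example.

```python
# Instruction text copied verbatim from the example prompt above.
INSTRUCTION = (
    "Instruct: Person A and Person B are now having a conversation. "
    "Following the conversation below, write a response that Person B would say "
    "base on the above Persona information. Please carefully consider the flow "
    "and context of the conversation below, and use the Person B's Persona "
    "information appropriately to generate a response that you think are the "
    "most appropriate replying for Person B."
)


def build_prompt(persona_facts, turns):
    """Assemble a prompt in the layout of the example above (hypothetical helper).

    persona_facts: list of strings describing Person B.
    turns: list of (speaker, utterance) tuples, oldest first.
           The speaker labels are an assumption; the example uses "Persona A".
    """
    if not persona_facts:
        raise ValueError("at least one persona fact is required")
    if not turns:
        raise ValueError("at least one conversational turn is required")

    lines = ["Person B has the following Persona information."]
    lines += [f"Persona of Person B: {fact}" for fact in persona_facts]
    lines.append(INSTRUCTION)
    lines += [f"{speaker}: {utterance}" for speaker, utterance in turns]
    lines.append("Output:")
    # Leading and trailing newlines mirror the triple-quoted string in the example.
    return "\n" + "\n".join(lines) + "\n"


prompt = build_prompt(
    persona_facts=[
        "My name is David and I'm a 35 year old math teacher.",
        "I like to hike and spend time in the nature.",
        "I'm married with two kids.",
    ],
    turns=[
        ("Persona A", "Morning! I think I saw you at the parent meeting, what's your name?"),
    ],
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hand-written prompt. Rejecting empty persona or turn lists follows the recommendation above that the prompt contain the persona facts and at least one conversational turn.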
This model was trained by nazlicanto and adirik.