Instructions for using nazlicanto/phi-2-persona-chat with libraries and local apps.

Libraries

How to use nazlicanto/phi-2-persona-chat with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nazlicanto/phi-2-persona-chat", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("nazlicanto/phi-2-persona-chat", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use nazlicanto/phi-2-persona-chat with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nazlicanto/phi-2-persona-chat"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nazlicanto/phi-2-persona-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
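
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names (`build_completion_request`, `extract_text`, `complete`) are hypothetical and not part of vLLM. It assumes the server started above is listening on `http://localhost:8000` and returns the usual OpenAI-style completions response, in which the generated text is found under `choices[0].text`.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="nazlicanto/phi-2-persona-chat",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Return the generated text of the first choice in a completions response."""
    return response_body["choices"][0]["text"]


def complete(prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return extract_text(json.load(response))


# Example call (requires the server to be running):
# print(complete("Once upon a time,"))
```

Passing a different `base_url` (for example `http://localhost:30000`) should let the same helper talk to any other server that exposes the same endpoint.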

Use Docker

docker model run hf.co/nazlicanto/phi-2-persona-chat

SGLang

How to use nazlicanto/phi-2-persona-chat with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nazlicanto/phi-2-persona-chat" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nazlicanto/phi-2-persona-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nazlicanto/phi-2-persona-chat" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nazlicanto/phi-2-persona-chat",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use nazlicanto/phi-2-persona-chat with Docker Model Runner:
```
docker model run hf.co/nazlicanto/phi-2-persona-chat
```

Phi 2 Persona-Chat

Phi 2 Persona-Chat is a LoRA fine-tuned version of the base Phi 2 model, trained on the nazlicanto/persona-based-chat dataset. This dataset consists of over 64k conversations between Persona A and Persona B, for which a list of persona facts is provided.

The model was trained with the Supervised Fine-tuning Trainer, using the reference responses as target outputs. For the training and inference code and the full list of dependencies, refer to the GitHub repo.

Running the Model

Note that trust_remote_code=True is currently required to run the Phi 2 model. For best results, use a prompt that includes the persona facts, followed by at least one conversational turn.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


prompt = """
Person B has the following Persona information.

Persona of Person B: My name is David and I'm a 35 year old math teacher.
Persona of Person B: I like to hike and spend time in the nature.
Persona of Person B: I'm married with two kids.

Instruct: Person A and Person B are now having a conversation.  Following the conversation below, write a response that Person B would say base on the above Persona information. Please carefully consider the flow and context of the conversation below, and use the Person B's Persona information appropriately to generate a response that you think are the most appropriate replying for Person B.

Persona A: Morning! I think I saw you at the parent meeting, what's your name?

Output:
"""

# load base LLM model, LoRA params and tokenizer
model = AutoModelForCausalLM.from_pretrained("nazlicanto/phi-2-persona-chat", trust_remote_code=True)
model.to("cuda")

tokenizer = AutoTokenizer.from_pretrained("nazlicanto/phi-2-persona-chat", trust_remote_code=True)

# tokenize input prompt
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()

# inference
with torch.inference_mode():
    outputs = model.generate(
        input_ids=input_ids, 
        max_new_tokens=50, 
        do_sample=True, 
        top_p=0.1,
        temperature=0.7
    )

# decode only the newly generated tokens (everything after the prompt tokens)
new_tokens = outputs[0][input_ids.shape[1]:]
output = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(output)
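
The prompt layout used in the example can also be assembled programmatically. The sketch below is an illustration, not the authors' code: `build_prompt` and `INSTRUCTION` are hypothetical names, the instruction text is copied verbatim from the example prompt (including its original wording, on the assumption that the model saw it in this form during training), and the way multiple turns are joined (one `speaker: text` line per turn) is an assumption, since the example shows only a single turn.

```python
# Instruction text copied verbatim from the example prompt; it is left
# unedited on the assumption that it matches the fine-tuning format.
INSTRUCTION = (
    "Instruct: Person A and Person B are now having a conversation.  "
    "Following the conversation below, write a response that Person B would say "
    "base on the above Persona information. Please carefully consider the flow "
    "and context of the conversation below, and use the Person B's Persona "
    "information appropriately to generate a response that you think are the "
    "most appropriate replying for Person B."
)


def build_prompt(persona_facts, turns):
    """Format persona facts and conversation turns like the example prompt.

    persona_facts: list of strings describing Person B.
    turns: list of (speaker, text) pairs, e.g. [("Persona A", "Hi!")].
    """
    if not persona_facts:
        raise ValueError("at least one persona fact is required")
    if not turns:
        raise ValueError("at least one conversational turn is required")

    persona_block = "\n".join(
        f"Persona of Person B: {fact}" for fact in persona_facts
    )
    # Assumption: one line per turn; the example only shows a single turn.
    conversation = "\n".join(f"{speaker}: {text}" for speaker, text in turns)

    return (
        "\nPerson B has the following Persona information.\n\n"
        f"{persona_block}\n\n"
        f"{INSTRUCTION}\n\n"
        f"{conversation}\n\n"
        "Output:\n"
    )


prompt = build_prompt(
    ["My name is David and I'm a 35 year old math teacher."],
    [("Persona A", "Morning! I think I saw you at the parent meeting, what's your name?")],
)
```

The resulting string can be passed to the tokenizer in place of the hand-written prompt.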

This model was trained by nazlicanto and adirik.

Safetensors model size: 3B params. Tensor type: F32.