Instructions to use mayaeary/pygmalion-6b_dev-4bit-128g with libraries, inference providers, notebooks, and local apps.

Libraries

How to use mayaeary/pygmalion-6b_dev-4bit-128g with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mayaeary/pygmalion-6b_dev-4bit-128g")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the generated text is visible when run as a script
print(pipe(messages))

# Load model directly
# Note: users in the discussion below report that loading fails because the
# repository has no pytorch_model.bin (it ships a .safetensors file instead).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mayaeary/pygmalion-6b_dev-4bit-128g")
model = AutoModelForCausalLM.from_pretrained("mayaeary/pygmalion-6b_dev-4bit-128g")

Local Apps

vLLM

How to use mayaeary/pygmalion-6b_dev-4bit-128g with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mayaeary/pygmalion-6b_dev-4bit-128g"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mayaeary/pygmalion-6b_dev-4bit-128g",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
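For callers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, assuming the server from the command above is listening on `http://localhost:8000`; the helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "mayaeary/pygmalion-6b_dev-4bit-128g"


def build_chat_request(base_url: str, model: str, user_content: str) -> urllib.request.Request:
    """Build the same POST request as the curl command above (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply. Needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not touch the network, so it is safe to inspect.
request = build_chat_request("http://localhost:8000", MODEL_ID, "What is the capital of France?")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Once the server is up, `send_chat_request(request)` would perform the actual call. Keeping construction and sending separate makes the payload easy to check before any model is loaded.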

Use Docker

docker model run hf.co/mayaeary/pygmalion-6b_dev-4bit-128g

SGLang

How to use mayaeary/pygmalion-6b_dev-4bit-128g with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mayaeary/pygmalion-6b_dev-4bit-128g" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mayaeary/pygmalion-6b_dev-4bit-128g",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
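The curl call returns JSON in the OpenAI chat-completions shape, so the generated text has to be picked out of the reply. Below is a minimal sketch of that step; `extract_reply` is a hypothetical helper and `sample_response` is made-up data for illustration, not real output from this model.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion reply."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # The first choice carries the assistant message under "message".
    return choices[0]["message"]["content"]


# Hypothetical reply, shaped like an OpenAI-style chat completion.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample_response))
```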

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mayaeary/pygmalion-6b_dev-4bit-128g" \
        --host 0.0.0.0 \
        --port 30000

How to run this with the Hugging Face Transformers library?

by adikhad - opened Apr 22, 2023

Discussion

adikhad

Apr 22, 2023

I got the same error as in one of the other issues:

mayaeary/pygmalion-6b_dev-4bit-128g does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

I'm assuming I need to load a GPT-NeoX 6B model, then swap in the weights from your .safetensors file.
I'm really not sure how, though. Please help, need to make anime wife.
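One way to confirm what the error message is saying is to compare the repository's file list against the file names it mentions. The following is a rough sketch, not a fix: `find_weight_files` and `fetch_repo_files` are hypothetical helpers, the example listing is invented, and `fetch_repo_files` assumes the `huggingface_hub` package is installed and the network is reachable.

```python
# File names that the error message says Transformers looked for.
EXPECTED_WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt",
    "flax_model.msgpack",
)


def find_weight_files(repo_files):
    """Split a file listing into expected weight files and .safetensors files."""
    expected = [name for name in repo_files if name in EXPECTED_WEIGHT_FILES]
    safetensors = [name for name in repo_files if name.endswith(".safetensors")]
    return {"expected": expected, "safetensors": safetensors}


def fetch_repo_files(repo_id):
    """List the files in a Hub repository (needs huggingface_hub and network)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id)


# Hypothetical listing for illustration; the real file names may differ.
example_files = ["config.json", "tokenizer.json", "example-4bit-128g.safetensors"]
print(find_weight_files(example_files))
```

Calling `find_weight_files(fetch_repo_files("mayaeary/pygmalion-6b_dev-4bit-128g"))` would run the same check against the real repository. An empty `expected` list alongside a non-empty `safetensors` list would match the situation described in the post.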

mayaeary

Owner Apr 23, 2023

What GUI are you using?

adi19973010

May 1, 2023

I get an error when I try to install it: ModuleNotFoundError: No module named 'llama_inference_offload'

adikhad

May 1, 2023

"I get an error when I try to install it: ModuleNotFoundError: No module named 'llama_inference_offload'"

Pygmalion updated their models to use Llama as the base, I think this repo might need to be refactored to accommodate that change.
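A `ModuleNotFoundError` like the one quoted means the active Python environment cannot find the named module. The check below is a small sketch for confirming that from inside the same environment the app runs in; `is_importable` is a hypothetical helper, and the thread does not say which package is supposed to provide `llama_inference_offload`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# "json" is in the standard library; the second name comes from the error above.
for name in ("json", "llama_inference_offload"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```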

MankingJr

Jul 29, 2023

Having the same error. I'm currently using the Oobabooga web UI. How do I fix it?
