Instructions to use PygmalionAI/pygmalion-6b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use PygmalionAI/pygmalion-6b with Transformers:

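# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PygmalionAI/pygmalion-6b")
# Plain-text prompt. This assumes the tokenizer ships no chat template,
# in which case a list of chat messages would be rejected.
# The "You:/Bot:" layout is only an illustrative prompt.
prompt = "You: Who are you?\nBot:"
pipe(prompt, max_new_tokens=64)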

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b")

Local Apps

vLLM

How to use PygmalionAI/pygmalion-6b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PygmalionAI/pygmalion-6b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
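
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not part of vLLM: the helper names `build_chat_request` and `send_chat_request` are hypothetical, and it assumes the server started above is listening on `http://localhost:8000` and that its chat endpoint accepts this model.

```python
import json
import urllib.request


def build_chat_request(user_message, model="PygmalionAI/pygmalion-6b",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper: it mirrors the JSON body of the curl call above.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response.

    Needs a running server; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the parsed JSON response. Passing `base_url="http://localhost:30000"` would target a server listening on port 30000 instead.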

Use Docker

docker model run hf.co/PygmalionAI/pygmalion-6b

SGLang

How to use PygmalionAI/pygmalion-6b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PygmalionAI/pygmalion-6b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PygmalionAI/pygmalion-6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PygmalionAI/pygmalion-6b" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use PygmalionAI/pygmalion-6b with Docker Model Runner:
```
docker model run hf.co/PygmalionAI/pygmalion-6b
```

Raven (RWKV) as a potential LLM for Pygmalion to use.

#23

by Joseph717171 - opened Apr 3, 2023

Discussion

Joseph717171

Apr 3, 2023

edited Apr 3, 2023

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (that is, training is parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, lower VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings. It is also 100% attention-free: you only need the hidden state at position t to compute the state at position t+1, and you can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
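
To make the "RNN mode" versus "GPT mode" idea concrete, here is a toy sketch in plain Python. It is not RWKV's actual code or formula: it uses a simplified scalar decayed weighted average, and the names `rnn_step`, `rnn_mode`, `gpt_mode`, and `decay` are made up for illustration. It only shows that a recurrence carrying a small state gives the same result as a form where every position is computed directly from the prefix, which is the form that can be parallelized.

```python
import math


def rnn_step(state, k, v, decay):
    """Advance the state by one token; return (new_state, output)."""
    num, den = state
    weight = math.exp(k)
    num = decay * num + weight * v
    den = decay * den + weight
    return (num, den), num / den


def rnn_mode(keys, values, decay):
    """Process tokens one at a time, carrying only a two-number state."""
    state = (0.0, 0.0)
    outputs = []
    for k, v in zip(keys, values):
        state, out = rnn_step(state, k, v, decay)
        outputs.append(out)
    return outputs, state


def gpt_mode(keys, values, decay):
    """Compute each position directly from the whole prefix.

    No position depends on another position's result, so the loop over t
    could run in parallel. Returns the outputs and the final state.
    """
    outputs = []
    num, den = 0.0, 0.0
    for t in range(len(keys)):
        num = sum(decay ** (t - i) * math.exp(keys[i]) * values[i]
                  for i in range(t + 1))
        den = sum(decay ** (t - i) * math.exp(keys[i])
                  for i in range(t + 1))
        outputs.append(num / den)
    return outputs, (num, den)


# Toy inputs (arbitrary numbers chosen for the illustration).
keys = [0.1, -0.3, 0.5, 0.0]
values = [1.0, 2.0, -1.0, 0.5]
decay = 0.9

# Both modes agree position by position.
seq_out, seq_state = rnn_mode(keys, values, decay)
par_out, par_state = gpt_mode(keys, values, decay)
assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
           for a, b in zip(seq_out, par_out))

# The state computed in "GPT" mode over a prompt can seed "RNN" mode.
prompt_out, prompt_state = gpt_mode(keys[:3], values[:3], decay)
_, next_out = rnn_step(prompt_state, keys[3], values[3], decay)
assert math.isclose(next_out, seq_out[3], rel_tol=1e-9, abs_tol=1e-12)
```

The second check mirrors the claim above: the state is first computed over the prompt in the direct form, and generation then continues one token at a time from that state.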
It looks promising.
Check it out:
https://github.com/BlinkDL/RWKV-LM

https://huggingface.co/spaces/BlinkDL/Raven-RWKV-7B

https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio

Discord: https://discord.gg/bDSBUMeFpc

Sepp Hochreiter, a deep learning pioneer known for his work on the vanishing gradient problem and LSTM, had this to say about Raven (RWKV):

Github github.com/BlinkDL/RWKV-LM: RNN with transformer-level performance, without using attention. Similar to Apple's Attention Free Transformer. All trained models open-source. Inference is very fast (even on CPUs) and might work on cell phones.
https://twitter.com/hochreitersepp/status/1524270961314484227?s=46&t=KC7cX_tVezEZLb2ntKap9g

User feedback from the Raven (RWKV) GitHub page:

I've so far toyed around the character-based model on our relatively small pre-training dataset (around 10GB of text), and the results are extremely good - similar ppl to models taking much, much longer to train.

dear god rwkv is fast. i switched to another tab after starting training it from scratch & when i returned it was emitting plausible english & maori words, i left to go microwave some coffee & when i came back it was producing fully grammatically correct sentences.

https://github.com/BlinkDL/RWKV-LM

Joseph717171 changed discussion status to closed Apr 3, 2023

Joseph717171 changed discussion status to open Apr 3, 2023

coremic

Apr 30, 2023

Hey @Joseph717171, could you contact us at aicomp#7175 to discuss this approach further? We are currently pursuing it.
