Have you considered using the Vicuna v1.1 version for training?

by QuantumBolt - opened Apr 30, 2023

Discussion

QuantumBolt

Apr 30, 2023 (edited Apr 30, 2023)

Vicuna has released a new version, v1.1, which performs better than v0. Training StableVicuna on top of Vicuna v1.1 may therefore give better results.

Major updates of weights v1.1

Refactor the tokenization and separator. In Vicuna v1.1, the separator has been changed from "###" to the EOS token "</s>". This change makes it easier to determine the generation stop criteria and enables better compatibility with other libraries.

Fix the supervised fine-tuning loss computation for better model quality.

Also seen at:
https://huggingface.co/lmsys/vicuna-7b-delta-v1.1#major-updates-of-weights-v11
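
To illustrate why the separator change simplifies stopping, here is a minimal sketch in plain Python. It is not taken from the Vicuna or StableVicuna code: the function names are hypothetical, and the EOS token id of 2 in the usage lines is an assumption based on the LLaMA tokenizer, so check `tokenizer.eos_token_id` for the model you actually use. With a "###" separator the stop check has to search the decoded text, whereas with an EOS separator it is a single comparison on token ids.

```python
from typing import List

V0_SEPARATOR = "###"  # v0 turn separator, as quoted above


def truncate_v0(decoded_text: str) -> str:
    """v0 style: look for the "###" separator string in the decoded text."""
    idx = decoded_text.find(V0_SEPARATOR)
    if idx == -1:
        return decoded_text  # separator not generated yet
    return decoded_text[:idx].rstrip()


def truncate_v11(token_ids: List[int], eos_token_id: int) -> List[int]:
    """v1.1 style: stop at the first EOS token id, with no string matching."""
    kept = []
    for token_id in token_ids:
        if token_id == eos_token_id:
            break
        kept.append(token_id)
    return kept


# Hypothetical usage; the id 2 for EOS is an assumption, not read from the model.
reply_text = truncate_v0("Hello there.### Human: hi")
reply_ids = truncate_v11([5, 6, 2, 7], eos_token_id=2)
```

Because most generation libraries already stop on the EOS token by default, the v1.1 format usually needs no custom stop logic at all, which is the compatibility benefit the quoted notes describe.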

LouisStability

CarperAI org May 2, 2023

We're rapidly improving StableVicuna. A new version is on the horizon. We're already internally testing it at Carper.

LouisStability changed discussion status to closed May 2, 2023
