Instructions to use berkeley-nest/Starling-LM-7B-alpha with libraries, inference providers, notebooks, and local apps.

Libraries

How to use berkeley-nest/Starling-LM-7B-alpha with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="berkeley-nest/Starling-LM-7B-alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")
model = AutoModelForCausalLM.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use berkeley-nest/Starling-LM-7B-alpha with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "berkeley-nest/Starling-LM-7B-alpha"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "berkeley-nest/Starling-LM-7B-alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/berkeley-nest/Starling-LM-7B-alpha

SGLang

How to use berkeley-nest/Starling-LM-7B-alpha with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "berkeley-nest/Starling-LM-7B-alpha" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "berkeley-nest/Starling-LM-7B-alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "berkeley-nest/Starling-LM-7B-alpha" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "berkeley-nest/Starling-LM-7B-alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use berkeley-nest/Starling-LM-7B-alpha with Docker Model Runner:
```
docker model run hf.co/berkeley-nest/Starling-LM-7B-alpha
```

Fine tuning with transformers?

#17

by toranb - opened Dec 2, 2023

Discussion

toranb

Dec 2, 2023 (edited Dec 2, 2023)

I've had luck fine-tuning Zephyr and other Mistral fine-tunes with transformers 4.35.2, but Starling throws an error that looks like a vocab size mismatch (likely because I'm using Mistral, whose vocab size is 32000):

shape '[-1, 32000]' is invalid for input of size 19457216

For those interested, this originates from transformers/models/mistral/modeling_mistral.py, line 1032:

shift_logits.view(-1, self.config.vocab_size)

Does anyone know of a workaround until we have first-class support in transformers?
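
The numbers in the error message already hint at the vocabulary size of the logits. As a quick sanity check (plain arithmetic, nothing taken from the transformers source), the sketch below looks for vocabulary sizes near 32000 that divide the reported element count evenly. The helper name `candidate_vocab_sizes` and the search window of 8 are illustrative choices.

```python
def candidate_vocab_sizes(numel: int, start: int, window: int) -> list[tuple[int, int]]:
    """Return (vocab_size, rows) pairs in [start, start + window) that divide numel exactly."""
    matches = []
    for vocab in range(start, start + window):
        rows, remainder = divmod(numel, vocab)
        if remainder == 0:
            matches.append((vocab, rows))
    return matches


# 19457216 is the input size from the error; 32000 is the vocab size the view() call assumed.
for vocab, rows in candidate_vocab_sizes(19457216, 32000, 8):
    print(f"vocab_size={vocab} fits: {rows} rows")
```

Only 32002 divides evenly in that window, which would be consistent with logits that carry two more entries per position than the 32000 the config declares. The 608 rows would then be batch size times (sequence length minus one), since the loss computation drops the last position when it shifts the logits.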

MB7977

Dec 3, 2023

For what it’s worth, I’ve successfully done a full fine-tune on Starling with Transformers 4.35.2 (via Axolotl). Are you perhaps adding tokens and changing the vocab size? I trained with the OpenChat prompt format and stuck with the default EOS and BOS tokens, so no added tokens were necessary. I think the openchat.json file may also be relevant?
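
One way to check the "changed vocab size" theory is to compare the `vocab_size` in the model config with the tokenizer's length. The sketch below is a minimal illustration, not a diagnosis of the setup in this thread: `describe_vocab` and `load_sizes` are hypothetical helper names, `load_sizes` downloads files from the Hub and is defined but not called here, and the numbers passed at the end are illustrative.

```python
def describe_vocab(config_vocab_size: int, tokenizer_len: int) -> str:
    """Compare the vocab size in a model config with the tokenizer's length."""
    if tokenizer_len > config_vocab_size:
        extra = tokenizer_len - config_vocab_size
        return f"tokenizer has {extra} more token(s) than the config allows"
    if tokenizer_len < config_vocab_size:
        return "config has spare embedding rows"
    return "config and tokenizer agree"


def load_sizes(model_id: str) -> tuple[int, int]:
    """Fetch both sizes with transformers (needs network access; not run here)."""
    from transformers import AutoConfig, AutoTokenizer

    config = AutoConfig.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return config.vocab_size, len(tokenizer)


# Illustrative numbers only: a 32000-entry config paired with a 32002-token tokenizer.
print(describe_vocab(32000, 32002))
```

When tokens really have been added on top of a checkpoint, `model.resize_token_embeddings(len(tokenizer))` is the usual Transformers call for growing the embedding matrix to match. If the checkpoint already ships with the larger vocabulary, the mismatch is more likely in whichever config was used to load or convert it.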

toranb

Dec 3, 2023

When do you use the openchat.json file? I didn't even pull that down before converting weights/fine-tuning, so I'm curious to learn more.

https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/blob/main/openchat.json

MB7977

Dec 4, 2023

I may be wrong about that. It was one of a couple of files added after the fact that seemed to fix early training issues, but that one seems more related to compatibility with the OpenChat API. I'm working with Transformers indirectly, via Axolotl, so it's difficult to tease out why it's working in my instance versus yours. The OpenChat 3.5 format used by Starling adds a couple of tokens to the vocabulary that I suspect are the source of your issues. Hopefully the devs can help.
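
For readers unfamiliar with the format, here is a minimal sketch of how an OpenChat 3.5 style prompt is assembled. The role prefixes and the `<|end_of_turn|>` token name are recalled from the model card rather than confirmed in this thread, so treat them as assumptions. `build_prompt` is a hypothetical helper.

```python
# Assumed from the model card, not confirmed in this thread:
# the role prefixes and the name of the added end-of-turn token.
END_OF_TURN = "<|end_of_turn|>"
ROLE_PREFIX = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}


def build_prompt(turns: list[tuple[str, str]]) -> str:
    """Render (role, text) turns and leave the prompt open for the assistant."""
    parts = [f"{ROLE_PREFIX[role]}: {text}{END_OF_TURN}" for role, text in turns]
    parts.append(f"{ROLE_PREFIX['assistant']}:")
    return "".join(parts)


print(build_prompt([("user", "Hello")]))
# prints: GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:
```

If the end-of-turn marker is a dedicated token rather than ordinary text, it needs its own ID beyond the base Mistral vocabulary, which is why the embedding and output layers must be larger than 32000 rows for training data in this format.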
