Instructions to use CallComply/Starling-LM-11B-alpha with libraries, inference providers, notebooks, and local apps.

Libraries

How to use CallComply/Starling-LM-11B-alpha with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="CallComply/Starling-LM-11B-alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CallComply/Starling-LM-11B-alpha")
model = AutoModelForCausalLM.from_pretrained("CallComply/Starling-LM-11B-alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a reply, then decode only the newly generated tokens (skip the prompt)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use CallComply/Starling-LM-11B-alpha with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CallComply/Starling-LM-11B-alpha"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CallComply/Starling-LM-11B-alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
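
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on localhost:8000; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "CallComply/Starling-LM-11B-alpha"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the reply text. Passing `base_url="http://localhost:30000"` would target a server on port 30000 instead.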

Use Docker

docker model run hf.co/CallComply/Starling-LM-11B-alpha

SGLang

How to use CallComply/Starling-LM-11B-alpha with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CallComply/Starling-LM-11B-alpha" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CallComply/Starling-LM-11B-alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CallComply/Starling-LM-11B-alpha" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CallComply/Starling-LM-11B-alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use CallComply/Starling-LM-11B-alpha with Docker Model Runner:
```
docker model run hf.co/CallComply/Starling-LM-11B-alpha
```

Model performance and more questions

by agershun - opened Dec 10, 2023

Discussion

agershun

Dec 10, 2023

edited Dec 10, 2023

Interesting fact: today I compared this model, NurtureAI/Starling-LM-11B-alpha, with the current leaderboard leader, MetaMath-Cybertron-Starling, and the 11B gives better and more relevant results on my queries. MetaMath was probably overtrained to pass the tests rather than to be more "useful". Thank you again.

May I ask you a few more questions?

Why did you use this strange merging configuration of layers?
Have you tried merging other layer configurations?

perlthoughts

Dec 10, 2023

edited Dec 10, 2023

Same as you, I saw better generations. I made more 11Bs on NurtureAI. My thinking is that the 11B will perform better once it is fine-tuned with DPO or SFT with the new layers.

agershun

Dec 12, 2023

Ray, may I ask you a couple more questions:

How long does it take to merge the layers with mergekit? Does this process require a GPU, or can it be done on the CPU only?
Have you already tried this "11B" method with the "new champions" (as of December 12, 2023) like v1olet/v1olet_marcoroni-go-bruins-merge-7B, or with the new "base model" mistralai/Mistral-7B-Instruct-v0.2?

Thank you

perlthoughts

Dec 12, 2023

edited Dec 12, 2023

It doesn't take long at all; for a 7B to 11B, just a couple of minutes. I just did the Mistral v0.2 for you. I also included the merge script for mergekit on the model card.
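
For readers wondering how a 7B turns into an "11B": one common approach is to stack overlapping ranges of the base model's layers, so the merged model has more transformer blocks than the original. The sketch below is back-of-the-envelope arithmetic only. It assumes Mistral-7B-style dimensions, and the slice layout is hypothetical; the actual configuration is in the merge script on the model card and is not reproduced here.

```python
# Mistral-7B-style dimensions (assumed; check the base model's config.json).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_HEADS = 32
NUM_KV_HEADS = 8
VOCAB = 32000
BASE_LAYERS = 32


def params_per_layer():
    """Parameters in one transformer block (no biases, grouped-query attention)."""
    head_dim = HIDDEN // NUM_HEADS
    kv_dim = NUM_KV_HEADS * head_dim
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q and o, then k and v
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, and down projections
    norms = 2 * HIDDEN  # two RMSNorm weight vectors
    return attention + mlp + norms


def total_params(num_layers):
    """Total parameters for a model with the given number of blocks."""
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings plus an untied output head
    final_norm = HIDDEN
    return num_layers * params_per_layer() + embeddings + final_norm


def stacked_layers(slices):
    """Count layers when half-open [start, end) slices are concatenated."""
    return sum(end - start for start, end in slices)


# Hypothetical slice layout, NOT the configuration used for this model.
example_slices = [(0, 24), (8, 32)]
merged_layers = stacked_layers(example_slices)

print(f"base:   {BASE_LAYERS} layers, {total_params(BASE_LAYERS) / 1e9:.2f}B parameters")
print(f"merged: {merged_layers} layers, {total_params(merged_layers) / 1e9:.2f}B parameters")
```

Under these assumptions, 32 layers come to about 7.24B parameters and 48 stacked layers to about 10.73B, which is where the "11B" label would come from. Since stacking only copies existing weights, it involves no training, which is consistent with the merge taking only a couple of minutes.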

agershun

Dec 12, 2023

And I tested it, and it works perfectly (in my case)! )) Thank you!

This "11B" approach looks promising. It would be interesting to see whether it works with Mixtral 8x7B.
It is probably necessary to be more careful with the layers there.
