Model performance and more questions
Interesting fact: today I compared this model, NurtureAI/Starling-LM-11B-alpha, with the current leader of the leaderboard, MetaMath-Cybertron-Starling, and the 11B gives better and more relevant results on my queries. MetaMath was probably overtrained to pass the tests rather than to be more "useful". Thank you again.
May I ask you some more questions?
- Why did you use this strange merging configuration of layers?
- Have you tried to merge other layer configurations?
Same as you, I saw better generations. I made more 11Bs on NurtureAI. My thinking is that the 11B will perform better once it is fine-tuned with DPO or SFT with the new layers.
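For readers wondering where the "11B" figure comes from, here is a rough sketch of the arithmetic. It assumes the published Mistral-7B-v0.1 dimensions (Starling's own vocabulary is slightly larger) and a hypothetical pair of overlapping layer slices; the thread does not state the actual layer ranges, so treat them as an illustration only.

```python
# Rough parameter count for a Mistral-7B-style decoder, before and after
# stacking two overlapping layer slices into one deeper model.
# ASSUMPTION: dimensions are Mistral-7B-v0.1's published config values;
# the slice ranges are illustrative and not taken from this thread.

HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size of the MLP
N_HEADS = 32           # num_attention_heads
N_KV_HEADS = 8         # num_key_value_heads (grouped-query attention)
VOCAB = 32000          # vocab_size
BASE_LAYERS = 32       # num_hidden_layers


def params_per_layer() -> int:
    """Weights in one decoder layer (no biases in this architecture)."""
    head_dim = HIDDEN // N_HEADS
    kv_dim = head_dim * N_KV_HEADS
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q, o and k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate, up, down projections
    norms = 2 * HIDDEN                                # two RMSNorm weight vectors
    return attn + mlp + norms


def total_params(n_layers: int) -> int:
    """Whole-model weights for a given depth."""
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings plus untied lm_head
    final_norm = HIDDEN
    return n_layers * params_per_layer() + embeddings + final_norm


# Hypothetical slices, end-exclusive: layers [0, 24) followed by layers [8, 32).
slices = [(0, 24), (8, 32)]
merged_layers = sum(end - start for start, end in slices)

base = total_params(BASE_LAYERS)
merged = total_params(merged_layers)
print(f"base:   {BASE_LAYERS} layers, {base / 1e9:.2f}B parameters")
print(f"merged: {merged_layers} layers, {merged / 1e9:.2f}B parameters")
```

Under these assumptions the stacked model has 48 layers and roughly 10.7B parameters, which is presumably what gets rounded to "11B". The embeddings and output head are not duplicated, so the size grows by less than 1.5x.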
Ray, may I ask you a couple more questions?
- How long does it take to merge the layers with mergekit? Does this process require a GPU, or can it be done on the CPU only?
- Have you already tried this "11B" method with the "new champions" (as of December 12, 2023) like v1olet/v1olet_marcoroni-go-bruins-merge-7B, or with the new "base model" mistralai/Mistral-7B-Instruct-v0.2?
Thank you
It doesn't take long at all; going from a 7B to an 11B takes just a couple of minutes. I just did the Mistral v0.2 one for you. I also included the mergekit merge script on the model card.
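The actual merge script lives on the model card and is not reproduced in this thread. As a minimal sketch of what a mergekit config for this kind of layer stacking generally looks like, the snippet below renders a `passthrough` config as YAML text. The helper name `passthrough_config` and the layer ranges are hypothetical; only the model name comes from the discussion above.

```python
# Sketch: render a mergekit "passthrough" config that stacks two overlapping
# layer slices of a single source model into one deeper model.
# ASSUMPTION: the layer ranges are illustrative; the real ones are in the
# merge script on the model card. `passthrough_config` is a hypothetical helper.


def passthrough_config(model: str, slices: list[tuple[int, int]], dtype: str = "bfloat16") -> str:
    """Return YAML text with one `sources` entry per layer slice."""
    lines = ["slices:"]
    for start, end in slices:
        lines += [
            "  - sources:",
            f"      - model: {model}",
            f"        layer_range: [{start}, {end}]",
        ]
    lines += ["merge_method: passthrough", f"dtype: {dtype}"]
    return "\n".join(lines) + "\n"


config_text = passthrough_config(
    "mistralai/Mistral-7B-Instruct-v0.2",
    [(0, 24), (8, 32)],  # hypothetical slices: 24 + 24 = 48 layers
)
print(config_text)
```

Saved to a file such as `merge-config.yml`, a config like this would typically be run with `mergekit-yaml merge-config.yml ./merged-model`. As far as I know, the `--cuda` flag is optional, so a passthrough merge, which only copies tensors, should also run on a CPU.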
I tested it, and it works perfectly (in my case)! )) Thank you!
This "11B" approach looks promising. I wonder: will it work with Mixtral 8x7B?
It will probably be necessary to be more careful with the layers there.