Instructions for using HuggingFaceH4/starchat-beta with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use HuggingFaceH4/starchat-beta with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/starchat-beta")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/starchat-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/starchat-beta")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use HuggingFaceH4/starchat-beta with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "HuggingFaceH4/starchat-beta"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceH4/starchat-beta",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'

Use Docker
docker model run hf.co/HuggingFaceH4/starchat-beta
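The OpenAI-compatible endpoint above can also be called from Python. Below is a minimal sketch using only the standard library; the URL, port, and payload are taken from the curl example, and the actual request lines are left commented out since they require a running server:

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = {
    "model": "HuggingFaceH4/starchat-beta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",  # vLLM's default port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With `vllm serve` running, uncomment to get a completion:
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["text"])
```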
- SGLang
How to use HuggingFaceH4/starchat-beta with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "HuggingFaceH4/starchat-beta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceH4/starchat-beta",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'

Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "HuggingFaceH4/starchat-beta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceH4/starchat-beta",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'

- Docker Model Runner
How to use HuggingFaceH4/starchat-beta with Docker Model Runner:
docker model run hf.co/HuggingFaceH4/starchat-beta
How to save and load the PEFT/LoRA Finetune
I am trying to further finetune Starchat-Beta, save my progress, load it, and continue training. But whatever I do, it doesn't work: whenever I load my progress and continue training, the loss starts back at its initial value (3.xx in my case).
I'll run you through my code and then the problem.
tokenizer = AutoTokenizer.from_pretrained(BASEPATH)
model = AutoModelForCausalLM.from_pretrained(
"/notebooks/starbaseplus"
...
)
# I get both the Tokenizer and the Foundation model from the starbaseplus repo (which I have locally).
peftconfig = LoraConfig.from_pretrained(
    "/notebooks/starchat-beta",
    base_model_name_or_path="/notebooks/starbaseplus",
    ...
)
model = get_peft_model(model, peftconfig)
# All Gucci so far, the model and the LoRA fine-tune are loaded from the starchat-beta repo (also local).
# important for later:
print_trainable_parameters(model)
# trainable params: 306 Million || all params: 15 Billion || trainable: 1.971%
trainer = Trainer(
model=model,
...
)
trainer.train()
# I train and the loss drops from 3.xx to 1.xx.
# Now, either I follow the Hugging Face docs:
model.save_pretrained("./huggingface_model")
# -> saves /notebooks/huggingface_model/adapter_model.bin (16 MB).
# or an alternative I found on SO:
trainer.save_model("./torch_model")
# -> saves /notebooks/torch_model/pytorch_model.bin (60 GB).
Now I have two alternatives saved to disk. Let's restart and try each of these approaches.
First the huggingface docs approach:
I now have three sets of weights.
- the foundation model - starbase plus
- the chat finetune - starchat-beta
- the 16mb saved bin - adapter_model.bin
But I only have two opportunities to load weights:
- AutoModelForCausalLM.from_pretrained
- either get_peft_model or PeftModel.from_pretrained
Neither works; training restarts at a loss of 3.x.
Second approach:
Load the 60 GB file instead of the old starchat-beta repo:
model = get_peft_model("/notebooks/torch_model/pytorch_model.bin", peftconfig)
That doesn't work either: print_trainable_parameters(model) drops to trainable: 0.02%, and training restarts at a loss of 3.x.
There are four different ways to save a model:
- model.save_pretrained(PATH)
- torch.save({'model_state_dict': model.state_dict()})
- trainer.save_model(PATH)
- TrainingArguments(save_strategy='steps')
Which one should I use to store the PeftModelForCausalLM (wrapping the AutoModelForCausalLM), and how do I load it again?
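For context, the save/load pairing being asked about can be sketched as follows. This is only a sketch under the assumption that the adapter alone should round-trip, using the paths from the snippets above; is_trainable=True is the peft flag that reloads the adapter weights as trainable rather than frozen. It is not run here, since it needs the actual weights on disk:

```python
def save_adapter(model, path="./huggingface_model"):
    # Saves only the LoRA weights (adapter_model.bin, a few MB),
    # not the 15B-parameter base model.
    model.save_pretrained(path)

def load_adapter(base_path="/notebooks/starbaseplus",
                 adapter_path="./huggingface_model"):
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_path)
    # is_trainable=True keeps the adapter weights trainable so a new
    # Trainer can continue optimizing them (by default they load frozen).
    return PeftModel.from_pretrained(base, adapter_path, is_trainable=True)
```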