Instructions for using HuggingFaceH4/starchat-beta with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use HuggingFaceH4/starchat-beta with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/starchat-beta")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/starchat-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/starchat-beta")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use HuggingFaceH4/starchat-beta with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "HuggingFaceH4/starchat-beta"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceH4/starchat-beta",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'

Use Docker
docker model run hf.co/HuggingFaceH4/starchat-beta
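The OpenAI-compatible endpoint above can also be called from Python. Below is a minimal sketch using only the standard library; the URL, port, and payload are taken from the curl example, and the actual request lines are left commented out since they require a running server:

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = {
    "model": "HuggingFaceH4/starchat-beta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",  # vLLM's default port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With `vllm serve` running, uncomment to get a completion:
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["text"])
```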
- SGLang
How to use HuggingFaceH4/starchat-beta with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "HuggingFaceH4/starchat-beta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceH4/starchat-beta",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'

Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "HuggingFaceH4/starchat-beta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "HuggingFaceH4/starchat-beta",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'

- Docker Model Runner
How to use HuggingFaceH4/starchat-beta with Docker Model Runner:
docker model run hf.co/HuggingFaceH4/starchat-beta
How to save and load the PEFT/LoRA Finetune
I am trying to further finetune Starchat-Beta, save my progress, load it, and continue training. But whatever I do, it doesn't work: whenever I load my progress and continue training, the loss starts back at its initial value (3.xx in my case).
I'll run you through my code and then the problem.
tokenizer = AutoTokenizer.from_pretrained(BASEPATH)
model = AutoModelForCausalLM.from_pretrained(
"/notebooks/starbaseplus"
...
)
# I get both the Tokenizer and the Foundation model from the starbaseplus repo (which I have locally).
peftconfig = LoraConfig.from_pretrained(
    "/notebooks/starchat-beta",
    base_model_name_or_path="/notebooks/starbaseplus",
    ...
)
model = get_peft_model(model, peftconfig)
# All Gucci so far, the model and the LoRA fine-tune are loaded from the starchat-beta repo (also local).
# important for later:
print_trainable_parameters(model)
# trainable params: 306 Million || all params: 15 Billion || trainable: 1.971%
trainer = Trainer(
model=model,
...
)
trainer.train()
# I train and the loss drops from 3.xx to 1.xx.
# Now, either I follow the Hugging Face docs:
model.save_pretrained("./huggingface_model")
# -> saves /notebooks/huggingface_model/adapter_model.bin (16 MB).
# or an alternative I found on SO:
trainer.save_model("./torch_model")
# -> saves /notebooks/torch_model/pytorch_model.bin (60 GB).
Now I have two alternatives saved to disk. Let's restart and try each of these approaches.
First the huggingface docs approach:
I now have three sets of weights.
- the foundation model - starbase plus
- the chat finetune - starchat-beta
- the 16mb saved bin - adapter_model.bin
But I only have two opportunities to load weights:
- AutoModelForCausalLM.from_pretrained
- either get_peft_model or PeftModel.from_pretrained
Neither works; training restarts at a loss of 3.x.
Second approach:
Load the 60 GB file instead of the old starchat-beta repo:
model = get_peft_model("/notebooks/torch_model/pytorch_model.bin", peftconfig)
That doesn't work either: print_trainable_parameters(model) drops to trainable: 0.02%, and training restarts at a loss of 3.x.
There are four different ways to save a model:
- model.save_pretrained(PATH)
- torch.save({'model_state_dict': model.state_dict()})
- trainer.save_model(PATH)
- TrainingArguments(save_strategy='steps')
Which one should I use to store the PeftModelForCausalLM (wrapping the AutoModelForCausalLM), and how do I load it again?
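For context, the save/load pairing being asked about can be sketched as follows. This is only a sketch under the assumption that the adapter alone should round-trip, using the paths from the snippets above; is_trainable=True is the peft flag that reloads the adapter weights as trainable rather than frozen. It is not run here, since it needs the actual weights on disk:

```python
def save_adapter(model, path="./huggingface_model"):
    # Saves only the LoRA weights (adapter_model.bin, a few MB),
    # not the 15B-parameter base model.
    model.save_pretrained(path)

def load_adapter(base_path="/notebooks/starbaseplus",
                 adapter_path="./huggingface_model"):
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_path)
    # is_trainable=True keeps the adapter weights trainable so a new
    # Trainer can continue optimizing them (by default they load frozen).
    return PeftModel.from_pretrained(base, adapter_path, is_trainable=True)
```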