Instructions to use bigcode/starcoder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use bigcode/starcoder with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="bigcode/starcoder")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder") model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use bigcode/starcoder with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "bigcode/starcoder" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "bigcode/starcoder", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/bigcode/starcoder
- SGLang
How to use bigcode/starcoder with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "bigcode/starcoder" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "bigcode/starcoder", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "bigcode/starcoder" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "bigcode/starcoder", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use bigcode/starcoder with Docker Model Runner:
docker model run hf.co/bigcode/starcoder
Errors when deploying to AWS Sagemaker
I got an error when deploying this model to AWS Sagemaker.
"No safetensors weights found for model bigcode/starcoder at revision None. Converting PyTorch weights to safetensors."
It seems Sagemaker expects one bin file "model.pth" or "pytorch_model.bin"
but this repo has many bin files like "pytorch_model-00003-of-00007.bin" etc..
I don't think I can simply contact those bin files.
Anyone has encountered this issue?
I also faced, don't know how to solve it
I passed this error.
Sagemaker will actually do the conversion for you. But you need to give it more time.
predictor = huggingface_model.deploy(
initial_instance_count=1,
instance_type="ml.g5.8xlarge",
container_startup_health_check_timeout=1200,
)
Set up the container_startup_health_check_timeout to a bigger number and it will pass this error.
But I encountered the next error
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 288.00 MiB (GPU 0; 22.20 GiB total capacity; 19.72 GiB already allocated; 143.12 MiB free;
21.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I upgraded to a bigger instance type, and played with param PYTORCH_CUDA_ALLOC_CONF but the error persisted.
Let me know if you see the same error.
It worked by putting it on the AWS instance type: ml.g4dn.12xlarge and setting SM_NUM_GPUS: "4"
Yes, I got it worked with these configs. Thank you so much~