Prompting to reproduce MBPP test results
Hi, I'm trying to reproduce the SantaCoder test results on MBPP from the paper, and I'm wondering what the recommended way to prompt the model is.
MBPP provides text instructions, e.g. "Write a function to reverse words in a given string.", which the SantaCoder model card explicitly advises against using. Nevertheless, I tried prompting the model in one of two ways (in Python):
- Function signature, followed by docstring:
def reverse_words(s):
    """Write a function to reverse words in a given string."""
- Comment, followed by function signature:
# Write a function to reverse words in a given string.
def reverse_words(s):
In both cases I get reasonable output, except that after defining the function, generation does not terminate and keeps repeating until max_length, like this:
def reverse_words(s):
    """Write a function to reverse words in a given string."""
    return''.join(s.split()[::-1])
def reverse_words_2(s):
    """Write a function to reverse words in a given string."""
    return''.join(s.split()[::-1])
def reverse_words_3(s):
    """Write a function to reverse words in a given string."""
    return''.join(s.split()[::-1])
Should I change the prompting method, or is this output acceptable and I should just truncate the output manually? I am trying to reproduce the eval results from the paper as closely as possible. Thanks for your help.
Hi, we evaluated using the MultiPL-E version of MBPP, which already provides function signatures, so the evaluation is very similar to HumanEval.
Thank you! Regarding the other part of my question: generation with greedy search or with sampling at temperature=0.2 does not terminate, as shown above. Should I manually truncate the output?
How are you doing the generations? If you use model.generate(), it should stop at the EOS token if it comes up. If it doesn't come up often, you can add a stopping criterion like it's done here. Note that you then need to post-process the output to keep only the first function, like it's done here. You can also find more examples in our evaluation harness.
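For readers following along, the post-processing step could look roughly like the sketch below. It is not the code from the links above: the function name `truncate_completion` and the `STOP_SEQUENCES` list are hypothetical, and the stop strings are an assumption modelled on the ones commonly used for HumanEval-style evaluation. It cuts the decoded output at the first stop sequence found after the prompt, so only the first function is kept.

```python
# Hypothetical stop sequences (an assumption, not taken from this thread):
# each one marks the start of a new top-level statement after the function.
STOP_SEQUENCES = ["\ndef ", "\nclass ", "\nif ", "\nprint(", "\n#"]


def truncate_completion(prompt, generated, stop_sequences=STOP_SEQUENCES):
    """Return the prompt plus the generated text up to the first stop sequence."""
    # Decoded output usually contains the prompt; search only the new text,
    # so a "def" or "#" inside the prompt itself never triggers a cut.
    if generated.startswith(prompt):
        completion = generated[len(prompt):]
    else:
        completion = generated

    # Find the earliest position at which any stop sequence begins.
    cut = len(completion)
    for stop in stop_sequences:
        index = completion.find(stop)
        if index != -1:
            cut = min(cut, index)

    return prompt + completion[:cut]


if __name__ == "__main__":
    prompt = (
        "def reverse_words(s):\n"
        '    """Write a function to reverse words in a given string."""\n'
    )
    generated = (
        prompt
        + "    return ' '.join(s.split()[::-1])\n"
        + "def reverse_words_2(s):\n"
        + "    return ' '.join(s.split()[::-1])\n"
    )
    print(truncate_completion(prompt, generated))
```

The same stop strings could also drive a stopping criterion during generation, so that the model does not run on to max_length; the truncation above is still useful afterwards, because generation may overshoot the stop string by a few tokens.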
This answers my question, thank you for the great and prompt responses!