Instructions to use Salesforce/codegen-16B-mono with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Salesforce/codegen-16B-mono with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Salesforce/codegen-16B-mono")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-mono")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Salesforce/codegen-16B-mono with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Salesforce/codegen-16B-mono"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/codegen-16B-mono",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:

```shell
docker model run hf.co/Salesforce/codegen-16B-mono
```
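The curl call above can also be made from Python. Below is a minimal sketch using only the standard library, assuming the vLLM server from the previous step is running on localhost:8000; the helper function names are illustrative, not part of vLLM:

```python
import json
from urllib import request

def build_completion_request(prompt, max_tokens=512, temperature=0.5):
    # Same JSON body as the curl example above.
    return {
        "model": "Salesforce/codegen-16B-mono",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, base_url="http://localhost:8000"):
    # POST to the OpenAI-compatible /v1/completions endpoint.
    req = request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_completion_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

Because the endpoint is OpenAI-compatible, the official `openai` client can be pointed at the same base URL instead, if preferred.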
- SGLang
How to use Salesforce/codegen-16B-mono with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Salesforce/codegen-16B-mono" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/codegen-16B-mono",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Salesforce/codegen-16B-mono" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/codegen-16B-mono",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use Salesforce/codegen-16B-mono with Docker Model Runner:
```shell
docker model run hf.co/Salesforce/codegen-16B-mono
```
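The Transformers snippets above can be combined into a single code-completion helper. This is only a sketch: the function name is illustrative, the 16B model weights are roughly 30 GB, and `device_map="auto"` additionally requires the `accelerate` package to be installed:

```python
def complete_code(prompt, max_new_tokens=64):
    # Import lazily: loading the 16B model downloads ~30 GB of weights,
    # so defining the helper stays cheap until it is actually called.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")
    model = AutoModelForCausalLM.from_pretrained(
        "Salesforce/codegen-16B-mono", device_map="auto"
    )
    # Tokenize the prompt, generate a continuation, and decode it back to text.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```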
Not working in the Inference API: it times out after 120 seconds.
```python
import sys
import requests
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
YOUR_API_KEY = user_secrets.get_secret("YOUR_API_KEY")
if YOUR_API_KEY == "":
    sys.exit("API key not found in secrets.")

API_URL = "https://api-inference.huggingface.co/models/Salesforce/codegen-16B-mono"
headers = {"Authorization": f"Bearer {YOUR_API_KEY}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

prompt = """def download_file(url, directory):
    """
    This function downloads a file from a URL and saves it to a user-specified directory using the filename from the end of the URL.
    Args:
        url (str): The URL of the file to be downloaded.
        directory (str): The directory where the file should be saved. If the directory does not exist, it will be created.
    Raises:
        ValueError: If the URL is not valid.
    Returns:
        None
    """"""

pre_prompt = """Q:\n\nComplete the code of the following function:\n\n"""
post_prompt = "\n\nA:\n\n"

output = query({
    "inputs": pre_prompt + prompt + post_prompt,
    "parameters": {
        "temperature": 0.1,
        "repetition_penalty": 1.1,
        "max_new_tokens": 250,
        "max_time": 120,
        "return_full_text": False,
        "num_return_sequences": 1,
        "do_sample": True,
    },
    "options": {
        "use_cache": False,
        "wait_for_model": True,
    },
})

if type(output) == list:
    generated_text = output[0]['generated_text']
else:
    sys.exit(output['error'])

stop_seq = '\n\n\n'
stop_idx = generated_text.find(stop_seq)
if stop_idx != -1:
    generated_text = generated_text[:stop_idx].strip()
else:
    generated_text = generated_text.strip()

print(post_prompt + generated_text)
```
Your code is a bit unformatted, but there might be an error where you define `prompt`: after `"""def download_file(url, directory):` you have an additional `"""` on the next line, which closes the string. The following lines are then interpreted as Python code.
Other than that, I also get a timeout: I specified `"options": {"wait_for_model": True}` in the API request, and after some time the function returns, but `response.json()[0]['generated_text']` contains the following output:
`'Error:M o d e l S a l e s f o r c e / c o d e g e n - 1 6 B - m o n o t i m e o u t'`
I suppose the model is too large for the Inference API; see https://discuss.huggingface.co/t/cannot-run-large-models-using-api-token/31844/2
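Following up on the string-closing issue pointed out above, one way to fix the `prompt` definition is to delimit the outer string with `'''` so the docstring's inner `"""` no longer terminates it. A sketch of the fix (not part of the original post):

```python
# Using ''' as the outer delimiter keeps the inner """ docstring intact.
prompt = '''def download_file(url, directory):
    """
    This function downloads a file from a URL and saves it to a
    user-specified directory using the filename from the end of the URL.
    Args:
        url (str): The URL of the file to be downloaded.
        directory (str): The directory where the file should be saved.
    Raises:
        ValueError: If the URL is not valid.
    Returns:
        None
    """
'''
```

An alternative with the same effect is to escape the inner quotes (`\"\"\"`) inside the original `"""`-delimited string.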