How to use from SGLang
Use Docker images
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "srini98/mistral-function-calling" \
--host 0.0.0.0 \
--port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "srini98/mistral-function-calling",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
The model was fine-tuned on the Glaive dataset, using both QLoRA and full fine-tuning with FSDP.
Dataset link: Link here
For training, inference, and evaluation code, see this repository:
https://github.com/Srini-98/Function-Calling-Using-Mistral
Use the following prompt format:
SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
"name": "function_name",
"description": "description",
"parameters": {
"type": "object",
"properties": {
"param_name1": {
"type": "string",
"description": "description of param"
},
"param_name2": {
"type": "string",
"description": "description of param"
},
"param_name3": {
"type": "string",
"description": "description of param"
}
},
"required": [
"param_name1"
]
}
}
USER: {question here}
ASSISTANT: {model answer} <|endoftext|>
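The prompt format above can be assembled programmatically. The following is a minimal sketch, not code from the linked repository: `build_prompt` and `SYSTEM_PREFIX` are hypothetical names, and the exact whitespace (newlines between sections, 4-space JSON indentation, a trailing `ASSISTANT:` for the model to complete) is an assumption that may need adjusting to match the training data.

```python
import json

# System line copied from the prompt format in this model card.
SYSTEM_PREFIX = (
    "SYSTEM: You are a helpful assistant with access to the following "
    "functions. Use them if required -"
)


def build_prompt(functions, question):
    """Render function specs and a user question in the card's prompt format.

    Assumption: sections are separated by single newlines and the prompt
    ends with "ASSISTANT:" so the model generates the answer.
    """
    specs = "\n".join(json.dumps(spec, indent=4) for spec in functions)
    return f"{SYSTEM_PREFIX}\n{specs}\nUSER: {question}\nASSISTANT:"


# Function spec taken from the example in this model card.
calculate_tax_spec = {
    "name": "calculate_tax",
    "description": "Calculate the tax amount",
    "parameters": {
        "type": "object",
        "properties": {
            "income": {"type": "number", "description": "The income amount"}
        },
        "required": ["income"],
    },
}

prompt = build_prompt(
    [calculate_tax_spec],
    "Hi, I need to calculate my tax for this year. My income is $70,000.",
)
print(prompt)
```

Building the spec with `json.dumps` guarantees the function definition is valid JSON, which avoids quoting and trailing-comma mistakes when writing it by hand.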
Example:
SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
"name": "calculate_tax",
"description": "Calculate the tax amount",
"parameters": {
"type": "object",
"properties": {
"income": {
"type": "number",
"description": "The income amount"
}
},
"required": [
"income"
]
}
}
USER: Hi, I need to calculate my tax for this year. My income is $70,000.
ASSISTANT: <functioncall> {"name": "calculate_tax", "arguments": '{"income": 70000}'} <|endoftext|>
FUNCTION RESPONSE: {"tax_amount": 17500}
ASSISTANT: Based on your income, your tax for this year is $17,500. <|endoftext|>
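The function call in the example wraps its arguments in single quotes, so the whole payload is not valid JSON and cannot be passed to `json.loads` directly. A possible way to handle this, sketched under the assumption that the model always emits the `<functioncall> {"name": ..., "arguments": '...'}` shape shown above, is to extract the two fields with a regular expression. `parse_function_call`, `format_function_response`, and the local `calculate_tax` (with a 25% rate chosen only to match the example) are hypothetical helpers.

```python
import json
import re

# Matches: <functioncall> {"name": "...", "arguments": '...'}
CALL_PATTERN = re.compile(
    r"<functioncall>\s*\{\s*\"name\":\s*\"(?P<name>[^\"]+)\"\s*,\s*"
    r"\"arguments\":\s*'(?P<args>.*)'\s*\}",
    re.DOTALL,
)


def parse_function_call(text):
    """Return (name, arguments) if text contains a function call, else None."""
    text = text.replace("<|endoftext|>", "")
    match = CALL_PATTERN.search(text)
    if match is None:
        return None
    # The single-quoted arguments string is itself JSON.
    return match.group("name"), json.loads(match.group("args"))


def format_function_response(result):
    """Render a tool result as the FUNCTION RESPONSE turn from the example."""
    return f"FUNCTION RESPONSE: {json.dumps(result)}\nASSISTANT:"


def calculate_tax(income):
    """Hypothetical local implementation; the 25% rate is illustrative only."""
    return {"tax_amount": income * 25 // 100}


model_output = (
    ' <functioncall> {"name": "calculate_tax", '
    "\"arguments\": '{\"income\": 70000}'} <|endoftext|>"
)
name, arguments = parse_function_call(model_output)
result = calculate_tax(**arguments)
print(name, arguments)
print(format_function_response(result))
```

Appending the rendered `FUNCTION RESPONSE` turn to the conversation and sending it back lets the model produce the final natural-language answer.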
Generation can be stopped at the <|endoftext|> token. You can supply multiple functions and choose your own parameter names. The "required" field lists the parameters the model must always include when it calls the function.
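One way to stop generation at `<|endoftext|>` is to pass it as a stop sequence in the request. The sketch below assumes the server from the SGLang instructions in this card is listening on `localhost:30000` and that its OpenAI-compatible `/v1/completions` endpoint honors the standard `stop` field; `build_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint: the local SGLang server started as described in this card.
SERVER_URL = "http://localhost:30000/v1/completions"


def build_request(prompt, url=SERVER_URL):
    """Build a completion request that stops at the <|endoftext|> token."""
    payload = {
        "model": "srini98/mistral-function-calling",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.5,
        "stop": ["<|endoftext|>"],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the prompt to the server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage (requires a running server):
# print(complete("SYSTEM: ...\nUSER: ...\nASSISTANT:"))
```

Without a stop sequence the model may continue past its turn and generate further `USER:` or `FUNCTION RESPONSE:` lines on its own.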
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "srini98/mistral-function-calling" \
--host 0.0.0.0 \
--port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "srini98/mistral-function-calling",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'