## How to use from vLLM

Install vLLM from pip and serve the model:

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "GetSoloTech/Llama3.2-1B-Med-Transcript-Notes"
```

Call the server using curl (OpenAI-compatible API):
```bash
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "GetSoloTech/Llama3.2-1B-Med-Transcript-Notes",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
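The same endpoint can also be called from Python with the `openai` client. A minimal sketch, assuming the vLLM server started above is listening on `localhost:8000` and was launched without an `--api-key`:

```python
# Python equivalent of the curl call above, using the official `openai` client
# against vLLM's OpenAI-compatible API. The base_url and api_key are assumptions:
# adjust them if the server uses a different host, port, or an --api-key.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # any non-empty string works when no --api-key is configured
)

response = client.chat.completions.create(
    model="GetSoloTech/Llama3.2-1B-Med-Transcript-Notes",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```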
## Use Docker

```bash
docker model run hf.co/GetSoloTech/Llama3.2-1B-Med-Transcript-Notes
```
Gated model: log in with a Hugging Face token that has gated-access permission before downloading the weights:

```bash
hf auth login
```
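Authentication can also be done programmatically with the `huggingface_hub` library; a minimal sketch, with a placeholder token:

```python
# Programmatic alternative to `hf auth login`, using the huggingface_hub library.
from huggingface_hub import login

# The token below is a placeholder; use a token that has been granted access to this gated repo.
login(token="hf_xxxxxxxxxxxxxxxxxxxx")

# Calling login() with no arguments instead opens an interactive prompt.
```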