joshuaeric/vllm-tool-calling-guide
Text Generation
Production-tested configurations, prompt templates, and lessons learned for reliable tool calling with open-source LLMs on vLLM. Includes Hermes-3, Ll
The guide: production vLLM configs, prompt templates, Python examples, and 7 prompt engineering lessons.
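To make the "Python examples" concrete, here is a minimal sketch of the request shape sent to a vLLM OpenAI-compatible endpoint with tool calling enabled. The `get_weather` function, its parameters, and the model name are illustrative assumptions, not taken from the guide itself:

```python
import json

# An OpenAI-style tool schema of the kind passed to a vLLM
# OpenAI-compatible server. Function name and parameters are
# hypothetical examples.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body shape for POST /v1/chat/completions on a vLLM server
# launched with tool calling enabled. The model name is an example;
# substitute whichever model you serve.
payload = {
    "model": "NousResearch/Hermes-3-Llama-3.1-8B",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

When the model decides to call the function, the response carries a `tool_calls` entry instead of plain text, which the client executes and feeds back as a `tool`-role message.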
- Best tool-calling quality; purpose-built for function calling. Parser: hermes
- Best Open WebUI compatibility; native FP8 on Blackwell. Parser: llama3_json
- AWQ alternative for when FP8 isn't available or supported.
- Best multilingual tool calling; strong reasoning. Parser: hermes
- Fastest at 100-150 tok/s and only 15 GB VRAM; great for dev/testing. Parser: mistral
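The parser names above correspond to vLLM launch flags. A minimal sketch of serving each family with tool calling enabled, assuming the vLLM OpenAI-compatible server; the model names are examples, not necessarily the exact checkpoints in this collection:

```shell
# Hermes-style models: purpose-built function calling, parser "hermes".
vllm serve NousResearch/Hermes-3-Llama-3.1-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

# Llama 3.x models emitting JSON tool calls: parser "llama3_json".
# vllm serve meta-llama/Llama-3.1-8B-Instruct \
#   --enable-auto-tool-choice \
#   --tool-call-parser llama3_json

# Mistral models: parser "mistral" (some versions also need the
# model's tool-use chat template passed via --chat-template).
# vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
#   --enable-auto-tool-choice \
#   --tool-call-parser mistral
```

Pointing a client at the resulting endpoint (default `http://localhost:8000/v1`) lets the server parse the model's raw tool-call output into structured `tool_calls` automatically.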