joshuaeric/vllm-tool-calling-guide
Text Generation
Production-tested configurations, prompt templates, and lessons learned for reliable tool calling with open-source LLMs on vLLM. Includes Hermes-3, Ll
The guide: production vLLM configs, prompt templates, Python examples, and 7 prompt engineering lessons.
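To make the "Python examples" concrete, here is a minimal sketch of the request shape sent to a vLLM OpenAI-compatible endpoint with tool calling enabled. The `get_weather` function, its parameters, and the model name are illustrative assumptions, not taken from the guide itself:

```python
import json

# An OpenAI-style tool schema of the kind passed to a vLLM
# OpenAI-compatible server. Function name and parameters are
# hypothetical examples.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body shape for POST /v1/chat/completions on a vLLM server
# launched with tool calling enabled. The model name is an example;
# substitute whichever model you serve.
payload = {
    "model": "NousResearch/Hermes-3-Llama-3.1-8B",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

When the model decides to call the function, the response carries a `tool_calls` entry instead of plain text, which the client executes and feeds back as a `tool`-role message.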
- Best tool-calling quality; purpose-built for function calling. Parser: hermes
- Best Open WebUI compatibility; native FP8 on Blackwell. Parser: llama3_json
- AWQ alternative for when FP8 isn't available or supported.
- Best multilingual tool calling; strong reasoning. Parser: hermes
- Fastest at 100-150 tok/s and only 15 GB VRAM; great for dev/testing. Parser: mistral
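The parser names above correspond to vLLM launch flags. A minimal sketch of serving each family with tool calling enabled, assuming the vLLM OpenAI-compatible server; the model names are examples, not necessarily the exact checkpoints in this collection:

```shell
# Hermes-style models: purpose-built function calling, parser "hermes".
vllm serve NousResearch/Hermes-3-Llama-3.1-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

# Llama 3.x models emitting JSON tool calls: parser "llama3_json".
# vllm serve meta-llama/Llama-3.1-8B-Instruct \
#   --enable-auto-tool-choice \
#   --tool-call-parser llama3_json

# Mistral models: parser "mistral" (some versions also need the
# model's tool-use chat template passed via --chat-template).
# vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
#   --enable-auto-tool-choice \
#   --tool-call-parser mistral
```

Pointing a client at the resulting endpoint (default `http://localhost:8000/v1`) lets the server parse the model's raw tool-call output into structured `tool_calls` automatically.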