Swappable local inference backends: llama_cpp is the default, and transformers is available as an optional extra.
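from inference.factory import get_backend

# Select the configured backend (llama_cpp unless another one is chosen).
backend = get_backend()
# Load the backend before sending the first message.
backend.load()
# Pass the conversation as a list of role/content messages.
reply = backend.chat([{"role": "user", "content": "Hello!"}])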
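
For readers curious how a swappable-backend factory of this kind might be organized, here is a minimal sketch. It is not the project's actual implementation: the `Backend` protocol, the `EchoBackend` stand-in, the `_REGISTRY` table, and the `INFERENCE_BACKEND` environment variable are all hypothetical names chosen for illustration, and the stand-in only echoes the last message so the example runs without any model installed.

```python
import os
from typing import Callable, Dict, List, Optional, Protocol

Message = Dict[str, str]


class Backend(Protocol):
    """Interface every backend is assumed to share (hypothetical)."""

    def load(self) -> None: ...

    def chat(self, messages: List[Message]) -> str: ...


class EchoBackend:
    """Stand-in backend so the sketch runs without llama_cpp or transformers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.loaded = False

    def load(self) -> None:
        # A real backend would load model weights here.
        self.loaded = True

    def chat(self, messages: List[Message]) -> str:
        if not self.loaded:
            raise RuntimeError("call load() before chat()")
        # Echo the most recent message, tagged with the backend name.
        return f"[{self.name}] {messages[-1]['content']}"


# Hypothetical registry mapping a backend name to its constructor.
_REGISTRY: Dict[str, Callable[[], Backend]] = {
    "llama_cpp": lambda: EchoBackend("llama_cpp"),
    "transformers": lambda: EchoBackend("transformers"),
}


def get_backend(name: Optional[str] = None) -> Backend:
    """Return a backend by name, falling back to llama_cpp."""
    # INFERENCE_BACKEND is an assumed variable name, not taken from the project.
    chosen = name or os.environ.get("INFERENCE_BACKEND", "llama_cpp")
    try:
        return _REGISTRY[chosen]()
    except KeyError:
        raise ValueError(f"unknown backend: {chosen!r}") from None
```

Keeping the constructors behind a registry means the optional dependency only needs to be imported when its backend is actually requested, which is one common reason to route backend selection through a factory.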