"""Models — load the fine-tuned MiniCPM (GGUF) via llama.cpp and hold the prompts.

Planned for the next milestone. Local-first: llama-cpp-python loads a quantized GGUF,
with an optional Modal endpoint as a hosted fallback for the public Space. The module
exposes a single chat/tool-call interface so the agent loop is runtime-agnostic.
"""