"""Models — load the fine-tuned MiniCPM (GGUF) via llama.cpp and hold the prompts. NEXT MILESTONE. Local-first by default (llama-cpp-python loading a quantized GGUF), with an optional Modal endpoint as a hosted fallback for the public Space. Exposes a single chat/tool-call interface so the agent loop is runtime-agnostic. """