"""Models — load the fine-tuned MiniCPM (GGUF) via llama.cpp and hold the prompts.

NEXT MILESTONE. Local-first by default (llama-cpp-python loading a quantized GGUF),
with an optional Modal endpoint as a hosted fallback for the public Space. Exposes
a single chat/tool-call interface so the agent loop is runtime-agnostic.
"""
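
# Illustrative sketch only, not the project's implementation. The docstring
# above describes a single chat/tool-call interface over two runtimes; one way
# that could look is shown below. Every name here (ChatBackend,
# LlamaCppBackend, HostedEndpointBackend, load_backend) is hypothetical, and
# the hosted endpoint's JSON request/response shape is an assumption, since
# the Modal endpoint's contract is not specified in this file.

import json
import urllib.request
from typing import Any, Optional, Protocol

Message = dict[str, Any]
ToolSpec = dict[str, Any]


class ChatBackend(Protocol):
    """What the agent loop would depend on, whichever runtime is behind it."""

    def chat(
        self, messages: list[Message], tools: Optional[list[ToolSpec]] = None
    ) -> Message:
        ...


class LlamaCppBackend:
    """Local runtime: a quantized GGUF loaded through llama-cpp-python."""

    def __init__(self, model_path: str, n_ctx: int = 4096) -> None:
        # Imported lazily so the hosted path works without llama-cpp-python.
        from llama_cpp import Llama

        self._llm = Llama(model_path=model_path, n_ctx=n_ctx)

    def chat(
        self, messages: list[Message], tools: Optional[list[ToolSpec]] = None
    ) -> Message:
        result = self._llm.create_chat_completion(messages=messages, tools=tools)
        return result["choices"][0]["message"]


class HostedEndpointBackend:
    """Hosted fallback: POSTs the conversation to an HTTP endpoint.

    Assumes the endpoint accepts {"messages": ..., "tools": ...} and returns
    a single assistant message as JSON.
    """

    def __init__(self, url: str, timeout: float = 60.0) -> None:
        self._url = url
        self._timeout = timeout

    def chat(
        self, messages: list[Message], tools: Optional[list[ToolSpec]] = None
    ) -> Message:
        payload = json.dumps({"messages": messages, "tools": tools}).encode("utf-8")
        request = urllib.request.Request(
            self._url, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=self._timeout) as response:
            return json.loads(response.read().decode("utf-8"))


def load_backend(
    model_path: Optional[str] = None, endpoint_url: Optional[str] = None
) -> ChatBackend:
    """Local-first selection: use the GGUF if given, else the hosted endpoint."""
    if model_path is not None:
        return LlamaCppBackend(model_path)
    if endpoint_url is not None:
        return HostedEndpointBackend(endpoint_url)
    raise ValueError("Provide a local GGUF model_path or a hosted endpoint_url.")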