feat: replace Ollama with Groq API (llama-3.3-70b-versatile) befb434 therandomuser03 committed on Mar 14
perf: reduce num_ctx 8192→2048 for faster CPU inference on t3.large-HF 5bcc538 therandomuser03 committed on Mar 14