Speed up CPU inference: halve token limits, pre-download models, fix OMP threads 4af4003 cgoodmaker and Claude Opus 4.6 committed about 15 hours ago
Use bfloat16 on CPU to halve memory (8 GB in bfloat16 vs 16 GB in float32) 0989643 cgoodmaker and Claude Opus 4.6 committed 5 days ago
Fix MCP subprocess deadlock: use stderr=None instead of PIPE da343a7 cgoodmaker and Claude Opus 4.6 committed 8 days ago
Add timeout and stderr logging to MCP subprocess to debug tool hangs c376e14 cgoodmaker and Claude Opus 4.6 committed 8 days ago
Force MCP tool models to CPU to avoid GPU VRAM contention with MedGemma 1a97904 cgoodmaker and Claude Opus 4.6 committed 8 days ago
Add RAG Phase 4 management guidance, rebuild guidelines index (286 chunks), post-analysis hint UI 5241b71 cgoodmaker and Claude Opus 4.6 committed 8 days ago
Use dtype instead of deprecated torch_dtype in model_kwargs 82f82ac cgoodmaker and Claude Opus 4.6 committed 8 days ago
Redesign chat UI and fix MedGemma generation config issues 58a4476 cgoodmaker and Claude Opus 4.6 committed 8 days ago
Pass HF_TOKEN explicitly to pipeline() for gated model auth b08f876 cgoodmaker committed 10 days ago
Use HF_TOKEN env var to authenticate for gated MedGemma model bb7e939 cgoodmaker committed 10 days ago
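A minimal Python sketch of the failure mode behind da343a7, assuming the MCP tool server runs as a child process that speaks its protocol over stdout and logs to stderr. The child script and the helper first_stdout_line are hypothetical stand-ins, not the project's code. If stderr=subprocess.PIPE is set but the parent never reads that pipe, the child blocks once the OS pipe buffer fills (typically 64 KiB on Linux), so its next stdout line never arrives and the tool call hangs. Reading stdout on a thread with a timeout, in the spirit of c376e14, turns that hang into a detectable failure. stderr=None (the fix in da343a7) makes the child inherit the parent's stderr, so there is no unread buffer to fill; the demo below uses subprocess.DEVNULL for the "safe" run only to keep output quiet, and it avoids the hang for the same reason.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for an MCP tool server: it logs heavily to stderr
# before writing its first protocol line to stdout.
CHILD = (
    "import sys\n"
    "sys.stderr.write('x' * (256 * 1024))\n"  # larger than common pipe buffers
    "sys.stderr.flush()\n"
    "sys.stdout.write('ready\\n')\n"
    "sys.stdout.flush()\n"
)


def first_stdout_line(stderr_mode, timeout=3.0):
    """Return the child's first stdout line, or None if it doesn't arrive in time.

    stderr_mode is passed straight to subprocess.Popen(stderr=...).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=stderr_mode,
    )
    result = []
    # Read stdout on a helper thread so a stalled child can't block us forever.
    reader = threading.Thread(
        target=lambda: result.append(proc.stdout.readline()), daemon=True
    )
    reader.start()
    reader.join(timeout)
    line = result[0].decode().strip() if result else None  # snapshot before cleanup

    # Clean up: kill the child (harmless if it already exited), then close pipes.
    proc.kill()
    proc.wait()
    reader.join()  # readline returns b"" once the child is gone
    proc.stdout.close()
    if proc.stderr is not None:
        proc.stderr.close()
    return line


if __name__ == "__main__":
    print("DEVNULL:", first_stdout_line(subprocess.DEVNULL))
    print("PIPE:   ", first_stdout_line(subprocess.PIPE, timeout=2.0))
```

With an undrained PIPE the helper gives up after the timeout and returns None, because the child is stuck writing stderr; with DEVNULL (or None) the child reaches its stdout write and the helper returns "ready". Draining stderr on its own thread would also work, but not piping it at all is the simpler fix when the parent has no use for the log text.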