Confidence-gated blending for local mode: weight graft scores by (1 - LLM_confidence), disable the graft contribution when the LLM is confident (confidence > 0.6), and scale graft scores to 0.3x the LLM score range
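A minimal sketch of the blending rule above. It assumes that "gated" means the graft term is dropped entirely above 0.6 confidence, that graft scores are min-max rescaled into [0, 0.3 x LLM score range], and that the weighted graft term is added to the LLM score. `blend_scores` and the constant names are hypothetical, not the project's actual API.

```python
# Hypothetical constants mirroring the numbers in the commit message.
CONFIDENCE_GATE = 0.6       # above this LLM confidence, graft is ignored (assumption)
GRAFT_RANGE_FRACTION = 0.3  # graft scores span at most 0.3x the LLM score range


def blend_scores(
    llm_scores: list[float], graft_scores: list[float], llm_confidence: float
) -> list[float]:
    """Blend graft scores into LLM scores, gated and weighted by LLM confidence."""
    if len(llm_scores) != len(graft_scores):
        raise ValueError("llm_scores and graft_scores must have the same length")
    if not 0.0 <= llm_confidence <= 1.0:
        raise ValueError("llm_confidence must be in [0, 1]")
    # Gate: a confident LLM (or an empty candidate list) gets no graft contribution.
    if not llm_scores or llm_confidence > CONFIDENCE_GATE:
        return list(llm_scores)
    # Min-max rescale graft scores into [0, 0.3 * (LLM score range)] (assumed scheme).
    llm_range = max(llm_scores) - min(llm_scores)
    g_min, g_max = min(graft_scores), max(graft_scores)
    g_span = g_max - g_min
    if g_span == 0:
        scaled = [0.0] * len(graft_scores)  # constant graft scores carry no ranking signal
    else:
        scaled = [
            (g - g_min) / g_span * GRAFT_RANGE_FRACTION * llm_range
            for g in graft_scores
        ]
    # Weight the graft term by how unsure the LLM is: (1 - confidence).
    weight = 1.0 - llm_confidence
    return [s + weight * g for s, g in zip(llm_scores, scaled)]
```

For example, `blend_scores([1.0, 3.0], [0.0, 10.0], 0.5)` yields roughly `[1.0, 3.3]`, while the same call with confidence `0.8` returns the LLM scores unchanged. Under these assumptions the graft term can shift a score by at most 0.4 x 0.3 = 0.12 of the LLM score range, so it can reorder close candidates but not override a clear LLM preference.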
Add inference settings for model loading with device priority CUDA > MPS > CPU. Update runner and pipeline to use the new settings for dtype and device placement.
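A minimal sketch of the device priority above, assuming a settings object that carries a device and a dtype. `InferenceSettings`, `select_device`, and the dtype policy (float16 on CUDA/MPS, float32 on CPU) are hypothetical illustrations, not the project's actual code. The selection logic takes plain availability flags so it runs without PyTorch; `detect_with_torch` shows where those flags would come from (`torch.cuda.is_available()` and `torch.backends.mps.is_available()`).

```python
from dataclasses import dataclass

# Priority order from the commit message: CUDA first, then Apple MPS, then CPU.
DEVICE_PRIORITY = ("cuda", "mps", "cpu")

# Hypothetical dtype policy: half precision on accelerators, float32 on CPU.
DEFAULT_DTYPES = {"cuda": "float16", "mps": "float16", "cpu": "float32"}


@dataclass(frozen=True)
class InferenceSettings:
    device: str
    dtype: str


def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the first available device in DEVICE_PRIORITY order."""
    available = {"cuda": cuda_available, "mps": mps_available, "cpu": True}
    for device in DEVICE_PRIORITY:
        if available[device]:
            return device
    return "cpu"  # not reached in practice: CPU is always marked available


def inference_settings(cuda_available: bool, mps_available: bool) -> InferenceSettings:
    """Pick a device by priority and the dtype the (assumed) policy assigns to it."""
    device = select_device(cuda_available, mps_available)
    return InferenceSettings(device=device, dtype=DEFAULT_DTYPES[device])


def detect_with_torch() -> InferenceSettings:
    """Probe PyTorch for accelerators; fall back to CPU if torch is not installed."""
    try:
        import torch
    except ImportError:
        return inference_settings(False, False)
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    return inference_settings(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )
```

Keeping the choice in one settings object means the runner and pipeline read the same device and dtype instead of each probing the hardware separately.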