perf: reduce per-query latency - shared model, single encode, float32 LLM
d624b44 · hodfa840 committed on Apr 4