fix(vllm): only pass documents via chat_template_kwargs, not top-level 3034360 verified msradam committed 1 day ago
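A minimal sketch of the change described in this commit, assuming the app calls vLLM's OpenAI-compatible server through the openai client: RAG documents are sent inside `chat_template_kwargs` via `extra_body` (vLLM forwards these as kwargs when rendering the chat template) rather than as a top-level request field. The base URL, API key, model name, and document field names below are placeholders, not the app's actual config.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder-key")

# Placeholder document; field names depend on the model's chat template.
documents = [{"doc_id": "npcc4_slr", "text": "example document text"}]

resp = client.chat.completions.create(
    model="placeholder-model",
    messages=[
        {"role": "system", "content": "Answer using the provided documents."},
        {"role": "user", "content": "Summarise the sea-level findings."},
    ],
    max_tokens=350,
    # Documents travel only here, not as a top-level "documents" field.
    extra_body={"chat_template_kwargs": {"documents": documents}},
)
print(resp.choices[0].message.content)
```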
fix(vllm): reduce num_predict 512→350 to stay under max_model_len=2352 51e6b76 verified msradam committed 1 day ago
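For context on the budget implied here: prompt tokens plus num_predict must fit in max_model_len, so with max_model_len=2352 a num_predict of 512 leaves at most 2352 - 512 = 1840 tokens for the system prompt, documents and history, while 350 leaves 2002; presumably the retrieval context was landing in that gap.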
debug: improve vllm-direct endpoint to test context overflow daf3545 verified msradam committed 1 day ago
fix(vllm): strip document-role messages before sending to vLLM 80deb38 verified msradam committed 1 day ago
fix(mellea): restore vLLM probe with auth header + non-5xx check c92bd29 verified msradam committed 1 day ago
fix: conditional Ollama timeout — 5s when vLLM primary, 240s otherwise f4632b7 verified msradam committed 1 day ago
fix: vLLM probe with auth header + accept 401 as ready 7859ba2 verified msradam committed 1 day ago
fix: vLLM readiness probe + fast Ollama fallback timeout (app/mellea_validator.py) 4858f1e verified msradam committed 1 day ago
fix: vLLM readiness probe + fast Ollama fallback timeout (app/llm.py) 3368898 verified msradam committed 1 day ago
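A sketch of the readiness-probe pattern in the commits above, with placeholder URLs and keys: hit the vLLM server with the auth header and treat any non-5xx response (including 401) as "server is up", since a 401 still proves the process answered; only 5xx or no answer counts as not ready. The conditional Ollama timeout from the same batch of fixes is shown at the end.

```python
import requests

def vllm_ready(base_url: str, api_key: str, timeout: float = 5.0) -> bool:
    try:
        r = requests.get(
            f"{base_url}/v1/models",
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=timeout,
        )
    except requests.RequestException:
        return False
    # 401 means the proxy rejected the key but the server is clearly running,
    # so accept anything below 500 as "ready".
    return r.status_code < 500

# Conditional fallback timeout: when vLLM is the primary backend, give the
# Ollama fallback only a short window so a dead local daemon cannot stall
# the request; otherwise allow the long timeout.
VLLM_IS_PRIMARY = True  # placeholder for the app's backend selection
ollama_timeout = 5 if VLLM_IS_PRIMARY else 240
```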
fix(mellea): raise first_token_timeout to 400s to match new 360s LiteLLM timeout d2b160d verified msradam committed 1 day ago
fix(llm): raise vLLM first-token timeout to 360s for RunPod cold start d79c380 verified msradam committed 1 day ago
fix(mellea): set _timed_out on LiteLLM exception before first token to prevent concurrent retries d8b5a19 verified msradam committed 1 day ago
fix(mellea): fallback to best_paragraph when later attempts time out 384daac verified msradam committed 1 day ago
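The retry behaviour described in the two commits above, reduced to a generic sketch (generate_once, score, and the attempt count are illustrative, not the app's API): keep the best paragraph seen so far, and when an attempt times out, stop retrying rather than stacking concurrent requests on a backend that is already wedged, returning the best result instead.

```python
from typing import Callable

def generate_with_fallback(
    generate_once: Callable[[str], str],  # one sampling attempt; may raise TimeoutError
    score: Callable[[str], float],        # requirement score for a candidate paragraph
    prompt: str,
    attempts: int = 3,
) -> str:
    best_paragraph, best_score = "", float("-inf")
    for _ in range(attempts):
        try:
            paragraph = generate_once(prompt)
        except TimeoutError:
            # A timeout (especially before the first token) usually means the
            # backend is cold or overloaded; more concurrent retries make it
            # worse, so abort and fall back to the best paragraph so far.
            break
        s = score(paragraph)
        if paragraph and s > best_score:
            best_paragraph, best_score = paragraph, s
    return best_paragraph
```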
fix(warmup): fire vLLM warmup before planner so RunPod loads during planner+stones ad05fd2 verified msradam committed 1 day ago
fix(warmup): fire vLLM warmup before planner so RunPod loads during planner+stones 602bc83 verified msradam committed 1 day ago
fix(mellea): two-phase timeout — 250s first token, 45s inter-token 9e0c117 verified msradam committed 1 day ago
fix(warmup): fire 1-token LLM ping at request start to warm RunPod GPU 3d8b58e verified msradam committed 1 day ago
fix(warmup): fire 1-token LLM ping at request start to warm RunPod GPU 1795b46 verified msradam committed 1 day ago
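A sketch of the warm-up trick in the commits above, assuming an OpenAI-compatible client object: at request start, fire a throwaway 1-token completion in a daemon thread so a scaled-to-zero RunPod vLLM worker starts loading the model while the planner and cornerstone stages run. The client and model name are placeholders.

```python
import threading

def fire_vllm_warmup(client, model: str) -> None:
    def _ping():
        try:
            client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": "ping"}],
                max_tokens=1,
            )
        except Exception:
            pass  # warm-up is best effort; nothing waits on it or checks it

    threading.Thread(target=_ping, daemon=True).start()
```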
fix(mellea): 120s cold-start timeout + drop run_reconcile fallback e54a4ea verified msradam committed 1 day ago
fix(mellea): 120s cold-start timeout + drop run_reconcile fallback 57f5889 verified msradam committed 1 day ago
fix(sse): send keepalive comments to prevent proxy idle timeout f9c55e4 verified msradam committed 1 day ago
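The SSE keep-alive fix above, sketched with an illustrative queue-based producer: while waiting for the next real event, periodically emit an SSE comment line (a line starting with ":"), which clients ignore but which keeps reverse proxies with idle timeouts from cutting the connection. The 15-second interval is an assumption.

```python
import asyncio

async def sse_stream(events: asyncio.Queue):
    while True:
        try:
            event = await asyncio.wait_for(events.get(), timeout=15)
        except asyncio.TimeoutError:
            yield ": keepalive\n\n"  # SSE comment; ignored by the client
            continue
        if event is None:            # sentinel: producer finished
            break
        yield f"data: {event}\n\n"
```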
fix(fsm): inject valid doc_ids into system prompt to prevent rag_npcc4 hallucination 6969759 verified msradam committed 1 day ago
fix(mellea): abort retry loop after streaming hang to prevent concurrent vLLM requests + fix closure late-binding ae9cb16 verified msradam committed 1 day ago
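The closure late-binding pitfall mentioned in this commit, shown in isolation as a generic Python example: closures created in a loop capture the loop variable itself, not its value at that iteration, so every callback sees the final value unless it is bound at definition time.

```python
# Buggy: every lambda reads the same variable after the loop finished.
callbacks = [lambda: i for i in range(3)]
print([cb() for cb in callbacks])  # [2, 2, 2], all see the final i

# Fix: bind the current value as a default argument at definition time.
callbacks = [lambda i=i: i for i in range(3)]
print([cb() for cb in callbacks])  # [0, 1, 2]
```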
fix(reconcile): correct rag_npcc4→npcc4_slr in system prompt example citation d86924c verified msradam committed 1 day ago
fix(mellea): increase num_predict 350→512 to avoid citations_dense truncation 745d7fb verified msradam committed 1 day ago
fix(fsm): fallback uses non-streaming reconcile to avoid double hang 1c134dd verified msradam committed 1 day ago
fix(fsm): fallback to non-strict reconcile when Mellea returns empty paragraph 0d6b029 verified msradam committed 1 day ago
fix(mellea): reduce num_predict to 350 for vLLM context headroom; reject empty paragraphs 1ddb69a verified msradam committed 1 day ago
fix(mellea): per-token timeout with queue-based streaming to prevent hang a3447d8 verified msradam committed 1 day ago
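A sketch of the queue-based streaming watchdog named in this commit, combined with the two-phase timeout from the earlier commit (250s first token, 45s inter-token, both placeholders here): a worker thread drains the backend's token stream into a queue, and the consumer waits with a long first-token timeout to cover cold starts but a short inter-token timeout afterwards, so a stalled stream raises instead of hanging forever.

```python
import queue
import threading

_SENTINEL = object()

def stream_with_timeouts(token_stream, first_token_timeout=250, inter_token_timeout=45):
    q: queue.Queue = queue.Queue()

    def _producer():
        try:
            for tok in token_stream:
                q.put(tok)
        finally:
            q.put(_SENTINEL)  # always signal end-of-stream, even on error

    threading.Thread(target=_producer, daemon=True).start()

    timeout = first_token_timeout
    while True:
        try:
            tok = q.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(f"no token within {timeout}s")
        if tok is _SENTINEL:
            return
        yield tok
        timeout = inter_token_timeout  # after the first token, be strict
```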
fix(cornerstone): replace ThreadPoolExecutor with sequential loop — fixes Burr post-action cleanup hang 9a5fe81 verified msradam committed 1 day ago
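The concurrency change above reduced to its essence, with illustrative names: the thread-pool fan-out is replaced by a plain loop, so no worker threads can outlive the action and block Burr's post-action cleanup.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def run_sequentially(items: Iterable[T], fn: Callable[[T], R]) -> List[R]:
    # Was: list(ThreadPoolExecutor().map(fn, items)); the executor's shutdown
    # could hang after the action, so process each item in order instead.
    return [fn(item) for item in items]
```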
fix(burr): remove SQLitePersister cache — was poisoning state on first broken run 1d7f796 verified msradam committed 1 day ago
feat(burr): LocalTrackingClient, SQLitePersister cache, StepEventHook, conditional transitions 3e1703d verified msradam committed 2 days ago
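A rough sketch of how the Burr pieces named in this commit typically wire together; the project name, database path, actions, and StepEventHook body are placeholders rather than the app's code, and exact Burr signatures should be checked against the Burr docs. Note that the SQLitePersister cache was later removed again, per the commit above.

```python
from burr.core import ApplicationBuilder, State, action, when
from burr.core.persistence import SQLitePersister
from burr.lifecycle import PostRunStepHook
from burr.tracking import LocalTrackingClient


class StepEventHook(PostRunStepHook):
    def post_run_step(self, *, state: State, action, **kwargs):
        # e.g. push an SSE progress event per completed step
        print(f"finished {action.name}")


@action(reads=[], writes=["done"])
def plan(state: State) -> tuple[dict, State]:
    return {"planned": True}, state.update(done=True)


@action(reads=["done"], writes=[])
def reconcile(state: State) -> tuple[dict, State]:
    return {}, state


persister = SQLitePersister(db_path="burr_cache.db", table_name="runs")
persister.initialize()

app = (
    ApplicationBuilder()
    .with_actions(plan, reconcile)
    .with_transitions(("plan", "reconcile", when(done=True)))  # conditional transition
    .with_entrypoint("plan")
    .with_state(done=False)
    .with_tracker(LocalTrackingClient(project="demo-project"))
    .with_state_persister(persister)
    .with_hooks(StepEventHook())
    .build()
)
```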