Commit History

fix(vllm): only pass documents via chat_template_kwargs, not top-level
3034360
verified

msradam committed on

fix(vllm): reduce num_predict 512→350 to stay under max_model_len=2352
51e6b76
verified

msradam committed on

debug: improve vllm-direct endpoint to test context overflow
daf3545
verified

msradam committed on

debug: add /api/debug/vllm-direct endpoint
52a5649
verified

msradam committed on

fix(vllm): strip document-role messages before sending to vLLM
80deb38
verified

msradam committed on

fix(mellea): restore vLLM probe with auth header + non-5xx check
c92bd29
verified

msradam committed on

fix: remove vLLM probe + move warmup post-planner
b604d92
verified

msradam committed on

fix: remove vLLM probe + move warmup post-planner
9ba3950
verified

msradam committed on

fix: conditional Ollama timeout — 5s when vLLM primary, 240s otherwise
f4632b7
verified

msradam committed on

fix: vLLM probe with auth header + accept 401 as ready
7859ba2
verified

msradam committed on

fix: vLLM readiness probe + fast Ollama fallback timeout (app/mellea_validator.py)
4858f1e
verified

msradam committed on

fix: vLLM readiness probe + fast Ollama fallback timeout (app/llm.py)
3368898
verified

msradam committed on

fix(mellea): raise first_token_timeout to 400s to match new 360s LiteLLM timeout
d2b160d
verified

msradam committed on

fix(llm): raise vLLM first-token timeout to 360s for RunPod cold start
d79c380
verified

msradam committed on

fix(mellea): set _timed_out on LiteLLM exception before first token to prevent concurrent retries
d8b5a19
verified

msradam committed on

fix(mellea): fallback to best_paragraph when later attempts time out
384daac
verified

msradam committed on

fix(warmup): fire vLLM warmup before planner so RunPod loads during planner+stones
ad05fd2
verified

msradam committed on

fix(warmup): fire vLLM warmup before planner so RunPod loads during planner+stones
602bc83
verified

msradam committed on

fix(mellea): two-phase timeout — 250s first token, 45s inter-token
9e0c117
verified

msradam committed on

fix(warmup): fire 1-token LLM ping at request start to warm RunPod GPU
3d8b58e
verified

msradam committed on

fix(warmup): fire 1-token LLM ping at request start to warm RunPod GPU
1795b46
verified

msradam committed on

fix(mellea): 120s cold-start timeout + drop run_reconcile fallback
e54a4ea
verified

msradam committed on

fix(mellea): 120s cold-start timeout + drop run_reconcile fallback
57f5889
verified

msradam committed on

fix(sse): send keepalive comments to prevent proxy idle timeout
f9c55e4
verified

msradam committed on

fix(fsm): inject valid doc_ids into system prompt to prevent rag_npcc4 hallucination
6969759
verified

msradam committed on

fix(mellea): abort retry loop after streaming hang to prevent concurrent vLLM requests + fix closure late-binding
ae9cb16
verified

msradam committed on

fix(reconcile): correct rag_npcc4→npcc4_slr in system prompt example citation
d86924c
verified

msradam committed on

fix(mellea): increase num_predict 350→512 to avoid citations_dense truncation
745d7fb
verified

msradam committed on

fix(fsm): fallback uses non-streaming reconcile to avoid double hang
1c134dd
verified

msradam committed on

fix(fsm): fallback to non-strict reconcile when Mellea returns empty paragraph
0d6b029
verified

msradam committed on

fix(mellea): reduce num_predict to 350 for vLLM context headroom; reject empty paragraphs
1ddb69a
verified

msradam committed on

fix(mellea): per-token timeout with queue-based streaming to prevent hang
a3447d8
verified

msradam committed on

fix(cornerstone): replace ThreadPoolExecutor with sequential loop — fixes Burr post-action cleanup hang
9a5fe81
verified

msradam committed on

fix(burr): remove SQLitePersister cache — was poisoning state on first broken run
1d7f796
verified

msradam committed on

sync: upload app/reconcile.py from local main
84b70d6
verified

msradam committed on

feat(burr): LocalTrackingClient, SQLitePersister cache, StepEventHook, conditional transitions
3e1703d
verified

msradam committed on

deploy(l4): self-contained Riprap mirror
3dbff85

seriffic committed on