fix(vllm): only pass documents via chat_template_kwargs, not top-level 3034360 verified msradam committed 1 day ago
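A minimal sketch of the change described in this commit, assuming the app calls vLLM's OpenAI-compatible server through the openai client: RAG documents are sent inside `chat_template_kwargs` via `extra_body` (vLLM forwards these as kwargs when rendering the chat template) rather than as a top-level request field. The base URL, API key, model name, and document field names below are placeholders, not the app's actual config.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder-key")

# Placeholder document; field names depend on the model's chat template.
documents = [{"doc_id": "npcc4_slr", "text": "example document text"}]

resp = client.chat.completions.create(
    model="placeholder-model",
    messages=[
        {"role": "system", "content": "Answer using the provided documents."},
        {"role": "user", "content": "Summarise the sea-level findings."},
    ],
    max_tokens=350,
    # Documents travel only here, not as a top-level "documents" field.
    extra_body={"chat_template_kwargs": {"documents": documents}},
)
print(resp.choices[0].message.content)
```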
fix(vllm): reduce num_predict 512→350 to stay under max_model_len=2352 51e6b76 verified msradam committed 1 day ago
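For context on the budget implied here: prompt tokens plus num_predict must fit in max_model_len, so with max_model_len=2352 a num_predict of 512 leaves at most 2352 - 512 = 1840 tokens for the system prompt, documents and history, while 350 leaves 2002; presumably the retrieval context was landing in that gap.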
debug: improve vllm-direct endpoint to test context overflow daf3545 verified msradam committed 1 day ago
fix(vllm): strip document-role messages before sending to vLLM 80deb38 verified msradam committed 1 day ago
fix(mellea): restore vLLM probe with auth header + non-5xx check c92bd29 verified msradam committed 1 day ago
fix: conditional Ollama timeout — 5s when vLLM primary, 240s otherwise f4632b7 verified msradam committed 1 day ago
fix: vLLM probe with auth header + accept 401 as ready 7859ba2 verified msradam committed 1 day ago
fix: vLLM readiness probe + fast Ollama fallback timeout (app/mellea_validator.py) 4858f1e verified msradam committed 1 day ago
fix: vLLM readiness probe + fast Ollama fallback timeout (app/llm.py) 3368898 verified msradam committed 1 day ago
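A sketch of the readiness-probe pattern in the commits above, with placeholder URLs and keys: hit the vLLM server with the auth header and treat any non-5xx response (including 401) as "server is up", since a 401 still proves the process answered; only 5xx or no answer counts as not ready. The conditional Ollama timeout from the same batch of fixes is shown at the end.

```python
import requests

def vllm_ready(base_url: str, api_key: str, timeout: float = 5.0) -> bool:
    try:
        r = requests.get(
            f"{base_url}/v1/models",
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=timeout,
        )
    except requests.RequestException:
        return False
    # 401 means the proxy rejected the key but the server is clearly running,
    # so accept anything below 500 as "ready".
    return r.status_code < 500

# Conditional fallback timeout: when vLLM is the primary backend, give the
# Ollama fallback only a short window so a dead local daemon cannot stall
# the request; otherwise allow the long timeout.
VLLM_IS_PRIMARY = True  # placeholder for the app's backend selection
ollama_timeout = 5 if VLLM_IS_PRIMARY else 240
```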
fix(mellea): raise first_token_timeout to 400s to match new 360s LiteLLM timeout d2b160d verified msradam committed 1 day ago
fix(llm): raise vLLM first-token timeout to 360s for RunPod cold start d79c380 verified msradam committed 1 day ago
fix(mellea): set _timed_out on LiteLLM exception before first token to prevent concurrent retries d8b5a19 verified msradam committed 1 day ago
fix(mellea): fallback to best_paragraph when later attempts time out 384daac verified msradam committed 1 day ago
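The retry behaviour described in the two commits above, reduced to a generic sketch (generate_once, score, and the attempt count are illustrative, not the app's API): keep the best paragraph seen so far, and when an attempt times out, stop retrying rather than stacking concurrent requests on a backend that is already wedged, returning the best result instead.

```python
from typing import Callable

def generate_with_fallback(
    generate_once: Callable[[str], str],  # one sampling attempt; may raise TimeoutError
    score: Callable[[str], float],        # requirement score for a candidate paragraph
    prompt: str,
    attempts: int = 3,
) -> str:
    best_paragraph, best_score = "", float("-inf")
    for _ in range(attempts):
        try:
            paragraph = generate_once(prompt)
        except TimeoutError:
            # A timeout (especially before the first token) usually means the
            # backend is cold or overloaded; more concurrent retries make it
            # worse, so abort and fall back to the best paragraph so far.
            break
        s = score(paragraph)
        if paragraph and s > best_score:
            best_paragraph, best_score = paragraph, s
    return best_paragraph
```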
fix(warmup): fire vLLM warmup before planner so RunPod loads during planner+stones ad05fd2 verified msradam committed 1 day ago
fix(warmup): fire vLLM warmup before planner so RunPod loads during planner+stones 602bc83 verified msradam committed 1 day ago
fix(mellea): two-phase timeout — 250s first token, 45s inter-token 9e0c117 verified msradam committed 1 day ago
fix(warmup): fire 1-token LLM ping at request start to warm RunPod GPU 3d8b58e verified msradam committed 1 day ago
fix(warmup): fire 1-token LLM ping at request start to warm RunPod GPU 1795b46 verified msradam committed 1 day ago
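A sketch of the warm-up trick in the commits above, assuming an OpenAI-compatible client object: at request start, fire a throwaway 1-token completion in a daemon thread so a scaled-to-zero RunPod vLLM worker starts loading the model while the planner and cornerstone stages run. The client and model name are placeholders.

```python
import threading

def fire_vllm_warmup(client, model: str) -> None:
    def _ping():
        try:
            client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": "ping"}],
                max_tokens=1,
            )
        except Exception:
            pass  # warm-up is best effort; nothing waits on it or checks it

    threading.Thread(target=_ping, daemon=True).start()
```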
fix(mellea): 120s cold-start timeout + drop run_reconcile fallback e54a4ea verified msradam committed 1 day ago
fix(mellea): 120s cold-start timeout + drop run_reconcile fallback 57f5889 verified msradam committed 1 day ago
fix(sse): send keepalive comments to prevent proxy idle timeout f9c55e4 verified msradam committed 1 day ago
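The SSE keep-alive fix above, sketched with an illustrative queue-based producer: while waiting for the next real event, periodically emit an SSE comment line (a line starting with ":"), which clients ignore but which keeps reverse proxies with idle timeouts from cutting the connection. The 15-second interval is an assumption.

```python
import asyncio

async def sse_stream(events: asyncio.Queue):
    while True:
        try:
            event = await asyncio.wait_for(events.get(), timeout=15)
        except asyncio.TimeoutError:
            yield ": keepalive\n\n"  # SSE comment; ignored by the client
            continue
        if event is None:            # sentinel: producer finished
            break
        yield f"data: {event}\n\n"
```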
fix(fsm): inject valid doc_ids into system prompt to prevent rag_npcc4 hallucination 6969759 verified msradam committed 1 day ago
fix(mellea): abort retry loop after streaming hang to prevent concurrent vLLM requests + fix closure late-binding ae9cb16 verified msradam committed 1 day ago
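The closure late-binding pitfall mentioned in this commit, shown in isolation as a generic Python example: closures created in a loop capture the loop variable itself, not its value at that iteration, so every callback sees the final value unless it is bound at definition time.

```python
# Buggy: every lambda reads the same variable after the loop finished.
callbacks = [lambda: i for i in range(3)]
print([cb() for cb in callbacks])  # [2, 2, 2], all see the final i

# Fix: bind the current value as a default argument at definition time.
callbacks = [lambda i=i: i for i in range(3)]
print([cb() for cb in callbacks])  # [0, 1, 2]
```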
fix(reconcile): correct rag_npcc4→npcc4_slr in system prompt example citation d86924c verified msradam committed 1 day ago
fix(mellea): increase num_predict 350→512 to avoid citations_dense truncation 745d7fb verified msradam committed 1 day ago
fix(fsm): fallback uses non-streaming reconcile to avoid double hang 1c134dd verified msradam committed 1 day ago
fix(fsm): fallback to non-strict reconcile when Mellea returns empty paragraph 0d6b029 verified msradam committed 1 day ago
fix(mellea): reduce num_predict to 350 for vLLM context headroom; reject empty paragraphs 1ddb69a verified msradam committed 1 day ago
fix(mellea): per-token timeout with queue-based streaming to prevent hang a3447d8 verified msradam committed 1 day ago
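A sketch of the queue-based streaming watchdog named in this commit, combined with the two-phase timeout from the earlier commit (250s first token, 45s inter-token, both placeholders here): a worker thread drains the backend's token stream into a queue, and the consumer waits with a long first-token timeout to cover cold starts but a short inter-token timeout afterwards, so a stalled stream raises instead of hanging forever.

```python
import queue
import threading

_SENTINEL = object()

def stream_with_timeouts(token_stream, first_token_timeout=250, inter_token_timeout=45):
    q: queue.Queue = queue.Queue()

    def _producer():
        try:
            for tok in token_stream:
                q.put(tok)
        finally:
            q.put(_SENTINEL)  # always signal end-of-stream, even on error

    threading.Thread(target=_producer, daemon=True).start()

    timeout = first_token_timeout
    while True:
        try:
            tok = q.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(f"no token within {timeout}s")
        if tok is _SENTINEL:
            return
        yield tok
        timeout = inter_token_timeout  # after the first token, be strict
```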
fix(cornerstone): replace ThreadPoolExecutor with sequential loop — fixes Burr post-action cleanup hang 9a5fe81 verified msradam committed 1 day ago
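The concurrency change above reduced to its essence, with illustrative names: the thread-pool fan-out is replaced by a plain loop, so no worker threads can outlive the action and block Burr's post-action cleanup.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def run_sequentially(items: Iterable[T], fn: Callable[[T], R]) -> List[R]:
    # Was: list(ThreadPoolExecutor().map(fn, items)); the executor's shutdown
    # could hang after the action, so process each item in order instead.
    return [fn(item) for item in items]
```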
fix(burr): remove SQLitePersister cache — was poisoning state on first broken run 1d7f796 verified msradam committed 1 day ago
feat(burr): LocalTrackingClient, SQLitePersister cache, StepEventHook, conditional transitions 3e1703d verified msradam committed 2 days ago
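A rough sketch of how the Burr pieces named in this commit typically wire together; the project name, database path, actions, and StepEventHook body are placeholders rather than the app's code, and exact Burr signatures should be checked against the Burr docs. Note that the SQLitePersister cache was later removed again, per the commit above.

```python
from burr.core import ApplicationBuilder, State, action, when
from burr.core.persistence import SQLitePersister
from burr.lifecycle import PostRunStepHook
from burr.tracking import LocalTrackingClient


class StepEventHook(PostRunStepHook):
    def post_run_step(self, *, state: State, action, **kwargs):
        # e.g. push an SSE progress event per completed step
        print(f"finished {action.name}")


@action(reads=[], writes=["done"])
def plan(state: State) -> tuple[dict, State]:
    return {"planned": True}, state.update(done=True)


@action(reads=["done"], writes=[])
def reconcile(state: State) -> tuple[dict, State]:
    return {}, state


persister = SQLitePersister(db_path="burr_cache.db", table_name="runs")
persister.initialize()

app = (
    ApplicationBuilder()
    .with_actions(plan, reconcile)
    .with_transitions(("plan", "reconcile", when(done=True)))  # conditional transition
    .with_entrypoint("plan")
    .with_state(done=False)
    .with_tracker(LocalTrackingClient(project="demo-project"))
    .with_state_persister(persister)
    .with_hooks(StepEventHook())
    .build()
)
```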