Spaces: msradam / riprap

riprap / app / mellea_validator.py

Commit History

fix(vllm): reduce num_predict 512→350 to stay under max_model_len=2352

51e6b76
verified

msradam committed 19 days ago

fix(mellea): restore vLLM probe with auth header + non-5xx check

c92bd29
verified

msradam committed 19 days ago

fix: remove vLLM probe + move warmup post-planner

9ba3950
verified

msradam committed 19 days ago

fix: vLLM probe with auth header + accept 401 as ready

7859ba2
verified

msradam committed 19 days ago

fix: vLLM readiness probe + fast Ollama fallback timeout (app/mellea_validator.py)

4858f1e
verified

msradam committed 19 days ago

fix(mellea): raise first_token_timeout to 400s to match new 360s LiteLLM timeout

d2b160d
verified

msradam committed 19 days ago

fix(mellea): set _timed_out on LiteLLM exception before first token to prevent concurrent retries

d8b5a19
verified

msradam committed 19 days ago

fix(mellea): fallback to best_paragraph when later attempts time out

384daac
verified

msradam committed 19 days ago

fix(mellea): two-phase timeout — 250s first token, 45s inter-token

9e0c117
verified

msradam committed 20 days ago

fix(warmup): fire 1-token LLM ping at request start to warm RunPod GPU

3d8b58e
verified

msradam committed 20 days ago

fix(mellea): 120s cold-start timeout + drop run_reconcile fallback

57f5889
verified

msradam committed 20 days ago

fix(mellea): abort retry loop after streaming hang to prevent concurrent vLLM requests + fix closure late-binding

ae9cb16
verified

msradam committed 20 days ago

fix(mellea): increase num_predict 350→512 to avoid citations_dense truncation

745d7fb
verified

msradam committed 20 days ago

fix(mellea): reduce num_predict to 350 for vLLM context headroom; reject empty paragraphs

1ddb69a
verified

msradam committed 20 days ago

fix(mellea): per-token timeout with queue-based streaming to prevent hang

a3447d8
verified

msradam committed 20 days ago

deploy(l4): self-contained Riprap mirror

3dbff85

seriffic committed 21 days ago
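
Several of the commits above (a3447d8, 9e0c117) describe queue-based streaming with a long first-token timeout and a shorter inter-token timeout. The actual code in app/mellea_validator.py is not shown here, so the following is only a minimal sketch of that general technique. The names `stream_with_timeouts` and `StreamTimeout` are hypothetical; the 250s and 45s defaults are taken from the 9e0c117 commit message, which later commits changed.

```python
import queue
import threading
from typing import Iterable, Iterator

_DONE = object()  # sentinel marking the normal end of the stream


class StreamTimeout(Exception):
    """Raised when no token arrives within the allowed wait (hypothetical name)."""


def stream_with_timeouts(
    tokens: Iterable[str],
    first_token_timeout: float = 250.0,  # cold start: allow a long wait
    inter_token_timeout: float = 45.0,   # once streaming, stalls should be short
) -> Iterator[str]:
    """Yield tokens from a blocking iterable, enforcing a two-phase timeout."""
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        # Runs in a worker thread so a hung upstream cannot block the consumer.
        try:
            for tok in tokens:
                q.put(tok)
            q.put(_DONE)
        except Exception as exc:  # forward upstream errors to the consumer
            q.put(exc)

    threading.Thread(target=pump, daemon=True).start()

    timeout = first_token_timeout
    while True:
        try:
            item = q.get(timeout=timeout)
        except queue.Empty:
            raise StreamTimeout(f"no token within {timeout}s") from None
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
        # After the first token, switch to the shorter inter-token limit.
        timeout = inter_token_timeout
```

A caller would catch `StreamTimeout` and stop retrying rather than start a second request while the first may still be running, which is the concern the ae9cb16 and d8b5a19 messages raise. The worker thread is a daemon, so a hung upstream does not keep the process alive.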
