Spaces: build-small-hackathon / retro

retro / agents.py

Commit History

Fix AI insight leaking thinking content + make insights on-demand instead of auto-generated

9cbb438

sankalphs committed 3 days ago

Fix: detect and strip Nemotron reasoning narration (model talking to itself) from AI output

63016d2

sankalphs committed 3 days ago

Fix AI output: aggressively strip thinking tags, markdown, field prefixes from all LLM responses

2e1ad6a

sankalphs committed 3 days ago

UX overhaul + Gradio migration + remove llama.cpp

e6970ed

sankalphs committed 3 days ago

fix: show CLOUD GPU status instead of offline, strip orphan </think> tags, increase Modal timeout to 180s

4976ef6

sankalphs committed 3 days ago

Remove llama-cpp-python and lighten Dockerfile; Modal handles inference

e4749c9

sankalphs committed 3 days ago

Add Modal GPU inference support for faster LLM responses

ad0ab13

sankalphs committed 3 days ago

fix: restore llama.cpp with source build, use fine-tuned GGUF model

7e810ce

sankalphs committed 3 days ago

fix: switch to microsoft/Phi-3-mini-4k-instruct (Gemma 2B down, LoRA not on serverless API)

1c45b4e

sankalphs committed 3 days ago

feat: add HF Inference API with LoRA model, deterministic fallback

724f227

sankalphs committed 3 days ago

refactor: remove llama-cpp entirely, use deterministic mock-only mode

a7789ad

sankalphs committed 3 days ago

fix: load LLM in background thread so Space stays healthy during 2.84 GB cold start

e5d102f

sankalphs committed 4 days ago

fix: correct LLM model path default, fix chat 'error: format only' leak, surface load errors

1c2dd4b

sankalphs committed 4 days ago

feat: browser-local engine, Zerodha dashboard, historical events, chatbot, per-user isolation

f316f5a

sankalphs committed 4 days ago

fix: mentor returns real roast from numeric summary when LLM output is empty/malformed

d3fb801

sankalphs committed 4 days ago

fix: harden LLM calls, cast numpy to native floats, pin llama-cpp-python wheel

11853b1

sankalphs committed 4 days ago

fix: accept percent value (0-100) for trades, add MOCK_LLM flag, full Playwright E2E

55da5c9

sankalphs committed 4 days ago

Phase 2/3: Gradio Server backend, CRT frontend, engine, agents, mentor, tests, CI/CD

1d0b04b

sankalphs committed 4 days ago

