live-stream: separate <think> from the answer. Backend marks the think->answer boundary; UI labels the live text 'thinking' WHILE reasoning, then collapses it to a 'reasoned' tag and shows the answer distinctly - so the streaming think is never mistaken for the final answer.
DRY sampler on the action phase (CODEAGENT_DRY=0.8): penalizes repeated token SEQUENCES to break the 'Example: ... Example: ...' ramble, without the flat repeat_penalty that garbles code (single-token repeats like indentation stay free)
force-answer after web cap: count BOTH web_search+web_fetch toward cap (4), then inject a strong no-tools 'answer now' directive; if it still tool-calls, stop with what it has. Fixes the 'stops but never answers' case.
revert think temp to greedy (0.9 made the 1B over-create - built an HTML app for a math question); keep the web-search cap/dedup which is the real fix. CODEAGENT_THINK_TEMP env still lets you try 0.9.
stop web-search rabbit-hole: dedup identical queries (warn, don't re-run) + hard cap 4 real searches + break if it keeps searching past the cap; use OpenBMB-recommended think sampling (temp 0.9 / top_p 0.95) instead of greedy (greedy think fed the loop)
token-level streaming: complete() streams via SSE + on_token callback; run_agent pushes the in-progress generation (think then action) to the UI live; run() renders it typing out. Eval stays non-streaming.
README: clean tag block (fix best-agent typo, drop the 3 non-Field-Guide tags into a descriptive group) + add Output examples (real Modal-run outputs) (#2)
fix double-box user bubble: Gradio nests .message.user > [data-testid=user]; my selector styled BOTH -> two stacked boxes. Now the OUTER is the single bubble, the INNER is flattened (transparent/no box). Verified via live CSS injection.
fix chart blank: 1B was WRITING chart.png as TEXT (623-byte bogus png -> 0x0 render). write tool now REFUSES image-ext text-writes + steers to matplotlib savefig; _render_media_file skips files without a real image magic-number (no blank bubble).
fix REGRESSION: prompt hardening ('write then final answer is one sentence') made the 1B SKIP the bash run + hallucinate 'saved chart.png' without running -> no image. Restore: after writing a runnable script you MUST run it with bash + read output before answering; never claim you ran code you did not.
fix chart/media not displaying: the 1B calls savefig with an absolute /workspace/ path inside the script (bash rewrite can't reach in-code paths), so the PNG landed outside the scanned sandbox. Now: ensure /workspace exists at startup + _extra_media scans it this-turn-only (no cross-session leak; concurrency_limit=1). Refactored shared _render_media_file.
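
The magic-number check mentioned in the chart-blank fix (skip files that only have an image extension but no real image header) could look roughly like the sketch below. The name `looks_like_image` and the signature table are hypothetical, chosen for illustration; the real `_render_media_file` may check different formats or read more bytes.

```python
import os
import tempfile

# Leading byte signatures for a few common image formats.
# Illustrative only: this is an assumed list, not the project's actual one.
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",        # JPEG
    b"GIF87a",              # GIF (1987 variant)
    b"GIF89a",              # GIF (1989 variant)
)


def looks_like_image(path):
    """Return True if the file starts with a known image signature.

    Hypothetical helper: unreadable or missing files count as non-images,
    so the caller can skip them instead of rendering a blank bubble.
    """
    try:
        with open(path, "rb") as fh:
            head = fh.read(8)  # the longest signature above is 8 bytes
    except OSError:
        return False
    return head.startswith(IMAGE_SIGNATURES)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()

    # A "PNG" that is really text, like the bogus chart.png the model wrote.
    fake = os.path.join(workdir, "chart.png")
    with open(fake, "w") as fh:
        fh.write("import matplotlib.pyplot as plt")

    # A file that at least carries the PNG signature.
    real = os.path.join(workdir, "real.png")
    with open(real, "wb") as fh:
        fh.write(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)

    print(looks_like_image(fake))
    print(looks_like_image(real))
```

A signature check only proves the header is plausible, not that the image decodes; it is enough to reject text written under an image extension, which is the failure described above.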