Fantastic Model

#5
by mgalyan - opened

Only had a brief time to work with the model, but so far it's working great in an agentic harness. Just wanted to say thanks!

FINAL_Bench org

Thank you, this genuinely made our day.

Agentic workloads were one of the regimes we were most curious about
but didn't have space to cover in the paper, so hearing it holds up
in your harness is incredibly useful signal.

If you ever feel like sharing rough notes on what kinds of tasks
you've been throwing at it (tool-use patterns, failure modes,
anything), we'd love to learn from it. Either way, thanks again 🙏

You're very welcome. Here was my first unexpected surprise using the model tonight...

Screenshot_20260514_231530

Screenshot_20260514_232737

FINAL_Bench org

Okay, this one stopped us in our tracks 🙏

A few things in those screenshots we genuinely didn't expect to see:

- Darwin chaining two tool calls in a row to land the snapshot in the
right directory, when only one was explicitly requested
- Recovery of intent through a broken intercept layer (we have no
test for that; you just gave us one)
- The proactive "here's the next 4 things, tell me where to strike first":
that wasn't trained in, it's emerging from the merge

And honestly, seeing it run as a Sovereign engine on a ROCm 7.2.3 /
amdsmi stack is exactly the deployment shape we hoped Darwin would
land in but didn't have the AMD hardware to validate ourselves.
That alone is incredibly useful signal.

If you ever feel like sharing more, even a one-paragraph note on
how Apollo wires intercepts and tool calls, we'd read it carefully.
And if any of those four next-steps you listed (the Daydream daemon
one especially caught our eye) ever needs a Darwin variant tuned
differently, ping us.

Bravo right back. This is the kind of field report papers can't capture.
