feat: expand task suite to 22 challenges and update reward signal mechanics 6392732 Navigam committed on Apr 8
refactor: enforce strictly positive reward range [0.01, 0.99] and update documentation accordingly 02fd062 Navigam committed on Apr 8
feat: implement ReAct agent architecture with robust JSON parsing and expanded task suite cd7967c Navigam committed on Apr 7