feat(grpo): enhance action response parsing by removing reasoning blocks and refining regex handling 5202cdc Bemohit committed 30 days ago
feat(grpo): update max completion length and refine prompt handling for improved evaluation ff1f7a0 Bemohit committed 30 days ago
feat(grpo): adjust training parameters and disable thinking mode for consistent action calls 8f1e9fc Bemohit committed 30 days ago
feat(grpo): enhance training dynamics with new replay policies and update state steps 264ee3d Bemohit committed 30 days ago
feat(grpo): update max sequence length and refine prompt formatting in training scripts 79bced7 Bemohit committed about 1 month ago
refactor: remove SakhaEnvWrapper class and streamline reward function in GRPO training script 097c9e4 Bemohit committed about 1 month ago
feat(rubric): integrate SakhaRubric into SakhaEnvironment step/reward path 1bdc498 unverified atharva-again committed about 1 month ago
feat(rubric): add composable rubric scaffolding wrapping existing reward logic 237c898 unverified atharva-again committed about 1 month ago
fix(inference): align checklist compliance and structured run logs 284ec94 unverified atharva-again committed on Apr 7
feat(formatters): add structured output formatting system 5dd1b3a unverified atharva-again committed on Apr 7
feat(models): added new action types and pending task system 3ab55b0 unverified atharva-again committed on Apr 5
fix: heuristic takeover when llm fails, step reward/penalty derived from grader, some other fixes 52b4770 unverified atharva-again committed on Apr 4
fix: copy README to Docker, enable web UI, and enhance documentation 30220ed unverified atharva-again committed on Apr 2