feat(grpo): enhance action response parsing by removing reasoning blocks and refining regex handling 5202cdc Bemohit committed on 29 days ago
feat(grpo): update max completion length and refine prompt handling for improved evaluation ff1f7a0 Bemohit committed on 29 days ago
feat(grpo): adjust training parameters and disable thinking mode for consistent action calls 8f1e9fc Bemohit committed on 29 days ago
feat(grpo): enhance training dynamics with new replay policies and update state steps 264ee3d Bemohit committed on 29 days ago
feat(grpo): update max sequence length and refine prompt formatting in training scripts 79bced7 Bemohit committed on 30 days ago
refactor: remove SakhaEnvWrapper class and streamline reward function in GRPO training script 097c9e4 Bemohit committed on 30 days ago
refactor: simplify GRPO training script by removing CLI presets and syncing with notebook implementation 5f139c7 Bemohit committed on 30 days ago
feat(eval): consolidate eval_harness into eval_policies, add LLM policy support fb58610 unverified atharva-again committed on about 1 month ago
refactor(eval): extract shared eval constants and policies into eval_common.py 51d5ddb unverified atharva-again committed on about 1 month ago
feat(plots): add plot generation script and training evidence plots fcdb8dd unverified atharva-again committed on about 1 month ago
feat(train): add CLI args (learning-rate, batch-size) and full-shift defaults f96fd88 unverified atharva-again committed on about 1 month ago
feat(eval): add reproducible eval harness with comprehensive metrics fd5e667 unverified atharva-again committed on about 1 month ago
data(fixtures): capture pre-migration golden reward fixtures for parity testing 827cfe7 unverified atharva-again committed on about 1 month ago
chore: pre-merge cleanup for colab-training branch 509d302 unverified atharva-again committed on Apr 22
feat: add GRPO training script with mode presets, eval split, checkpointing fc4da82 unverified atharva-again committed on Apr 21
fix(eval_policies): ensure output directory exists before writing JSON d337c5c unverified atharva-again committed on Apr 7