Remove training notebook for ChargebackOps Merchant Agent, consolidating code and documentation for outcome-based RL on Qwen2.5-3B. 73ff1b0 mitudrudutta committed on Apr 26
fix(eval): sequential per-checkpoint base load + product-grade docs bb2cdb9 mitudrudutta committed on Apr 25
perf(training): v5 - shorter SFT, longer GRPO, hard/nightmare oversample d66354e mitudrudutta committed on Apr 25
fix(eval): pass offload_folder to from_pretrained on T4 OOM path c86ad46 mitudrudutta committed on Apr 25
fix(grpo): revert num_generations to 8 (TRL gen_batch divisibility) 2dce087 mitudrudutta committed on Apr 25
fix(eval): preload bases, swap adapters, fix peft offload_dir crash 8fcb6cf mitudrudutta committed on Apr 24
fix(grpo): unblock learning - widen sampling + raise lr + keep dropout e26c2ac mitudrudutta committed on Apr 24
fix(notebook): pin accelerate==1.0.1 to keep huggingface-hub at 0.26.x 5674079 mitudrudutta committed on Apr 23
feat(training): outcome-based RLVR reward + clean Colab T4 notebook 1f49d52 mitudrudutta committed on Apr 23
Implement code changes to enhance functionality and improve performance 30a9e6e mitudrudutta committed on Apr 23
feat: Implement wait_for_updates action for handling delayed cases and evidence 2dedffd mitudrudutta committed on Apr 23
feat: enhance completion parsing to handle truncated JSON and `<think>` blocks 71f1fe0 mitudrudutta committed on Apr 20
feat: add per-family evaluation and plotting for training curves a79d430 mitudrudutta committed on Apr 20
feat: Add training curve evaluation and plotting utilities with unit tests 8fe3b35 pauldebanshu19 committed on Apr 19
Add training notebook and benchmark runner for ChargebackOps bd00c06 pauldebanshu19 committed on Apr 19