Update train_grpo1.py: set NUM_STEPS=50 and optimize logging steps 43be96e ayhm23 committed 28 days ago
Enhance: Update GRPO training process with improved plotting and sentinel integration for 50-step runs 825ae74 ayhm23 committed 28 days ago
fix: add ForceStopCallback to ensure training ends at step 300 d87e253 ayhm23 committed 29 days ago
fix: harden Docker build with python3 -m pip and import validation f197867 ayhm23 committed 29 days ago
fix: revert Dockerfile to latest and pin transformers to 4.46.0 230899c ayhm23 committed 29 days ago
chore: pin HF base image to 4.47.1 for total environment parity 6118bc1 ayhm23 committed 29 days ago
fix: upgrade transformers to 4.47.1 to satisfy TRL 0.12.1 requirements 07a34b0 ayhm23 committed 29 days ago
fix: upgrade accelerate and transformers to satisfy TRL 0.12.1 62e4067 ayhm23 committed 29 days ago
fix: revert to even more stable library versions and add wandb/matplotlib 001c98e ayhm23 committed 29 days ago
fix: pin trl to 0.11.4 and transformers to 4.45.2 to resolve FSDP error b29b7db ayhm23 committed 29 days ago
fix: remove torch/transformers from requirements to avoid Docker conflict 46f600c ayhm23 committed 29 days ago
fix: upgrade pytorch base image to 2.5.1 for TRL compatibility 36ba6cb ayhm23 committed 29 days ago
chore: set Dockerfile to training mode for HF Space deployment 42840cb ayhm23 committed 29 days ago
feat: optimize GRPO training with stability guards and reward curve plotting 48ced0d ayhm23 committed 29 days ago
fix: clean up deps for HF Docker build - server-only pyproject, split requirements 7eb8978 sanyamvermaa committed 29 days ago
fix: add missing requirements-server.txt for Docker build 9534763 sanyamvermaa committed 29 days ago
feat: implement generalization testing pipeline and report generation for held-out evaluation scenarios ae22694 ayhm23 committed 29 days ago
docs: refactor context.md for project clarity and add baseline/phase 3 documentation 6a045c8 ayhm23 committed 29 days ago
Refactor: Optimized for HF deployment, improved reward stability, and cleaned up workspace. 4169567 ayhm23 committed 29 days ago
Fix 0.52 plateau, implement multi-turn eval, and prepare HF deployment 2ad21f7 ayhm23 committed 29 days ago