fix: switch eval to A100-80GB and track completion bonus 23c1b75 ianalin123 Claude Opus 4.6 committed on 2 days ago
chore: switch to A100-80GB for cheaper training runs 6d522e2 ianalin123 Claude Opus 4.6 committed on 2 days ago
feat(v3): hybrid GRPO + SFT training with expert beam search 4d27d34 ianalin123 Claude Opus 4.6 committed on 2 days ago
fix(v3): confidence-gated epsilon to avoid destroying JSON structure dd0653e ianalin123 Claude Opus 4.6 committed on 2 days ago
feat(v3): epsilon-greedy exploration to guarantee GRPO diversity 8fd9084 ianalin123 Claude Opus 4.6 committed on 2 days ago
feat(v3): add top-k sampling for forced token diversity during exploration e76ef35 ianalin123 Claude Opus 4.6 committed on 2 days ago
feat(v3): expose noise_scale and temperature as Modal CLI args 732eae3 ianalin123 Claude Opus 4.6 committed on 2 days ago
fix: unbuffered stdout for real-time Modal training logs 44a3100 ianalin123 Claude Opus 4.6 committed on 2 days ago
perf(v3): batch reference log-probs to reduce adapter toggling overhead 586c831 ianalin123 Claude Opus 4.6 committed on 2 days ago
fix(v3): prevent collapse from noise-amplified negative gradients 084c350 ianalin123 Claude Opus 4.6 committed on 2 days ago
feat(v3): exploration noise processor for GRPO output diversity faeae12 ianalin123 Claude Opus 4.6 committed on 3 days ago
fix(v3): prompt perturbation to break deterministic outputs 8d12671 ianalin123 Claude Opus 4.6 committed on 3 days ago
fix(v3): correct default lr in modal entrypoint to match train_v3.py cbf0523 ianalin123 Claude Opus 4.6 committed on 3 days ago
fix(v3): guarantee reward variance with num_per_task and higher temp e7988ad ianalin123 Claude Opus 4.6 committed on 3 days ago
fix(v3): normalize KL per token to prevent over-skipping fd09fad ianalin123 Claude Opus 4.6 committed on 3 days ago
fix(v3): prevent catastrophic forgetting with KL constraints bc999ca ianalin123 Claude Opus 4.6 committed on 3 days ago
fix(v3): improve training diagnostics and autocast deprecation 2aac7ce ianalin123 Claude Opus 4.6 committed on 3 days ago
fix(v3): include multi-step tasks from curriculum start 9843d5c ianalin123 Claude Opus 4.6 committed on 3 days ago
fix(v3): proper reference model handling for KL divergence 393baa9 ianalin123 Claude Opus 4.6 committed on 3 days ago
feat(v3): custom multi-step GRPO training loop with GiGPO 00e69d1 ianalin123 Claude Opus 4.6 committed on 3 days ago
feat(v3): add rollout logic and fix target_crease_edges bug b7dee84 ianalin123 Claude Opus 4.6 committed on 3 days ago
docs: add V3 sequential RL training architecture and implementation plan efbdc9d ianalin123 Claude Opus 4.6 committed on 3 days ago
feat(v3): implement core building blocks for multi-step RL training 1d79ab6 ianalin123 Claude Opus 4.6 committed on 3 days ago
fix: default port to 7860 for HuggingFace Spaces health check 6530258 ianalin123 committed on 3 days ago
feat(v2): implement multi-step environment with PaperState and per-step rewards 02d14b3 ianalin123 committed on 3 days ago
feat(v2): filter None fields from OrigamiAction in client step payload 033b6ea ianalin123 committed on 3 days ago
feat(v2): update train_grpo.py for step-level prompts and per_step_reward 5378254 ianalin123 committed on 3 days ago
feat(v2): update OrigamiAction/Observation/State for multi-step mode c65943e ianalin123 committed on 3 days ago
feat(v2): add step_reward.py — per-step Kawasaki/Maekawa/coverage reward 8408ee0 ianalin123 committed on 3 days ago
feat(v2): add max_folds to tasks + waterbomb_base + map_fold tasks bc8801e ianalin123 committed on 3 days ago
feat(v2): add extract_crease_json and valid_crease reward to training/reward.py 9e8d178 ianalin123 committed on 3 days ago
feat(v2): port PaperState to origami_server/engine/paper_state.py 2f02162 ianalin123 committed on 3 days ago
chore(v2): add shapely dependency for PaperState intersection detection c4585c2 ianalin123 committed on 3 days ago
feat(v2): port CreaseGraph to origami_server/engine/graph.py f1f22fb ianalin123 committed on 3 days ago
feat: add hackathon demo script and result charts 32d29cd ianalin123 Claude Sonnet 4.6 committed on 3 days ago
feat: add Modal eval script for GRPO checkpoints d102807 ianalin123 Claude Sonnet 4.6 committed on 3 days ago
feat: expand reward pipeline and GRPO training config 7490526 ianalin123 Claude Sonnet 4.6 committed on 3 days ago
chore: ignore pycache, outputs, and trainer state 8565386 ianalin123 Claude Sonnet 4.6 committed on 3 days ago
revert: restore train_origami.ipynb to Prasanna's exact version 4ade360 ianalin123 committed on 3 days ago
chore: restore Prasanna's notebook, add Modal launch section 113d136 ianalin123 Claude Sonnet 4.6 committed on 3 days ago
feat: Railway deployment + multi-task GRPO + Modal B200 training f88b76b ianalin123 committed on 3 days ago
Add GRPO training notebook + Dockerfile for cloud training (#1) bff0657 praveen287 sissississi committed on 3 days ago