Commit History

fix: switch eval to A100-80GB and track completion bonus
23c1b75

ianalin123 Claude Opus 4.6 committed on

chore: switch to A100-80GB for cheaper training runs
6d522e2

ianalin123 Claude Opus 4.6 committed on

feat(v3): hybrid GRPO + SFT training with expert beam search
4d27d34

ianalin123 Claude Opus 4.6 committed on

fix(v3): confidence-gated epsilon to avoid destroying JSON structure
dd0653e

ianalin123 Claude Opus 4.6 committed on

feat(v3): epsilon-greedy exploration to guarantee GRPO diversity
8fd9084

ianalin123 Claude Opus 4.6 committed on

feat(v3): add top-k sampling for forced token diversity during exploration
e76ef35

ianalin123 Claude Opus 4.6 committed on

feat(v3): expose noise_scale and temperature as Modal CLI args
732eae3

ianalin123 Claude Opus 4.6 committed on

fix: unbuffered stdout for real-time Modal training logs
44a3100

ianalin123 Claude Opus 4.6 committed on

perf(v3): batch reference log-probs to reduce adapter toggling overhead
586c831

ianalin123 Claude Opus 4.6 committed on

fix(v3): prevent collapse from noise-amplified negative gradients
084c350

ianalin123 Claude Opus 4.6 committed on

feat(v3): exploration noise processor for GRPO output diversity
faeae12

ianalin123 Claude Opus 4.6 committed on

fix(v3): prompt perturbation to break deterministic outputs
8d12671

ianalin123 Claude Opus 4.6 committed on

fix(v3): correct default lr in modal entrypoint to match train_v3.py
cbf0523

ianalin123 Claude Opus 4.6 committed on

fix(v3): guarantee reward variance with num_per_task and higher temp
e7988ad

ianalin123 Claude Opus 4.6 committed on

fix(v3): normalize KL per token to prevent over-skipping
fd09fad

ianalin123 Claude Opus 4.6 committed on

fix(v3): prevent catastrophic forgetting with KL constraints
bc999ca

ianalin123 Claude Opus 4.6 committed on

fix(v3): improve training diagnostics and autocast deprecation
2aac7ce

ianalin123 Claude Opus 4.6 committed on

fix(v3): include multi-step tasks from curriculum start
9843d5c

ianalin123 Claude Opus 4.6 committed on

fix(v3): proper reference model handling for KL divergence
393baa9

ianalin123 Claude Opus 4.6 committed on

feat(v3): custom multi-step GRPO training loop with GiGPO
00e69d1

ianalin123 Claude Opus 4.6 committed on

feat(v3): add rollout logic and fix target_crease_edges bug
b7dee84

ianalin123 Claude Opus 4.6 committed on

docs: add V3 sequential RL training architecture and implementation plan
efbdc9d

ianalin123 Claude Opus 4.6 committed on

feat(v3): implement core building blocks for multi-step RL training
1d79ab6

ianalin123 Claude Opus 4.6 committed on

fix: default port to 7860 for HuggingFace Spaces health check
6530258

ianalin123 committed on

feat: V2 frontend — episode player UI
50fe1dd

ianalin123 committed on

chore: track binary assets with git-lfs
41d76ad

ianalin123 committed on

feat(v2): implement multi-step environment with PaperState and per-step rewards
02d14b3

ianalin123 committed on

feat(v2): filter None fields from OrigamiAction in client step payload
033b6ea

ianalin123 committed on

feat(v2): update train_grpo.py for step-level prompts and per_step_reward
5378254

ianalin123 committed on

feat(v2): update OrigamiAction/Observation/State for multi-step mode
c65943e

ianalin123 committed on

feat(v2): add step_reward.py — per-step Kawasaki/Maekawa/coverage reward
8408ee0

ianalin123 committed on

feat(v2): add max_folds to tasks + waterbomb_base + map_fold tasks
bc8801e

ianalin123 committed on

feat(v2): add extract_crease_json and valid_crease reward to training/reward.py
9e8d178

ianalin123 committed on

feat(v2): port PaperState to origami_server/engine/paper_state.py
2f02162

ianalin123 committed on

chore(v2): add shapely dependency for PaperState intersection detection
c4585c2

ianalin123 committed on

feat(v2): port CreaseGraph to origami_server/engine/graph.py
f1f22fb

ianalin123 committed on

docs: add V2 handoff and plan documents
3dddeb0

ianalin123 Claude Sonnet 4.6 committed on

feat: add hackathon demo script and result charts
32d29cd

ianalin123 Claude Sonnet 4.6 committed on

feat: add Modal eval script for GRPO checkpoints
d102807

ianalin123 Claude Sonnet 4.6 committed on

feat: update origami viewer UI
bb7f9f0

ianalin123 Claude Sonnet 4.6 committed on

feat: expand reward pipeline and GRPO training config
7490526

ianalin123 Claude Sonnet 4.6 committed on

chore: ignore pycache, outputs, and trainer state
8565386

ianalin123 Claude Sonnet 4.6 committed on

chore: add HF Spaces frontmatter to README
3b8d9c2

ianalin123 committed on

revert: restore train_origami.ipynb to Prasanna's exact version
4ade360

ianalin123 committed on

chore: restore Prasanna's notebook, add Modal launch section
113d136

ianalin123 Claude Sonnet 4.6 committed on

feat: Railway deployment + multi-task GRPO + Modal B200 training
f88b76b

ianalin123 committed on

Update training notebook
18da19c

praveen287 committed on

Add MAX_CONCURRENT_ENVS, sync latest changes
a9fbb49

praveen287 committed on

Update viewer
3ed51fd

praveen287 committed on

Add GRPO training notebook + Dockerfile for cloud training (#1)
bff0657

praveen287 sissississi committed on