ianalin123 committed · 2936d2e · 1 parent: fc71686

docs/handoff

docs/optigami_handoff.md (added)
# OrigamiRL — OpenEnv Hackathon Handoff Document

## TL;DR

Build the **first multi-turn RL environment where an LLM learns to generate origami folding instructions**, verified by a computational origami simulator. Target the OpenEnv Hackathon (March 7-8, 2026, SF — $100K+ in prizes). Use the OpenEnv spec + Unsloth GRPO for training. Dense verifiable rewards come from origami geometry theorems (Kawasaki, Maekawa). No learned reward model needed.

---

## Hackathon Context

- **Event:** OpenEnv Hackathon SF, hosted by Cerebral Valley + Shack15 + Meta/PyTorch
- **Date:** March 7-8, 2026 (happening NOW)
- **Prize:** $100K+ cash
- **Teams:** Up to 4 people
- **Format:** Build RL environments, post-train a base model

### Judging Criteria

| Category | Weight | What Matters |
|----------|--------|--------------|
| Environment Innovation | 40% | Novel, creative, challenging. Does it meaningfully test agent behavior? |
| Storytelling | 30% | Clear problem explanation, engaging demo, easy to follow |
| Training Script Showing Improvement | 20% | Observable reward curves, before/after behavior |
| Reward and Training Pipeline Setup | 10% | Coherent reward logic, meaningful improvement at inference |

### Key Sponsors to Impress

- **Meta/PyTorch** — OpenEnv creators, want environments using their spec
- **Unsloth AI** — GRPO training infra, ART (Agent Reinforcement Trainer). USE THEIR TOOLS.
- **OpenPipe** — ART trainer (frontend/backend split for GRPO). Also use.
- **Patronus AI** — Building "generative simulators" (auto-scaling RL environments). They care about curriculum difficulty scaling and verifiable rewards.
- **Snorkel AI** — "2026 is the year of environments." They care about data quality and environment diversity.
- **Hugging Face** — OpenEnv Hub, want environments deployed there
- **Scale AI / Mercor** — Agent evaluation, structured task environments

---

## The Pitch (for judges)

> "Spatial reasoning is the next frontier for LLM training — NeurIPS 2025 papers like OrigamiSpace showed that even GPT-5 fails at multi-step origami reasoning. But those are benchmarks, not training environments. We built OrigamiRL: the first multi-turn RL environment where an LLM agent learns to fold paper by outputting instructions, receiving geometric feedback, and improving through GRPO. Our reward function is fully verifiable — fold validity is checked against computational origami axioms, not an LLM judge. We built it on OpenEnv + Unsloth with a natural curriculum from single folds to full cranes."

---

## Prior Work (What Exists, Where the Gaps Are)

### 1. OrigamiSpace (NeurIPS 2025 Spotlight)

- **Paper:** https://arxiv.org/abs/2511.18450
- **What it is:** Benchmark with 350 origami data instances (CP diagrams, folding processes, folded shapes). 4 evaluation tasks: Pattern Prediction, Multi-step Spatial Reasoning, Spatial Relationship Prediction, End-to-End CP Code Generation.
- **Their compiler:** Outputs detailed flattened diagrams with crease locations and stacking relationships, supports interactive simulation with MLLMs, provides comprehensive error feedback. Checks: syntax validity, geometric foldability, no self-intersections, Kawasaki's theorem, Maekawa's theorem.
- **Their reward metrics for code gen:** Hausdorff distance (shape similarity), dihedral angle distribution, bounding box aspect ratios, constraint satisfaction.
- **Difficulty levels:** Easy (3-9 steps), Medium (10-19 steps), Hard (20-30 steps)
- **Gap:** Single-turn only (the LLM generates complete CP code in one shot). They mention RL exploration, but it's not the focus. No multi-turn sequential folding.

### 2. GamiBench (Dec 2025)

- **Paper:** https://arxiv.org/abs/2512.22207
- **What it is:** 186 regular + 186 impossible 2D crease patterns with 3D folded shapes from 6 viewpoints. 3 VQA tasks.
- **Gap:** Evaluation-only, no training. Tests single-step spatial understanding.

### 3. SpatialThinker (NeurIPS 2025)

- **Paper:** https://arxiv.org/abs/2511.07403
- **What it is:** 3D-aware MLLM trained with RL using dense spatial rewards. Constructs scene graphs. Multi-objective reward with lexicographic gating.
- **Key architecture to steal:** Dense reward design with lexicographic ordering — format → count → accuracy → spatial. Nearly doubled RL training gains vs sparse rewards. Only needed 7K training samples with GRPO.
- **Gap:** Static scene understanding (objects on a table), not sequential physical transformations.

### 4. rigid-origami Gym (IJCAI 2023)

- **Repo:** https://github.com/belalugaX/rigid-origami
- **Paper:** "Automating Rigid Origami Design" (https://arxiv.org/abs/2211.13219)
- **What it is:** Gym environment where an agent constructs crease pattern graphs on a board. Sparse rewards. Foldability is validated by triangle intersection tests + a kinematic rigidity model. The game terminates on non-foldable states.
- **Gap:** Classical RL agents (discrete grid actions), NOT LLMs generating text. Rigid-origami tessellations only, not traditional origami. No natural language.

### 5. The Unique Gap We Fill

Nobody has built a model that reasons about **sequential 2D-to-3D geometric transformations with physical constraints** through **natural language instructions** in a **multi-turn RL training loop**. Origami is uniquely hard because it requires tracking how a flat sheet's topology changes through a sequence of folds — mental rotation, spatial visualization, and perspective-taking all at once.

---

## Environment Design

### Architecture Overview

```
+---------------------------------------------------+
|                  OpenEnv Server                   |
|  +-----------+  +----------+  +--------------+    |
|  | State     |  | Action   |  | Reward       |    |
|  | (FOLD JSON|  | (LLM     |  | (Dense,      |    |
|  |  + target)|  |  output) |  |  verifiable) |    |
|  +-----------+  +----------+  +--------------+    |
|        |             |               |            |
|        v             v               v            |
|  +-----------------------------------------------+|
|  |        Paper Geometry Engine (Python)         ||
|  |  - Polygon state (Shapely)                    ||
|  |  - Fold operations (reflection across line)   ||
|  |  - Kawasaki/Maekawa constraint checks         ||
|  |  - Layer tracking                             ||
|  |  - FOLD format import/export                  ||
|  +-----------------------------------------------+|
|                        |                          |
|                        v                          |
|  +-----------------------------------------------+|
|  |        Three.js Visualizer (Demo only)        ||
|  |  - 3D fold animation                          ||
|  |  - Strain heatmap                             ||
|  |  - Instruction stream                         ||
|  +-----------------------------------------------+|
+---------------------------------------------------+
                |                 ^
                v                 |
+---------------------------------------------------+
|            Unsloth ART / GRPO Trainer             |
|  - Qwen2.5-VL-7B or Qwen3-4B base model           |
|  - LoRA/QLoRA for efficient training              |
|  - Multi-turn rollouts                            |
+---------------------------------------------------+
```

### OpenEnv Spec Compliance

Must implement these APIs:

```python
class OrigamiEnv:
    async def reset(self) -> Observation    # New episode: flat paper + target
    async def step(self, action) -> tuple[Observation, float, bool, dict]
    async def state(self) -> State          # Current paper geometry
    async def close(self) -> None           # Cleanup
```

OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Install: `pip install -e .` then `openenv init origami_env`

### State Space

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class OrigamiState:
    # Current paper geometry
    vertices: List[Tuple[float, float]]        # 2D vertex positions
    edges: List[Tuple[int, int]]               # Edge connectivity
    edges_assignment: List[str]                # 'M', 'V', 'B', 'F' (mountain/valley/boundary/flat)
    edges_foldAngle: List[float]               # -180 to 180 degrees
    faces: List[List[int]]                     # Face vertex indices
    layer_order: List[List[int]]               # Face stacking order

    # Episode context
    target_crease_pattern: dict                # Target FOLD JSON
    target_shape_image: Optional[np.ndarray]   # Target folded shape (for multimodal)
    instruction_history: List[str]             # Previous instructions
    step_count: int
    max_steps: int
```

This maps directly to the **FOLD format** (JSON-based, used by all origami software):

```json
{
  "vertices_coords": [[0,0], [1,0], [1,1], [0,1]],
  "edges_vertices": [[0,1], [1,2], [2,3], [3,0]],
  "edges_assignment": ["B", "B", "B", "B"],
  "edges_foldAngle": [0, 0, 0, 0],
  "faces_vertices": [[0, 1, 2, 3]]
}
```

FOLD spec: https://github.com/edemaine/fold
FOLD JS library: https://edemaine.github.io/fold/
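
Several later components assume structurally sound FOLD dicts, so it is worth checking structure before doing any origami math. A minimal sketch of such a check (the helper name `validate_fold` is ours, not part of the FOLD spec): it verifies edge indices, parallel-array lengths, and an Euler sanity check (V - E + F = 2, counting the outer face).

```python
def validate_fold(fold: dict) -> list[str]:
    """Return a list of structural problems with a FOLD dict (empty = OK)."""
    errors = []
    n_vertices = len(fold.get("vertices_coords", []))
    edges = fold.get("edges_vertices", [])

    # Every edge must reference existing vertices.
    for i, (a, b) in enumerate(edges):
        if not (0 <= a < n_vertices and 0 <= b < n_vertices):
            errors.append(f"edge {i} references a missing vertex")

    # Per-edge arrays must stay parallel to edges_vertices.
    for key in ("edges_assignment", "edges_foldAngle"):
        if key in fold and len(fold[key]) != len(edges):
            errors.append(f"{key} length != number of edges")

    # Euler's formula V - E + F = 2 for a connected planar crease pattern,
    # counting the outer face that FOLD files leave implicit.
    if "faces_vertices" in fold:
        f = len(fold["faces_vertices"]) + 1  # + outer face
        if n_vertices - len(edges) + f != 2:
            errors.append("V - E + F != 2")
    return errors


# The flat-square example above passes all checks: V=4, E=4, F=1+1 -> 4-4+2=2.
square = {
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0]],
    "edges_assignment": ["B", "B", "B", "B"],
    "edges_foldAngle": [0, 0, 0, 0],
    "faces_vertices": [[0, 1, 2, 3]],
}
print(validate_fold(square))  # -> []
```

Running this on every hand-authored target file before training catches mistakes that would otherwise surface as confusing reward noise.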

### Action Space

The LLM outputs a JSON action:

```json
{
  "instruction": "Fold the top edge down to meet the bottom edge",
  "fold_line": [[0, 0.5], [1, 0.5]],
  "fold_angle": -180,
  "assignment": "V"
}
```

The `instruction` field is natural language (what we're training the model to produce well). The geometric fields are the verifiable representation. During training, the model outputs both; for the final demo, the NL instruction is the star.

Alternative simpler action (for early iterations):

```json
{
  "instruction": "Valley fold along the horizontal center line",
  "fold_type": "valley",
  "fold_axis": "horizontal",
  "fold_position": 0.5
}
```
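
The training script in this document calls a `parse_fold_action` helper to pull the JSON action out of a raw model completion; a minimal sketch of it (the fallback regex and the specific error messages are our assumptions):

```python
import json
import re


def parse_fold_action(completion: str) -> dict:
    """Extract and validate the first JSON fold action in a model completion."""
    # Model output may wrap the JSON in prose or a code fence; grab the
    # first {...} span and let json.loads reject anything malformed.
    match = re.search(r"\{.*\}", completion, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in completion")
    action = json.loads(match.group(0))

    required = {"instruction", "fold_line", "fold_angle", "assignment"}
    missing = required - action.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if action["assignment"] not in ("M", "V"):
        raise ValueError("assignment must be 'M' or 'V'")
    if not -180 <= action["fold_angle"] <= 180:
        raise ValueError("fold_angle out of range")
    return action
```

Raising on every malformed case keeps the format gate in the reward function trivial: any exception maps to `format = 0`.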

### Reward Function — Dense, Multi-Objective, Lexicographically Gated

Inspired by SpatialThinker's design. Rewards are computed in order; later rewards only apply if earlier gates pass.

```python
def compute_reward(state, action, new_state, target) -> dict:
    """Lexicographically gated reward. Helper predicates (parseable,
    check_kawasaki, ...) are sketched in the geometry engine section."""
    rewards = {}

    # LEVEL 1: Format (gate for everything else)
    # Does the output parse into a valid fold operation?
    rewards['format'] = 1.0 if parseable(action) else 0.0
    if rewards['format'] == 0:
        return rewards  # Stop here

    # LEVEL 2: Local Geometric Validity
    # Kawasaki's theorem: the alternating sum of sector angles at each
    # interior vertex is zero
    kawasaki_valid = check_kawasaki(new_state)
    # Maekawa's theorem: |M - V| = 2 at each interior vertex
    maekawa_valid = check_maekawa(new_state)
    # No self-intersection
    no_intersection = check_no_self_intersection(new_state)
    rewards['validity'] = (kawasaki_valid + maekawa_valid + no_intersection) / 3.0
    if rewards['validity'] < 0.5:
        return rewards  # Stop here

    # LEVEL 3: Physical Feasibility
    # Can this fold actually be performed given the layer stack?
    layer_consistent = check_layer_ordering(new_state)
    fold_achievable = check_fold_angle_feasible(new_state)
    rewards['feasibility'] = (layer_consistent + fold_achievable) / 2.0

    # LEVEL 4: Progress Toward Target (Dense)
    # Crease pattern graph similarity
    cp_similarity = crease_pattern_similarity(new_state, target)
    # Fold angle distribution match
    angle_similarity = fold_angle_distribution_match(new_state, target)
    # Bounding box aspect ratio match
    bbox_similarity = bounding_box_similarity(new_state, target)
    rewards['progress'] = 0.4 * cp_similarity + 0.4 * angle_similarity + 0.2 * bbox_similarity

    # LEVEL 5: Completion Bonus
    if shape_matches_target(new_state, target, tolerance=0.05):
        rewards['completion'] = 10.0

    # LEVEL 6: Efficiency
    rewards['efficiency'] = -0.01  # Small step penalty to encourage fewer folds

    # Total
    rewards['total'] = (
        0.1 * rewards['format'] +
        0.2 * rewards['validity'] +
        0.1 * rewards['feasibility'] +
        0.5 * rewards['progress'] +
        rewards.get('completion', 0) +
        rewards['efficiency']
    )
    return rewards
```

### Key Origami Theorems for Verification

These are the verifiable constraints — the "unit tests" of origami:

1. **Kawasaki's Theorem:** At any interior vertex of a flat-foldable crease pattern, the alternating sum of sector angles equals zero (equivalently, the odd-indexed and even-indexed sector angles each sum to pi). A NECESSARY condition for flat-foldability.

2. **Maekawa's Theorem:** At any interior vertex, the number of mountain folds minus valley folds equals +/-2. |M - V| = 2.

3. **No self-intersection:** Faces cannot penetrate each other during folding.

4. **Euler's formula for planar graphs:** V - E + F = 2 (sanity check on graph structure).

5. **Huzita-Hatori axioms:** The 7 axioms defining all possible single-fold operations (point-to-point, point-to-line, line-to-line, etc.). These define the VALID action space.
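
The two vertex theorems reduce to a few lines of arithmetic, which is exactly why they make good reward checks. A standalone sketch on degree-4 vertices (the helper names `kawasaki_ok` and `maekawa_ok` are ours):

```python
import math


def kawasaki_ok(sector_angles_deg, tol=1e-6):
    """Kawasaki: the alternating sum of sector angles around an interior vertex is zero."""
    alt = sum(a if i % 2 == 0 else -a for i, a in enumerate(sector_angles_deg))
    return math.isclose(alt, 0.0, abs_tol=tol)


def maekawa_ok(assignments):
    """Maekawa: |#mountain - #valley| == 2 at an interior vertex."""
    return abs(assignments.count("M") - assignments.count("V")) == 2


# Four 90-degree sectors (a plain book fold through the center) pass Kawasaki:
print(kawasaki_ok([90, 90, 90, 90]))     # -> True
# Skewing alternate sectors breaks it: 100 - 80 + 100 - 80 = 40 != 0
print(kawasaki_ok([100, 80, 100, 80]))   # -> False
# Three mountains and one valley satisfy Maekawa (|3 - 1| = 2):
print(maekawa_ok(["M", "M", "M", "V"]))  # -> True
```

The full verifier below does the same math, just with the sector angles and assignments extracted from the FOLD export instead of hand-fed lists.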

### Curriculum Design

| Level | Folds | Examples | Complexity |
|-------|-------|----------|------------|
| 1 | 1 | Valley fold in half, mountain fold corner | Single fold validity |
| 2 | 2-3 | Paper airplane nose, triangle fold | Sequential dependency |
| 3 | 4-6 | Simple boat, fortune teller | Multi-step with symmetry |
| 4 | 7-12 | Paper airplane (full), jumping frog | Longer horizon planning |
| 5 | 13-20 | Crane, lily | Complex spatial tracking |

For the hackathon, focus on Levels 1-3. Even showing reward improvement on Levels 1-2 is a strong result.
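
One simple way to wire the table into training is to sample targets level by level and promote when recent rewards clear a threshold. A sketch under our own assumptions (the promotion rule, threshold, and window size are not prescribed by the table):

```python
import random


class CurriculumSampler:
    """Serve target ids for the current level; promote on sustained reward."""

    def __init__(self, targets_by_level: dict[int, list[str]],
                 promote_at: float = 0.8, window: int = 64):
        self.targets_by_level = targets_by_level
        self.level = min(targets_by_level)   # start at the easiest level
        self.promote_at = promote_at
        self.window = window
        self.recent: list[float] = []

    def sample(self) -> str:
        return random.choice(self.targets_by_level[self.level])

    def record(self, total_reward: float) -> None:
        self.recent.append(total_reward)
        self.recent = self.recent[-self.window:]
        full = len(self.recent) == self.window
        good = full and sum(self.recent) / self.window >= self.promote_at
        if good and self.level + 1 in self.targets_by_level:
            self.level += 1
            self.recent = []
```

Hooking `record(reward['total'])` into the rollout loop gives an automatic Level 1 to Level 3 ramp without hand-scheduling.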

---

## Core Implementation: Python Geometry Engine

This is the MOST IMPORTANT piece. Pure Python, no JS dependencies.

```python
import numpy as np
from shapely.geometry import LineString, Polygon
from shapely.ops import split


class PaperState:
    """Represents the current state of the origami paper."""

    def __init__(self, size: float = 1.0):
        # Start with a unit square
        self.regions = [Polygon([(0, 0), (size, 0), (size, size), (0, size)])]
        self.fold_history = []
        self.crease_lines = []
        self.crease_assignments = []  # 'M' or 'V'
        self.crease_angles = []
        self.layer_order = [0]  # Stack order of regions

    def apply_fold(self, fold_line: LineString, angle: float, assignment: str) -> dict:
        """
        Apply a fold operation. Returns a dict with validity info.
        fold_line: Shapely LineString defining the fold axis
        angle: fold angle in degrees (-180 to 180)
        assignment: 'M' (mountain) or 'V' (valley)
        """
        result = {'valid': True, 'errors': []}

        # 1. Split regions by the fold line
        new_regions = []
        for region in self.regions:
            if fold_line.intersects(region):
                parts = split(region, fold_line)
                new_regions.extend(parts.geoms)
            else:
                new_regions.append(region)

        # 2. Determine which side folds (based on assignment)
        folding_side = []
        staying_side = []
        for region in new_regions:
            centroid = region.centroid
            side = self._point_side(centroid, fold_line)
            if side > 0:
                folding_side.append(region)
            else:
                staying_side.append(region)

        # 3. Reflect folding regions across the fold line
        reflected = [self._reflect_polygon(r, fold_line) for r in folding_side]

        # 4. Update state
        self.regions = staying_side + reflected
        self.crease_lines.append(fold_line)
        self.crease_assignments.append(assignment)
        self.crease_angles.append(angle)
        self.fold_history.append({
            'line': list(fold_line.coords),
            'angle': angle,
            'assignment': assignment
        })

        # 5. Update layer order
        self._update_layer_order(staying_side, reflected)

        return result

    def _reflect_polygon(self, poly: Polygon, line: LineString) -> Polygon:
        """Reflect a polygon across a line."""
        coords = list(poly.exterior.coords)
        reflected_coords = [self._reflect_point(p, line) for p in coords]
        return Polygon(reflected_coords)

    def _reflect_point(self, point: tuple, line: LineString) -> tuple:
        """Reflect a point across a line."""
        p = np.array(point[:2])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d = l2 - l1
        d = d / np.linalg.norm(d)
        # Reflection formula: p' = p - 2((p - l1) . n) n, where n is the
        # unit normal to the line
        n = np.array([-d[1], d[0]])
        v = p - l1
        return tuple(p - 2 * np.dot(v, n) * n)

    def _point_side(self, point, line: LineString) -> float:
        """Returns positive if the point is on the left side of the line, negative if right."""
        p = np.array([point.x, point.y])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d = l2 - l1
        # z-component of the 2D cross product; np.cross on 2D inputs is
        # deprecated in NumPy 2.x, so compute it explicitly.
        return float(d[0] * (p[1] - l1[1]) - d[1] * (p[0] - l1[0]))

    def _update_layer_order(self, staying, reflected):
        """Update the layer stacking order after a fold (reflected faces stack on top)."""
        self.layer_order = list(range(len(staying))) + \
            list(range(len(staying), len(staying) + len(reflected)))

    def to_fold_json(self) -> dict:
        """Export the current crease pattern as FOLD format JSON.
        (Boundary edges are omitted in this sketch; only creases are exported.)"""
        vertices = set()
        for line in self.crease_lines:
            for coord in line.coords:
                vertices.add(tuple(round(c, 10) for c in coord))
        # Add boundary vertices
        for region in self.regions:
            for coord in region.exterior.coords:
                vertices.add(tuple(round(c, 10) for c in coord[:2]))

        vertices = sorted(vertices)
        vertex_map = {v: i for i, v in enumerate(vertices)}

        edge_set = set()
        edges_list = []
        assignments_list = []
        angles_list = []

        # Add crease edges
        for i, line in enumerate(self.crease_lines):
            c = [tuple(round(x, 10) for x in coord) for coord in line.coords]
            edge = tuple(sorted([vertex_map[c[0]], vertex_map[c[1]]]))
            if edge not in edge_set:
                edge_set.add(edge)
                edges_list.append(list(edge))
                assignments_list.append(self.crease_assignments[i])
                angles_list.append(self.crease_angles[i])

        return {
            'vertices_coords': [list(v) for v in vertices],
            'edges_vertices': edges_list,
            'edges_assignment': assignments_list,
            'edges_foldAngle': angles_list,
        }


class OrigamiVerifier:
    """Verifiable reward functions based on origami theorems."""

    @staticmethod
    def check_kawasaki(state: PaperState) -> bool:
        """Kawasaki's theorem: the alternating sum of sector angles at each interior vertex is zero."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']

        for v_idx in range(len(vertices)):
            v = vertices[v_idx]
            incident_edges = [e for e in edges if v_idx in e]
            if len(incident_edges) < 4:
                continue  # Kawasaki applies to interior vertices (degree 4+)

            # Calculate sector angles
            angles = []
            for e in incident_edges:
                other = e[1] if e[0] == v_idx else e[0]
                other_v = vertices[other]
                angle = np.arctan2(other_v[1] - v[1], other_v[0] - v[0])
                angles.append(angle)

            angles.sort()
            sector_angles = []
            for i in range(len(angles) - 1):
                sector_angles.append(angles[i + 1] - angles[i])
            sector_angles.append(2 * np.pi - (angles[-1] - angles[0]))

            # Kawasaki: the alternating sum should be ~0
            if len(sector_angles) >= 4:
                alt_sum = sum(sector_angles[::2]) - sum(sector_angles[1::2])
                if abs(alt_sum) > 0.01:
                    return False
        return True

    @staticmethod
    def check_maekawa(state: PaperState) -> bool:
        """Maekawa's theorem: |M - V| = 2 at each interior vertex."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']
        assignments = fold_json['edges_assignment']

        for v_idx in range(len(vertices)):
            incident = [(i, e) for i, e in enumerate(edges) if v_idx in e]
            m_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'M')
            v_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'V')

            if m_count + v_count >= 4:  # Interior vertex with folds
                if abs(m_count - v_count) != 2:
                    return False
        return True

    @staticmethod
    def crease_pattern_similarity(state: PaperState, target_fold_json: dict) -> float:
        """Compare the current crease pattern to the target. Returns 0-1 similarity."""
        current = state.to_fold_json()

        n_current = len(current.get('edges_vertices', []))
        n_target = len(target_fold_json.get('edges_vertices', []))

        if n_target == 0:
            return 1.0 if n_current == 0 else 0.0

        edge_count_sim = 1.0 - abs(n_current - n_target) / max(n_target, 1)
        edge_count_sim = max(0.0, edge_count_sim)

        current_assignments = current.get('edges_assignment', [])
        target_assignments = target_fold_json.get('edges_assignment', [])

        c_m = current_assignments.count('M')
        c_v = current_assignments.count('V')
        t_m = target_assignments.count('M')
        t_v = target_assignments.count('V')

        total = max(t_m + t_v, 1)
        assign_sim = 1.0 - (abs(c_m - t_m) + abs(c_v - t_v)) / (2 * total)
        assign_sim = max(0.0, assign_sim)

        return 0.5 * edge_count_sim + 0.5 * assign_sim
```
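
The reflection step is the easiest part of the engine to get subtly wrong (sign of the normal, non-unit direction vector). The same formula as `_reflect_point`, re-derived here without the numpy/shapely dependencies so it can be checked by hand:

```python
def reflect_point(p, a, b):
    """Reflect point p across the infinite line through points a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = (dx * dx + dy * dy) ** 0.5
    dx, dy = dx / norm, dy / norm          # unit direction of the line
    nx, ny = -dy, dx                       # unit normal to the line
    # p' = p - 2((p - a) . n) n
    dot = (p[0] - a[0]) * nx + (p[1] - a[1]) * ny
    return (p[0] - 2 * dot * nx, p[1] - 2 * dot * ny)


# Folding the unit square in half along y = 0.5 sends the top-left corner
# to the bottom-left corner:
print(reflect_point((0, 1), (0, 0.5), (1, 0.5)))  # -> (0.0, 0.0)
```

Points on the fold line itself have a zero dot product with the normal and therefore map to themselves, which is exactly the behavior a crease needs.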

---

## OpenEnv Environment Wrapper

```python
# origami_env/server.py
from openenv.core import Environment
from paper_engine import PaperState, OrigamiVerifier
from shapely.geometry import LineString
import json


class OrigamiEnvironment(Environment):

    def __init__(self, targets_dir="targets/", max_steps=20):
        self.targets_dir = targets_dir
        self.max_steps = max_steps
        self.paper = None
        self.target = None
        self.step_count = 0

    async def reset(self, target_id=None):
        self.paper = PaperState(size=1.0)
        self.target = self._load_target(target_id)
        self.step_count = 0
        return self._get_observation()

    async def step(self, action):
        self.step_count += 1

        # Parse action
        try:
            fold_line = LineString(action['fold_line'])
            angle = action['fold_angle']
            assignment = action['assignment']
        except Exception:
            reward = {'format': 0, 'total': -0.1}
            return self._get_observation(), reward, False, {'error': 'parse_failed'}

        # Apply fold
        result = self.paper.apply_fold(fold_line, angle, assignment)

        # Compute rewards
        reward = self._compute_reward(result)

        # Check termination
        done = (
            self.step_count >= self.max_steps or
            reward.get('completion', 0) > 0
        )

        return self._get_observation(), reward, done, {}

    async def state(self):
        return {
            'paper': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'fold_history': self.paper.fold_history
        }

    def _compute_reward(self, fold_result):
        rewards = {}
        rewards['format'] = 1.0

        kawasaki = OrigamiVerifier.check_kawasaki(self.paper)
        maekawa = OrigamiVerifier.check_maekawa(self.paper)
        rewards['validity'] = (float(kawasaki) + float(maekawa)) / 2.0

        rewards['progress'] = OrigamiVerifier.crease_pattern_similarity(
            self.paper, self.target
        )

        if rewards['progress'] > 0.95:
            rewards['completion'] = 10.0

        rewards['efficiency'] = -0.01

        rewards['total'] = (
            0.1 * rewards['format'] +
            0.2 * rewards['validity'] +
            0.6 * rewards['progress'] +
            rewards.get('completion', 0) +
            rewards['efficiency']
        )
        return rewards

    def _get_observation(self):
        return {
            'paper_state': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'instruction_history': [str(f['line']) for f in self.paper.fold_history]
        }

    def _load_target(self, target_id):
        if target_id:
            with open(f"{self.targets_dir}/{target_id}.fold") as f:
                return json.load(f)
        # Default: simple valley fold in half
        return {
            'vertices_coords': [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
            'edges_vertices': [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
            'edges_assignment': ['B', 'B', 'B', 'B', 'V'],
            'edges_foldAngle': [0, 0, 0, 0, -180],
        }
```
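
Before plugging in a model, the reset/step contract above can be smoke-tested with a scripted policy. A sketch of that driver loop; `ScriptedPolicy` and the trivial `EchoEnv` stand-in are our own scaffolding, used only to make the snippet runnable without the real server:

```python
import asyncio


class ScriptedPolicy:
    """Stand-in for the LLM: always proposes the valley fold in half."""

    def act(self, observation: dict) -> dict:
        return {
            "instruction": "Valley fold along the horizontal center line",
            "fold_line": [[0, 0.5], [1, 0.5]],
            "fold_angle": -180,
            "assignment": "V",
        }


class EchoEnv:
    """Trivial env with the same reset/step shape as OrigamiEnvironment."""

    async def reset(self) -> dict:
        return {"step": 0}

    async def step(self, action: dict):
        reward = {"format": 1.0, "total": 1.0}
        return {"step": 1}, reward, True, {}


async def rollout(env, policy, max_steps: int = 20) -> float:
    """Run one episode, summing the scalar 'total' reward."""
    obs = await env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy.act(obs)
        obs, reward, done, info = await env.step(action)
        total += reward.get("total", 0.0)
        if done:
            break
    return total


print(asyncio.run(rollout(EchoEnv(), ScriptedPolicy())))  # -> 1.0
```

Swapping `EchoEnv` for `OrigamiEnvironment` (and the scripted policy for model sampling) gives the multi-turn rollout the trainer needs.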

---

## Training Script (Unsloth GRPO)

```python
# train.py
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from shapely.geometry import LineString

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Add LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# Reward function. parse_fold_action, PaperState, and compute_reward are the
# helpers from the environment sections; `target` is the episode's target FOLD dict.
def origami_reward(prompts, completions, **kwargs):
    """Compute rewards for a batch of completions."""
    rewards = []
    for completion in completions:
        try:
            action = parse_fold_action(completion)
            paper = PaperState()
            paper.apply_fold(LineString(action['fold_line']),
                             action['fold_angle'], action['assignment'])
            r = compute_reward(paper, target)  # single-step variant of the gated reward
            rewards.append(r['total'])
        except Exception:
            rewards.append(-0.1)
    return rewards

# GRPO Config
config = GRPOConfig(
    output_dir="origami-grpo",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    max_completion_length=512,
    num_generations=8,
    temperature=1.0,
    logging_steps=1,
)

dataset = load_origami_prompts()

trainer = GRPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    reward_funcs=[origami_reward],
    processing_class=tokenizer,
)

trainer.train()
```
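
`load_origami_prompts` in the training script is left undefined; a minimal sketch of what it could return (the prompt template and the inline example target are our assumptions), building one prompt record per target crease pattern:

```python
import json

# Hypothetical prompt template; the exact wording is ours.
PROMPT_TEMPLATE = """You are folding a unit square of paper.
Target crease pattern (FOLD JSON):
{target}

Output ONE fold action as JSON with keys:
instruction, fold_line, fold_angle, assignment."""


def load_origami_prompts(targets: list[dict]) -> list[dict]:
    """Build GRPO prompt records, one per target crease pattern."""
    return [
        {"prompt": PROMPT_TEMPLATE.format(target=json.dumps(t)), "target": t}
        for t in targets
    ]


# The Level 1 valley-fold-in-half target from the env's default:
half_fold = {
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
    "edges_assignment": ["B", "B", "B", "B", "V"],
    "edges_foldAngle": [0, 0, 0, 0, -180],
}
dataset = load_origami_prompts([half_fold])
```

Carrying the raw `target` dict alongside each prompt lets the reward function score completions against the right crease pattern instead of a global `target` variable.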

---

## Visualization (Demo Only — Not in Training Loop)

### Options

1. **Origami Simulator** — https://github.com/amandaghassaei/OrigamiSimulator — Three.js, accepts FOLD files, shows folding animation with strain visualization
2. **PackCAD** — https://packcad.com/ — Web-based, SVG crease patterns, rigid folding simulation
3. **Custom Three.js** — Simpler, but more control

### Demo UI Layout

```
+----------------------+----------------------+
|  Instruction Stream  |    3D Fold Viewer    |
|                      |                      |
|  Step 1: Valley fold |  [Three.js canvas]   |
|  along center [OK]   |                      |
|                      |  Paper animating     |
|  Step 2: Fold top    |  fold by fold        |
|  corners to center   |                      |
|                      |                      |
+----------------------+----------------------+
|              Reward Dashboard               |
|  Format:    ========== 1.0                  |
|  Validity:  ========.. 0.8                  |
|  Progress:  ======.... 0.6                  |
|  Total:     =======... 0.72                 |
|                                             |
|  [Reward curve over training steps]         |
+---------------------------------------------+
```

---

## Key Libraries and Resources

| Tool | Purpose | Link |
|------|---------|------|
| OpenEnv | Environment framework | https://github.com/meta-pytorch/OpenEnv |
| Unsloth | GRPO training | https://github.com/unslothai/unsloth |
| OpenPipe ART | Multi-turn RL trainer | https://github.com/OpenPipe/ART |
| FOLD format | Origami data structure | https://github.com/edemaine/fold |
| Rabbit Ear | JS origami library | https://github.com/rabbit-ear/rabbit-ear |
| Origami Simulator | 3D visualization | https://github.com/amandaghassaei/OrigamiSimulator |
| PackCAD | Folding simulation | https://packcad.com/ |
| Shapely | Python geometry | `pip install shapely` |
| rigid-origami gym | Reference gym env | https://github.com/belalugaX/rigid-origami |

### Papers to Cite

- OrigamiSpace: https://arxiv.org/abs/2511.18450
- GamiBench: https://arxiv.org/abs/2512.22207
- SpatialThinker: https://arxiv.org/abs/2511.07403
- Automating Rigid Origami Design: https://arxiv.org/abs/2211.13219
- FOLD format spec: https://github.com/edemaine/fold/blob/main/doc/spec.md

---

## Priority Build Order

1. **Python geometry engine** — PaperState class with fold operations and FOLD export
2. **Verifier functions** — Kawasaki, Maekawa, similarity metrics
3. **OpenEnv wrapper** — step/reset/state API
4. **Simple targets** — Hand-create 5-10 Level 1-2 targets as .fold files
5. **Training script** — Wire up Unsloth GRPO with the reward function
6. **Run training** — Even on a small model, get reward curves
7. **Three.js visualizer** — For demo only, not in training loop
8. **Before/after demo** — Show base model vs trained model outputs
9. **Polish presentation narrative**

---

## Narrative for Judges

**The story arc:**

1. "LLMs are great at text but terrible at spatial reasoning"
2. "Origami is the perfect testbed — it's sequential, physical, and verifiable"
3. "NeurIPS 2025 showed even GPT-5 fails at origami benchmarks, but nobody built a TRAINING environment"
4. "We built OrigamiRL — the first multi-turn RL environment for origami instruction generation"
5. "Our rewards come from math theorems, not vibes — Kawasaki's theorem is our unit test"
6. "Watch the model go from generating paper-tearing nonsense to valid fold sequences"
7. "This generalizes to any domain where LLMs need to output structured physical instructions"