# OrigamiRL - OpenEnv Hackathon Handoff Document

## TL;DR

Build the **first multi-turn RL environment where an LLM learns to generate origami folding instructions**, verified by a computational origami simulator. Target the OpenEnv Hackathon (March 7-8, 2026, SF - $100K+ in prizes). Use the OpenEnv spec + Unsloth GRPO for training. Dense verifiable rewards come from origami geometry theorems (Kawasaki, Maekawa) - no learned reward model needed.

---

## Hackathon Context

- **Event:** OpenEnv Hackathon SF, hosted by Cerebral Valley + Shack15 + Meta/PyTorch
- **Date:** March 7-8, 2026 (happening NOW)
- **Prize:** $100K+ cash
- **Teams:** Up to 4 people
- **Format:** Build RL environments, post-train a base model
### Judging Criteria

| Category | Weight | What Matters |
|----------|--------|--------------|
| Environment Innovation | 40% | Novel, creative, challenging. Does it meaningfully test agent behavior? |
| Storytelling | 30% | Clear problem explanation, engaging demo, easy to follow |
| Training Script Showing Improvement | 20% | Observable reward curves, before/after behavior |
| Reward and Training Pipeline Setup | 10% | Coherent reward logic, meaningful improvement at inference |
### Key Sponsors to Impress

- **Meta/PyTorch** - OpenEnv creators; they want environments built on their spec
- **Unsloth AI** - GRPO training infra. USE THEIR TOOLS.
- **OpenPipe** - ART (Agent Reinforcement Trainer; frontend/backend split for GRPO). Also use.
- **Patronus AI** - Building "generative simulators" (auto-scaling RL environments). They care about curriculum difficulty scaling and verifiable rewards.
- **Snorkel AI** - "2026 is the year of environments." They care about data quality and environment diversity.
- **Hugging Face** - OpenEnv Hub; they want environments deployed there
- **Scale AI / Mercor** - Agent evaluation, structured task environments

---
## The Pitch (for judges)

> "Spatial reasoning is the next frontier for LLM training - NeurIPS 2025 papers like OrigamiSpace showed that even GPT-5 fails at multi-step origami reasoning. But those are benchmarks, not training environments. We built OrigamiRL: the first multi-turn RL environment where an LLM agent learns to fold paper by outputting instructions, receiving geometric feedback, and improving through GRPO. Our reward function is fully verifiable - fold validity is checked against computational origami axioms, not an LLM judge. We built it on OpenEnv + Unsloth with a natural curriculum from single folds to full cranes."

---
## Prior Work (What Exists, Where the Gaps Are)

### 1. OrigamiSpace (NeurIPS 2025 Spotlight)

- **Paper:** https://arxiv.org/abs/2511.18450
- **What it is:** Benchmark with 350 origami data instances (CP diagrams, folding processes, folded shapes). 4 evaluation tasks: Pattern Prediction, Multi-step Spatial Reasoning, Spatial Relationship Prediction, End-to-End CP Code Generation.
- **Their compiler:** Outputs detailed flattened diagrams with crease locations and stacking relationships, supports interactive simulation with MLLMs, and provides comprehensive error feedback. Checks: syntax validity, geometric foldability, no self-intersections, Kawasaki's theorem, Maekawa's theorem.
- **Their reward metrics for code gen:** Hausdorff distance (shape similarity), dihedral angle distribution, bounding box aspect ratios, constraint satisfaction.
- **Difficulty levels:** Easy (3-9 steps), Medium (10-19 steps), Hard (20-30 steps)
- **Gap:** Single-turn only (the LLM generates complete CP code in one shot). They mention RL exploration, but it's not the focus. No multi-turn sequential folding.

### 2. GamiBench (Dec 2025)

- **Paper:** https://arxiv.org/abs/2512.22207
- **What it is:** 186 regular + 186 impossible 2D crease patterns with 3D folded shapes from 6 viewpoints. 3 VQA tasks.
- **Gap:** Evaluation-only, no training. Tests single-step spatial understanding.

### 3. SpatialThinker (NeurIPS 2025)

- **Paper:** https://arxiv.org/abs/2511.07403
- **What it is:** 3D-aware MLLM trained with RL using dense spatial rewards. Constructs scene graphs. Multi-objective reward with lexicographic gating.
- **Key architecture to steal:** Dense reward design with lexicographic ordering: format → count → accuracy → spatial. Nearly doubled RL training gains vs. sparse rewards. Needed only 7K training samples with GRPO.
- **Gap:** Static scene understanding (objects on a table), not sequential physical transformations.

### 4. rigid-origami Gym (IJCAI 2023)

- **Repo:** https://github.com/belalugaX/rigid-origami
- **Paper:** "Automating Rigid Origami Design" (https://arxiv.org/abs/2211.13219)
- **What it is:** Gym environment where an agent constructs crease pattern graphs on a board. Sparse rewards. Foldability validated by triangle intersection tests + a kinematic rigidity model. The game terminates on non-foldable states.
- **Gap:** Classical RL agents (discrete grid actions), NOT LLMs generating text. Rigid-origami tessellations only, not traditional origami. No natural language.

### 5. The Unique Gap We Fill

Nobody has built a model that reasons about **sequential 2D-to-3D geometric transformations with physical constraints** through **natural language instructions** in a **multi-turn RL training loop**. Origami is uniquely hard because it requires tracking how a flat sheet's topology changes through a sequence of folds - mental rotation, spatial visualization, and perspective-taking all at once.

---
## Environment Design

### Architecture Overview

```
+---------------------------------------------------+
|                  OpenEnv Server                   |
|  +-----------+  +----------+  +--------------+    |
|  | State     |  | Action   |  | Reward       |    |
|  | (FOLD JSON|  | (LLM     |  | (Dense,      |    |
|  |  + target)|  |  output) |  |  verifiable) |    |
|  +-----------+  +----------+  +--------------+    |
|        |             |               |            |
|        v             v               v            |
|  +-----------------------------------------------+|
|  | Paper Geometry Engine (Python)                ||
|  |  - Polygon state (Shapely)                    ||
|  |  - Fold operations (reflection across line)   ||
|  |  - Kawasaki/Maekawa constraint checks         ||
|  |  - Layer tracking                             ||
|  |  - FOLD format import/export                  ||
|  +-----------------------------------------------+|
|                        |                          |
|                        v                          |
|  +-----------------------------------------------+|
|  | Three.js Visualizer (Demo only)               ||
|  |  - 3D fold animation                          ||
|  |  - Strain heatmap                             ||
|  |  - Instruction stream                         ||
|  +-----------------------------------------------+|
+---------------------------------------------------+
          |                         ^
          v                         |
+---------------------------------------------------+
|            Unsloth ART / GRPO Trainer             |
|  - Qwen2.5-VL-7B or Qwen3-4B base model           |
|  - LoRA/QLoRA for efficient training              |
|  - Multi-turn rollouts                            |
+---------------------------------------------------+
```
### OpenEnv Spec Compliance

Must implement these APIs:

```python
class OrigamiEnv:
    async def reset(self) -> Observation: ...        # New episode: flat paper + target
    async def step(self, action) -> tuple: ...       # (Observation, reward, done, info)
    async def state(self) -> State: ...              # Current paper geometry
    async def close(self) -> None: ...               # Cleanup
```

OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Install: `pip install -e .` then `openenv init origami_env`
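The trainer-side loop against this API looks like the following - a minimal sketch where `DummyOrigamiEnv` and `rollout` are illustrative stand-ins (not part of OpenEnv) that only demonstrate the reset/step/state/close contract:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class DummyOrigamiEnv:
    """Stand-in env: terminates after max_steps, zero reward throughout."""
    max_steps: int = 3
    step_count: int = 0

    async def reset(self):
        self.step_count = 0
        return {"paper_state": {}, "step": 0}

    async def step(self, action):
        self.step_count += 1
        reward = {"total": 0.0}
        done = self.step_count >= self.max_steps
        return {"step": self.step_count}, reward, done, {}

    async def state(self):
        return {"step": self.step_count}

    async def close(self):
        pass

async def rollout(env, policy):
    """One episode: reset, act until done, accumulate the 'total' reward."""
    obs = await env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, info = await env.step(action)
        total += reward["total"]
    await env.close()
    return total, obs["step"]

total, steps = asyncio.run(rollout(DummyOrigamiEnv(), lambda obs: {}))
```

The real `OrigamiEnvironment` below drops into the same loop; only the policy (the LLM emitting fold actions) changes.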
### State Space

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np

@dataclass
class OrigamiState:
    # Current paper geometry
    vertices: List[Tuple[float, float]]       # 2D vertex positions
    edges: List[Tuple[int, int]]              # Edge connectivity
    edges_assignment: List[str]               # 'M', 'V', 'B', 'F' (mountain/valley/boundary/flat)
    edges_foldAngle: List[float]              # -180 to 180 degrees
    faces: List[List[int]]                    # Face vertex indices
    layer_order: List[List[int]]              # Face stacking order
    # Episode context
    target_crease_pattern: dict               # Target FOLD JSON
    target_shape_image: Optional[np.ndarray]  # Target folded shape (for multimodal)
    instruction_history: List[str]            # Previous instructions
    step_count: int
    max_steps: int
```
This maps directly to the **FOLD format** (JSON-based, used by most origami software):

```json
{
  "vertices_coords": [[0,0], [1,0], [1,1], [0,1]],
  "edges_vertices": [[0,1], [1,2], [2,3], [3,0]],
  "edges_assignment": ["B", "B", "B", "B"],
  "edges_foldAngle": [0, 0, 0, 0],
  "faces_vertices": [[0, 1, 2, 3]]
}
```

FOLD spec: https://github.com/edemaine/fold
FOLD JS library: https://edemaine.github.io/fold/
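Before feeding a target FOLD dict to the environment, it is worth a cheap consistency pass. A minimal sketch (field names follow the FOLD spec; the helper name `validate_fold` is ours):

```python
def validate_fold(fold: dict) -> list:
    """Return a list of consistency errors in a FOLD dict (empty = OK)."""
    errors = []
    n_verts = len(fold.get("vertices_coords", []))
    edges = fold.get("edges_vertices", [])
    # Every edge must reference existing vertices
    for e in edges:
        if not all(0 <= v < n_verts for v in e):
            errors.append(f"edge {e} references a missing vertex")
    # Per-edge attribute arrays must align with the edge list
    for key in ("edges_assignment", "edges_foldAngle"):
        if key in fold and len(fold[key]) != len(edges):
            errors.append(f"{key} length != number of edges")
    # Only known assignment codes
    for a in fold.get("edges_assignment", []):
        if a not in ("M", "V", "B", "F"):
            errors.append(f"unknown assignment {a!r}")
    return errors

square = {
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0]],
    "edges_assignment": ["B", "B", "B", "B"],
    "edges_foldAngle": [0, 0, 0, 0],
}
print(validate_fold(square))  # []
```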
### Action Space

The LLM outputs a JSON action:

```json
{
  "instruction": "Fold the top edge down to meet the bottom edge",
  "fold_line": [[0, 0.5], [1, 0.5]],
  "fold_angle": -180,
  "assignment": "V"
}
```

The `instruction` field is natural language (what we're training the model to produce well). The geometric fields are the verifiable representation. During training, the model outputs both; for the final demo, the NL instruction is the star.

Alternative, simpler action (for early iterations):

```json
{
  "instruction": "Valley fold along the horizontal center line",
  "fold_type": "valley",
  "fold_axis": "horizontal",
  "fold_position": 0.5
}
```
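The training script later calls a `parse_fold_action` helper; here is a hedged sketch of one, assuming the two action schemas above. It normalizes the simpler axis-based form into the explicit `fold_line` form and raises `ValueError` on malformed input so the caller can assign the format-gate reward of 0:

```python
import json

def parse_fold_action(text: str, size: float = 1.0) -> dict:
    """Parse an LLM completion into a normalized fold action dict."""
    action = json.loads(text)
    if "fold_line" in action:
        return action  # already in the explicit form; pass through
    pos = float(action["fold_position"])
    if not 0.0 < pos < 1.0:
        raise ValueError("fold_position must lie strictly inside the paper")
    if action["fold_axis"] == "horizontal":
        line = [[0.0, pos * size], [size, pos * size]]
    elif action["fold_axis"] == "vertical":
        line = [[pos * size, 0.0], [pos * size, size]]
    else:
        raise ValueError(f"unknown fold_axis {action['fold_axis']!r}")
    return {
        "instruction": action.get("instruction", ""),
        "fold_line": line,
        "fold_angle": -180 if action["fold_type"] == "valley" else 180,
        "assignment": "V" if action["fold_type"] == "valley" else "M",
    }
```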
### Reward Function - Dense, Multi-Objective, Lexicographically Gated

Inspired by SpatialThinker's design. Rewards are computed in order; later rewards only apply if earlier gates pass.

```python
def compute_reward(state, action, new_state, target) -> dict:
    rewards = {}

    # LEVEL 1: Format (gate for everything else)
    # Does the output parse into a valid fold operation?
    rewards['format'] = 1.0 if parseable(action) else 0.0
    if rewards['format'] == 0:
        return rewards  # Stop here

    # LEVEL 2: Local geometric validity
    # Kawasaki's theorem: alternating sector angles at each interior vertex sum to pi
    kawasaki_valid = check_kawasaki(new_state)
    # Maekawa's theorem: |M - V| = 2 at each interior vertex
    maekawa_valid = check_maekawa(new_state)
    # No self-intersection
    no_intersection = check_no_self_intersection(new_state)
    rewards['validity'] = (kawasaki_valid + maekawa_valid + no_intersection) / 3.0
    if rewards['validity'] < 0.5:
        return rewards  # Stop here

    # LEVEL 3: Physical feasibility
    # Can this fold actually be performed given the layer stack?
    layer_consistent = check_layer_ordering(new_state)
    fold_achievable = check_fold_angle_feasible(new_state)
    rewards['feasibility'] = (layer_consistent + fold_achievable) / 2.0

    # LEVEL 4: Progress toward target (dense)
    cp_similarity = crease_pattern_similarity(new_state, target)         # crease pattern graph
    angle_similarity = fold_angle_distribution_match(new_state, target)  # fold angle distribution
    bbox_similarity = bounding_box_similarity(new_state, target)         # bounding box aspect ratio
    rewards['progress'] = 0.4 * cp_similarity + 0.4 * angle_similarity + 0.2 * bbox_similarity

    # LEVEL 5: Completion bonus
    if shape_matches_target(new_state, target, tolerance=0.05):
        rewards['completion'] = 10.0

    # LEVEL 6: Efficiency - small step penalty to encourage fewer folds
    rewards['efficiency'] = -0.01

    rewards['total'] = (
        0.1 * rewards['format'] +
        0.2 * rewards['validity'] +
        0.1 * rewards['feasibility'] +
        0.5 * rewards['progress'] +
        rewards.get('completion', 0) +
        rewards['efficiency']
    )
    return rewards
```
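The gating behavior in miniature, with the geometric checks stubbed out as plain numbers (weights match the function above; `gated_reward` is illustrative only, not part of the environment):

```python
def gated_reward(parseable: bool, validity: float, progress: float) -> float:
    """Toy lexicographic gating: format gate, then validity gate, then progress."""
    if not parseable:
        return 0.0                # format gate fails: nothing else is scored
    total = 0.1                   # format component
    total += 0.2 * validity
    if validity < 0.5:
        return total              # validity gate fails: no progress credit
    total += 0.5 * progress
    return total

print(gated_reward(False, 1.0, 1.0))  # 0.0  (unparseable output earns nothing)
```

An action that parses but tears the paper (validity 0.4) earns only the format and validity components, never progress credit, so the model cannot farm shape-similarity reward from invalid folds.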
### Key Origami Theorems for Verification

These are the verifiable constraints - the "unit tests" of origami:

1. **Kawasaki's Theorem:** At any interior vertex of a flat-foldable crease pattern, the alternating sum of sector angles equals zero (equivalently, the odd- and even-indexed sector angles each sum to pi). A NECESSARY condition for flat-foldability.
2. **Maekawa's Theorem:** At any interior vertex, the number of mountain folds minus the number of valley folds equals +/-2: |M - V| = 2.
3. **No self-intersection:** Faces cannot penetrate each other during folding.
4. **Euler's formula for planar graphs:** V - E + F = 2 (sanity check on graph structure).
5. **Huzita-Hatori axioms:** The 7 axioms defining all possible single-fold operations (point-to-point, point-to-line, line-to-line, etc.). These define the VALID action space.
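As a concrete example of turning an axiom into a candidate fold line for the action space: Huzita-Hatori axiom 2 (fold point p1 onto p2) yields the perpendicular bisector of the segment p1-p2. A hedged sketch (`axiom2_fold_line` is our helper name):

```python
import numpy as np

def axiom2_fold_line(p1, p2):
    """Axiom 2: the crease folding p1 onto p2 is the perpendicular bisector of p1-p2.

    Returns two points on that crease line.
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    mid = (p1 + p2) / 2.0
    d = p2 - p1
    perp = np.array([-d[1], d[0]])       # direction of the crease (normal to p1->p2)
    perp = perp / np.linalg.norm(perp)
    return mid - perp, mid + perp

a, b = axiom2_fold_line((0.0, 0.0), (0.0, 1.0))
# Folding (0,0) onto (0,1) creases along the horizontal line y = 0.5
```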
### Curriculum Design

| Level | Folds | Examples | Complexity |
|-------|-------|----------|------------|
| 1 | 1 | Valley fold in half, mountain fold corner | Single fold validity |
| 2 | 2-3 | Paper airplane nose, triangle fold | Sequential dependency |
| 3 | 4-6 | Simple boat, fortune teller | Multi-step with symmetry |
| 4 | 7-12 | Paper airplane (full), jumping frog | Longer-horizon planning |
| 5 | 13-20 | Crane, lily | Complex spatial tracking |

For the hackathon, focus on Levels 1-3. Even showing reward improvement on Levels 1-2 is a strong result.
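Level-1 targets are simple enough to generate programmatically rather than by hand. A hedged sketch of a generator emitting FOLD dicts ready to serialize as `.fold` files (`make_level1_target` is our helper name, not an existing API):

```python
def make_level1_target(axis: str = "horizontal", pos: float = 0.5) -> dict:
    """One valley fold across the unit square, as a FOLD-format dict."""
    if axis == "horizontal":
        crease = [[0.0, pos], [1.0, pos]]
    else:
        crease = [[pos, 0.0], [pos, 1.0]]
    return {
        # Boundary square first, then the two crease endpoints
        "vertices_coords": [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]] + crease,
        "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
        "edges_assignment": ["B", "B", "B", "B", "V"],
        "edges_foldAngle": [0, 0, 0, 0, -180],
    }

# e.g. json.dump(make_level1_target("vertical", 0.5), open("targets/half_v.fold", "w"))
```

Sweeping `pos` over a few values per axis yields the 5-10 Level 1-2 targets the build order calls for.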
---

## Core Implementation: Python Geometry Engine

This is the MOST IMPORTANT piece. Pure Python, no JS dependencies.
```python
import numpy as np
from shapely.geometry import LineString, Polygon
from shapely.ops import split

class PaperState:
    """Represents the current state of the origami paper."""

    def __init__(self, size: float = 1.0):
        # Start with a unit square
        self.regions = [Polygon([(0, 0), (size, 0), (size, size), (0, size)])]
        self.fold_history = []
        self.crease_lines = []
        self.crease_assignments = []  # 'M' or 'V'
        self.crease_angles = []
        self.layer_order = [0]  # Stack order of regions

    def apply_fold(self, fold_line: LineString, angle: float, assignment: str) -> dict:
        """
        Apply a fold operation. Returns a dict with validity info.

        fold_line: Shapely LineString defining the fold axis
        angle: fold angle in degrees (-180 to 180)
        assignment: 'M' (mountain) or 'V' (valley)
        """
        result = {'valid': True, 'errors': []}

        # 1. Split regions by the fold line
        new_regions = []
        for region in self.regions:
            if fold_line.intersects(region):
                parts = split(region, fold_line)
                new_regions.extend(parts.geoms)
            else:
                new_regions.append(region)

        # 2. Determine which side folds (based on assignment)
        folding_side = []
        staying_side = []
        for region in new_regions:
            centroid = region.centroid
            side = self._point_side(centroid, fold_line)
            if side > 0:
                folding_side.append(region)
            else:
                staying_side.append(region)

        # 3. Reflect folding regions across the fold line
        reflected = [self._reflect_polygon(r, fold_line) for r in folding_side]

        # 4. Update state
        self.regions = staying_side + reflected
        self.crease_lines.append(fold_line)
        self.crease_assignments.append(assignment)
        self.crease_angles.append(angle)
        self.fold_history.append({
            'line': list(fold_line.coords),
            'angle': angle,
            'assignment': assignment,
        })

        # 5. Update layer order
        self._update_layer_order(staying_side, reflected)
        return result

    def _reflect_polygon(self, poly: Polygon, line: LineString) -> Polygon:
        """Reflect a polygon across a line."""
        coords = list(poly.exterior.coords)
        reflected_coords = [self._reflect_point(p, line) for p in coords]
        return Polygon(reflected_coords)

    def _reflect_point(self, point: tuple, line: LineString) -> tuple:
        """Reflect a point across a line."""
        p = np.array(point[:2])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d = l2 - l1
        d = d / np.linalg.norm(d)
        # Reflection formula: p' = p - 2((p - l1) . n) n, where n is the unit normal
        n = np.array([-d[1], d[0]])
        v = p - l1
        return tuple(p - 2 * np.dot(v, n) * n)

    def _point_side(self, point, line: LineString) -> float:
        """Returns positive if the point is on the left side of the line, negative if right."""
        p = np.array([point.x, point.y])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        return float(np.cross(l2 - l1, p - l1))

    def _update_layer_order(self, staying, reflected):
        """Update the layer stacking order after a fold."""
        self.layer_order = list(range(len(staying))) + \
            list(range(len(staying), len(staying) + len(reflected)))

    def to_fold_json(self) -> dict:
        """Export the current state as FOLD format JSON."""
        vertices = set()
        for line in self.crease_lines:
            for coord in line.coords:
                vertices.add(tuple(round(c, 10) for c in coord))
        # Add boundary vertices
        for region in self.regions:
            for coord in region.exterior.coords:
                vertices.add(tuple(round(c, 10) for c in coord[:2]))
        vertices = sorted(vertices)
        vertex_map = {v: i for i, v in enumerate(vertices)}

        edge_set = set()
        edges_list = []
        assignments_list = []
        angles_list = []
        # Add crease edges
        for i, line in enumerate(self.crease_lines):
            c = [tuple(round(x, 10) for x in coord) for coord in line.coords]
            edge = tuple(sorted([vertex_map[c[0]], vertex_map[c[1]]]))
            if edge not in edge_set:
                edge_set.add(edge)
                edges_list.append(list(edge))
                assignments_list.append(self.crease_assignments[i])
                angles_list.append(self.crease_angles[i])

        return {
            'vertices_coords': [list(v) for v in vertices],
            'edges_vertices': edges_list,
            'edges_assignment': assignments_list,
            'edges_foldAngle': angles_list,
        }

class OrigamiVerifier:
    """Verifiable reward functions based on origami theorems."""

    @staticmethod
    def check_kawasaki(state: PaperState) -> bool:
        """Kawasaki's theorem: alternating sum of sector angles at each interior vertex = 0."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']
        for v_idx in range(len(vertices)):
            v = vertices[v_idx]
            incident_edges = [e for e in edges if v_idx in e]
            if len(incident_edges) < 4:
                continue  # Kawasaki only constrains vertices of degree 4+
            # Calculate sector angles between consecutive incident edges
            angles = []
            for e in incident_edges:
                other = e[1] if e[0] == v_idx else e[0]
                other_v = vertices[other]
                angles.append(np.arctan2(other_v[1] - v[1], other_v[0] - v[0]))
            angles.sort()
            sector_angles = [angles[i + 1] - angles[i] for i in range(len(angles) - 1)]
            sector_angles.append(2 * np.pi - (angles[-1] - angles[0]))
            # Kawasaki: the alternating sum should be ~0
            if len(sector_angles) >= 4:
                alt_sum = sum(sector_angles[::2]) - sum(sector_angles[1::2])
                if abs(alt_sum) > 0.01:
                    return False
        return True

    @staticmethod
    def check_maekawa(state: PaperState) -> bool:
        """Maekawa's theorem: |M - V| = 2 at each interior vertex."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']
        assignments = fold_json['edges_assignment']
        for v_idx in range(len(vertices)):
            incident = [(i, e) for i, e in enumerate(edges) if v_idx in e]
            m_count = sum(1 for i, _ in incident
                          if i < len(assignments) and assignments[i] == 'M')
            v_count = sum(1 for i, _ in incident
                          if i < len(assignments) and assignments[i] == 'V')
            if m_count + v_count >= 4:  # Interior vertex with folds
                if abs(m_count - v_count) != 2:
                    return False
        return True

    @staticmethod
    def crease_pattern_similarity(state: PaperState, target_fold_json: dict) -> float:
        """Compare the current crease pattern to the target. Returns 0-1 similarity."""
        current = state.to_fold_json()
        n_current = len(current.get('edges_vertices', []))
        n_target = len(target_fold_json.get('edges_vertices', []))
        if n_target == 0:
            return 1.0 if n_current == 0 else 0.0
        edge_count_sim = max(0.0, 1.0 - abs(n_current - n_target) / max(n_target, 1))

        current_assignments = current.get('edges_assignment', [])
        target_assignments = target_fold_json.get('edges_assignment', [])
        c_m = current_assignments.count('M')
        c_v = current_assignments.count('V')
        t_m = target_assignments.count('M')
        t_v = target_assignments.count('V')
        total = max(t_m + t_v, 1)
        assign_sim = max(0.0, 1.0 - (abs(c_m - t_m) + abs(c_v - t_v)) / (2 * total))
        return 0.5 * edge_count_sim + 0.5 * assign_sim
```
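The heart of `apply_fold` is point reflection across the fold line; it can be sanity-checked in isolation. A minimal sketch of the same math without Shapely (the standalone `reflect_point` mirrors the `_reflect_point` method above): reflecting the top edge of the unit square across the horizontal center line should land it on the bottom edge.

```python
import numpy as np

def reflect_point(point, l1, l2):
    """Reflect `point` across the line through l1 and l2."""
    p, l1, l2 = (np.asarray(x, float) for x in (point, l1, l2))
    d = l2 - l1
    d = d / np.linalg.norm(d)
    n = np.array([-d[1], d[0]])          # unit normal to the fold line
    v = p - l1
    return p - 2 * np.dot(v, n) * n

top_edge = [(0.0, 1.0), (1.0, 1.0)]
folded = [reflect_point(p, (0, 0.5), (1, 0.5)) for p in top_edge]
# folded is approximately [(0, 0), (1, 0)]: the top edge lands on the bottom edge
```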
---

## OpenEnv Environment Wrapper

```python
# origami_env/server.py
import json

from openenv.core import Environment
from shapely.geometry import LineString

from paper_engine import PaperState, OrigamiVerifier

class OrigamiEnvironment(Environment):
    def __init__(self, targets_dir="targets/", max_steps=20):
        self.targets_dir = targets_dir
        self.max_steps = max_steps
        self.paper = None
        self.target = None
        self.step_count = 0

    async def reset(self, target_id=None):
        self.paper = PaperState(size=1.0)
        self.target = self._load_target(target_id)
        self.step_count = 0
        return self._get_observation()

    async def step(self, action):
        self.step_count += 1
        # Parse the action
        try:
            fold_line = LineString(action['fold_line'])
            angle = action['fold_angle']
            assignment = action['assignment']
        except Exception:
            reward = {'format': 0, 'total': -0.1}
            return self._get_observation(), reward, False, {'error': 'parse_failed'}
        # Apply the fold
        result = self.paper.apply_fold(fold_line, angle, assignment)
        # Compute rewards
        reward = self._compute_reward(result)
        # Check termination
        done = (
            self.step_count >= self.max_steps or
            reward.get('completion', 0) > 0
        )
        return self._get_observation(), reward, done, {}

    async def state(self):
        return {
            'paper': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'fold_history': self.paper.fold_history,
        }

    def _compute_reward(self, fold_result):
        rewards = {}
        rewards['format'] = 1.0
        kawasaki = OrigamiVerifier.check_kawasaki(self.paper)
        maekawa = OrigamiVerifier.check_maekawa(self.paper)
        rewards['validity'] = (float(kawasaki) + float(maekawa)) / 2.0
        rewards['progress'] = OrigamiVerifier.crease_pattern_similarity(
            self.paper, self.target
        )
        if rewards['progress'] > 0.95:
            rewards['completion'] = 10.0
        rewards['efficiency'] = -0.01
        rewards['total'] = (
            0.1 * rewards['format'] +
            0.2 * rewards['validity'] +
            0.6 * rewards['progress'] +
            rewards.get('completion', 0) +
            rewards['efficiency']
        )
        return rewards

    def _get_observation(self):
        return {
            'paper_state': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'instruction_history': [str(f['line']) for f in self.paper.fold_history],
        }

    def _load_target(self, target_id):
        if target_id:
            with open(f"{self.targets_dir}/{target_id}.fold") as f:
                return json.load(f)
        # Default: a simple valley fold in half
        return {
            'vertices_coords': [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
            'edges_vertices': [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
            'edges_assignment': ['B', 'B', 'B', 'B', 'V'],
            'edges_foldAngle': [0, 0, 0, 0, -180],
        }
```
---

## Training Script (Unsloth GRPO)

```python
# train.py
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

from paper_engine import PaperState

# Load the base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# Reward function. TRL passes prompts/completions as keyword arguments.
# parse_fold_action, compute_reward, target, and load_origami_prompts are
# project helpers defined elsewhere in the repo.
def origami_reward(prompts, completions, **kwargs):
    """Compute rewards for a batch of completions."""
    rewards = []
    for completion in completions:
        try:
            action = parse_fold_action(completion)
            paper = PaperState()
            paper.apply_fold(action['fold_line'], action['angle'], action['assignment'])
            r = compute_reward(paper, target)
            rewards.append(r['total'])
        except Exception:
            rewards.append(-0.1)  # Unparseable output: small penalty
    return rewards

# GRPO config
config = GRPOConfig(
    output_dir="origami-grpo",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    max_completion_length=512,
    num_generations=8,
    temperature=1.0,
    logging_steps=1,
)

dataset = load_origami_prompts()

trainer = GRPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    reward_funcs=[origami_reward],
    processing_class=tokenizer,
)
trainer.train()
```
---

## Visualization (Demo Only - Not in the Training Loop)

### Options

1. **Origami Simulator** - https://github.com/amandaghassaei/OrigamiSimulator - Three.js, accepts FOLD files, shows a folding animation with strain visualization
2. **PackCAD** - https://packcad.com/ - web-based, SVG crease patterns, rigid folding simulation
3. **Custom Three.js** - simpler, but more control

### Demo UI Layout

```
+----------------------+----------------------+
| Instruction Stream   |  3D Fold Viewer      |
|                      |                      |
| Step 1: Valley fold  |  [Three.js canvas]   |
|  along center [OK]   |                      |
|                      |  Paper animating     |
| Step 2: Fold top     |  fold by fold        |
|  corners to center   |                      |
|                      |                      |
+----------------------+----------------------+
|              Reward Dashboard               |
| Format:    ========== 1.0                   |
| Validity:  ========.. 0.8                   |
| Progress:  ======.... 0.6                   |
| Total:     =======... 0.72                  |
|                                             |
|    [Reward curve over training steps]       |
+---------------------------------------------+
```
---

## Key Libraries and Resources

| Tool | Purpose | Link |
|------|---------|------|
| OpenEnv | Environment framework | https://github.com/meta-pytorch/OpenEnv |
| Unsloth | GRPO training | https://github.com/unslothai/unsloth |
| OpenPipe ART | Multi-turn RL trainer | https://github.com/OpenPipe/ART |
| FOLD format | Origami data structure | https://github.com/edemaine/fold |
| Rabbit Ear | JS origami library | https://github.com/rabbit-ear/rabbit-ear |
| Origami Simulator | 3D visualization | https://github.com/amandaghassaei/OrigamiSimulator |
| PackCAD | Folding simulation | https://packcad.com/ |
| Shapely | Python geometry | `pip install shapely` |
| rigid-origami gym | Reference gym env | https://github.com/belalugaX/rigid-origami |

### Papers to Cite

- OrigamiSpace: https://arxiv.org/abs/2511.18450
- GamiBench: https://arxiv.org/abs/2512.22207
- SpatialThinker: https://arxiv.org/abs/2511.07403
- Automating Rigid Origami Design: https://arxiv.org/abs/2211.13219
- FOLD format spec: https://github.com/edemaine/fold/blob/main/doc/spec.md
---

## Priority Build Order

1. **Python geometry engine** - PaperState class with fold operations and FOLD export
2. **Verifier functions** - Kawasaki, Maekawa, similarity metrics
3. **OpenEnv wrapper** - step/reset/state API
4. **Simple targets** - hand-create 5-10 Level 1-2 targets as .fold files
5. **Training script** - wire up Unsloth GRPO with the reward function
6. **Run training** - even on a small model, get reward curves
7. **Three.js visualizer** - for the demo only, not in the training loop
8. **Before/after demo** - show base model vs. trained model outputs
9. **Polish the presentation narrative**
---

## Narrative for Judges

**The story arc:**

1. "LLMs are great at text but terrible at spatial reasoning."
2. "Origami is the perfect testbed - it's sequential, physical, and verifiable."
3. "NeurIPS 2025 showed even GPT-5 fails at origami benchmarks, but nobody built a TRAINING environment."
4. "We built OrigamiRL - the first multi-turn RL environment for origami instruction generation."
5. "Our rewards come from math theorems, not vibes - Kawasaki's theorem is our unit test."
6. "Watch the model go from generating paper-tearing nonsense to valid fold sequences."
7. "This generalizes to any domain where LLMs need to output structured physical instructions."