# OrigamiRL: OpenEnv Hackathon Handoff Document

## TL;DR

Build the **first multi-turn RL environment where an LLM learns to generate origami folding instructions**, verified by a computational origami simulator. Target the OpenEnv Hackathon (March 7-8, 2026, SF; $100K+ in prizes). Use OpenEnv spec + Unsloth GRPO for training. Dense verifiable rewards from origami geometry theorems (Kawasaki, Maekawa). No learned reward model needed.

---

## Hackathon Context

- **Event:** OpenEnv Hackathon SF, hosted by Cerebral Valley + Shack15 + Meta/PyTorch
- **Date:** March 7-8, 2026 (happening NOW)
- **Prize:** $100K+ cash
- **Teams:** Up to 4 people
- **Format:** Build RL environments, post-train a base model

### Judging Criteria

| Category | Weight | What Matters |
|----------|--------|--------------|
| Environment Innovation | 40% | Novel, creative, challenging. Does it meaningfully test agent behavior? |
| Storytelling | 30% | Clear problem explanation, engaging demo, easy to follow |
| Training Script Showing Improvement | 20% | Observable reward curves, before/after behavior |
| Reward and Training Pipeline Setup | 10% | Coherent reward logic, meaningful improvement in inference |

### Key Sponsors to Impress

- **Meta/PyTorch**: OpenEnv creators, want environments using their spec
- **Unsloth AI**: GRPO training infra, ART (Agent Reinforcement Trainer). USE THEIR TOOLS.
- **OpenPipe**: ART trainer (frontend/backend split for GRPO). Also use.
- **Patronus AI**: Building "generative simulators" (auto-scaling RL environments). They care about curriculum difficulty scaling and verifiable rewards.
- **Snorkel AI**: "2026 is the year of environments." They care about data quality and environment diversity.
- **Hugging Face**: OpenEnv Hub, want environments deployed there
- **Scale AI / Mercor**: Agent evaluation, structured task environments

---

## The Pitch (for judges)

> "Spatial reasoning is the next frontier for LLM training. NeurIPS 2025 papers like OrigamiSpace showed that even GPT-5 fails at multi-step origami reasoning. But those are benchmarks, not training environments. We built OrigamiRL: the first multi-turn RL environment where an LLM agent learns to fold paper by outputting instructions, receiving geometric feedback, and improving through GRPO. Our reward function is fully verifiable: fold validity is checked against computational origami axioms, not an LLM judge. We built it on OpenEnv + Unsloth with a natural curriculum from single folds to full cranes."

---

## Prior Work (What Exists, Where the Gaps Are)

### 1. OrigamiSpace (NeurIPS 2025 Spotlight)

- **Paper:** https://arxiv.org/abs/2511.18450
- **What it is:** Benchmark with 350 origami data instances (CP diagrams, folding processes, folded shapes). 4 evaluation tasks: Pattern Prediction, Multi-step Spatial Reasoning, Spatial Relationship Prediction, End-to-End CP Code Generation.
- **Their compiler:** Outputs detailed flattened diagrams with crease locations and stacking relationships, supports interactive simulation with MLLMs, provides comprehensive error feedback. Checks: syntax validity, geometric foldability, no self-intersections, Kawasaki's theorem, Maekawa's theorem.
- **Their reward metrics for code gen:** Hausdorff distance (shape similarity), dihedral angle distribution, bounding box aspect ratios, constraint satisfaction.
- **Difficulty levels:** Easy (3-9 steps), Medium (10-19 steps), Hard (20-30 steps)
- **Gap:** Single-turn only (LLM generates complete CP code in one shot). They mention RL exploration but it's not the focus. No multi-turn sequential folding.

### 2. GamiBench (Dec 2025)

- **Paper:** https://arxiv.org/abs/2512.22207
- **What it is:** 186 regular + 186 impossible 2D crease patterns with 3D folded shapes from 6 viewpoints. 3 VQA tasks.
- **Gap:** Evaluation-only, no training. Tests single-step spatial understanding.

### 3. SpatialThinker (NeurIPS 2025)

- **Paper:** https://arxiv.org/abs/2511.07403
- **What it is:** 3D-aware MLLM trained with RL using dense spatial rewards. Constructs scene graphs. Multi-objective reward with lexicographic gating.
- **Key architecture to steal:** Dense reward design with lexicographic ordering: format → count → accuracy → spatial. Nearly doubled RL training gains vs sparse rewards. Only needed 7K training samples with GRPO.
- **Gap:** Static scene understanding (objects on a table), not sequential physical transformations.

### 4. rigid-origami Gym (IJCAI 2023)

- **Repo:** https://github.com/belalugaX/rigid-origami
- **Paper:** "Automating Rigid Origami Design" (https://arxiv.org/abs/2211.13219)
- **What it is:** Gym environment where agent constructs crease pattern graphs on a board. Sparse rewards. Foldability validated by triangle intersection tests + kinematic rigidity model. Game terminates on non-foldable states.
- **Gap:** Classical RL agents (discrete grid actions), NOT LLMs generating text. Rigid-origami tessellations only, not traditional origami. No natural language.

### 5. The Unique Gap We Fill

Nobody has built a model that reasons about **sequential 2D-to-3D geometric transformations with physical constraints** through **natural language instructions** in a **multi-turn RL training loop**. Origami is uniquely hard because it requires tracking how a flat sheet's topology changes through a sequence of folds: mental rotation, spatial visualization, and perspective-taking all at once.
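
For reference, the Hausdorff distance that OrigamiSpace lists as a shape-similarity metric (item 1 above) is cheap to compute on small vertex sets, so it could also serve as a dense progress signal here. The sketch below is a pure-Python point-set version. The function names and the choice to compare only outline vertices (rather than densely sampled boundaries) are assumptions for illustration, not OrigamiSpace's implementation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Outline of a flat unit sheet vs. the same sheet folded in half.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
half_sheet = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5), (0.0, 0.5)]

print(hausdorff_distance(unit_square, half_sheet))   # 0.5
print(hausdorff_distance(unit_square, unit_square))  # 0.0
```

A similarity score in [0, 1] could then be derived as, for example, `max(0, 1 - distance / sheet_size)`; that normalization is likewise only a suggestion.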
---

## Environment Design

### Architecture Overview

```
+---------------------------------------------------+
|                  OpenEnv Server                   |
|  +-----------+  +----------+  +--------------+    |
|  |   State   |  |  Action  |  |    Reward    |    |
|  | (FOLD JSON|  | (LLM     |  |  (Dense,     |    |
|  | + target) |  | output)  |  |  verifiable) |    |
|  +-----------+  +----------+  +--------------+    |
|        |             |               |            |
|        v             v               v            |
|  +-----------------------------------------------+|
|  |        Paper Geometry Engine (Python)         ||
|  |  - Polygon state (Shapely)                    ||
|  |  - Fold operations (reflection across line)   ||
|  |  - Kawasaki/Maekawa constraint checks         ||
|  |  - Layer tracking                             ||
|  |  - FOLD format import/export                  ||
|  +-----------------------------------------------+|
|                         |                         |
|                         v                         |
|  +-----------------------------------------------+|
|  |        Three.js Visualizer (Demo only)        ||
|  |  - 3D fold animation                          ||
|  |  - Strain heatmap                             ||
|  |  - Instruction stream                         ||
|  +-----------------------------------------------+|
+---------------------------------------------------+
                   |            ^
                   v            |
+---------------------------------------------------+
|            Unsloth ART / GRPO Trainer             |
|  - Qwen2.5-VL-7B or Qwen3-4B base model           |
|  - LoRA/QLoRA for efficient training              |
|  - Multi-turn rollouts                            |
+---------------------------------------------------+
```

### OpenEnv Spec Compliance

Must implement these APIs:

```python
# Interface sketch: Observation and State are the environment's own types.
class OrigamiEnv:
    async def reset(self) -> "Observation": ...  # New episode: flat paper + target
    async def step(self, action) -> tuple: ...   # (Observation, reward, done, info)
    async def state(self) -> "State": ...        # Current paper geometry
    async def close(self) -> None: ...           # Cleanup
```

OpenEnv repo: https://github.com/meta-pytorch/OpenEnv

Install: `pip install -e .` then `openenv init origami_env`

### State Space

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class OrigamiState:
    # Current paper geometry
    vertices: List[Tuple[float, float]]       # 2D vertex positions
    edges: List[Tuple[int, int]]              # Edge connectivity
    edges_assignment: List[str]               # 'M', 'V', 'B', 'F' (mountain/valley/boundary/flat)
    edges_foldAngle: List[float]              # -180 to 180 degrees (valley positive, mountain negative)
    faces: List[List[int]]                    # Face vertex indices
    layer_order: List[List[int]]              # Face stacking order

    # Episode context
    target_crease_pattern: dict               # Target FOLD JSON
    target_shape_image: Optional[np.ndarray]  # Target folded shape (for multimodal)
    instruction_history: List[str]            # Previous instructions
    step_count: int
    max_steps: int
```

This maps directly to the **FOLD format** (JSON-based, widely used by computational origami software):

```json
{
  "vertices_coords": [[0,0], [1,0], [1,1], [0,1]],
  "edges_vertices": [[0,1], [1,2], [2,3], [3,0]],
  "edges_assignment": ["B", "B", "B", "B"],
  "edges_foldAngle": [0, 0, 0, 0],
  "faces_vertices": [[0, 1, 2, 3]]
}
```

FOLD spec: https://github.com/edemaine/fold

FOLD JS library: https://edemaine.github.io/fold/

### Action Space

The LLM outputs a JSON action:

```json
{
  "instruction": "Fold the top edge down to meet the bottom edge",
  "fold_line": [[0, 0.5], [1, 0.5]],
  "fold_angle": 180,
  "assignment": "V"
}
```

The `instruction` field is natural language (what we're training the model to produce well). The geometric fields are the verifiable representation. Following the FOLD spec, a valley fold carries a positive `fold_angle` and a mountain fold a negative one. During training, the model outputs both; for the final demo, the NL instruction is the star.

Alternative simpler action (for early iterations):

```json
{
  "instruction": "Valley fold along the horizontal center line",
  "fold_type": "valley",
  "fold_axis": "horizontal",
  "fold_position": 0.5
}
```

### Reward Function: Dense, Multi-Objective, Lexicographically Gated

Inspired by SpatialThinker's design. Rewards are computed in order; later rewards only apply if earlier gates pass.

```python
def compute_reward(state, action, new_state, target) -> dict:
    rewards = {}

    # LEVEL 1: Format (gate for everything else)
    # Does the output parse into a valid fold operation?
    rewards['format'] = 1.0 if parseable(action) else 0.0
    if rewards['format'] == 0:
        rewards['total'] = 0.0  # Gate failed: nothing else is scored
        return rewards

    # LEVEL 2: Local Geometric Validity
    # Kawasaki's theorem: alternating sum of sector angles at each interior vertex is 0
    kawasaki_valid = check_kawasaki(new_state)
    # Maekawa's theorem: |M - V| = 2 at each interior vertex
    maekawa_valid = check_maekawa(new_state)
    # No self-intersection
    no_intersection = check_no_self_intersection(new_state)
    rewards['validity'] = (kawasaki_valid + maekawa_valid + no_intersection) / 3.0
    if rewards['validity'] < 0.5:
        # Gate failed: score only what was computed so callers can still read 'total'
        rewards['total'] = 0.1 * rewards['format'] + 0.2 * rewards['validity']
        return rewards

    # LEVEL 3: Physical Feasibility
    # Can this fold actually be performed given layer stack?
    layer_consistent = check_layer_ordering(new_state)
    fold_achievable = check_fold_angle_feasible(new_state)
    rewards['feasibility'] = (layer_consistent + fold_achievable) / 2.0

    # LEVEL 4: Progress Toward Target (Dense)
    # Crease pattern graph similarity
    cp_similarity = crease_pattern_similarity(new_state, target)
    # Fold angle distribution match
    angle_similarity = fold_angle_distribution_match(new_state, target)
    # Bounding box aspect ratio match
    bbox_similarity = bounding_box_similarity(new_state, target)
    rewards['progress'] = 0.4 * cp_similarity + 0.4 * angle_similarity + 0.2 * bbox_similarity

    # LEVEL 5: Completion Bonus
    if shape_matches_target(new_state, target, tolerance=0.05):
        rewards['completion'] = 10.0

    # LEVEL 6: Efficiency
    rewards['efficiency'] = -0.01  # Small step penalty to encourage fewer folds

    # Total
    rewards['total'] = (
        0.1 * rewards['format']
        + 0.2 * rewards['validity']
        + 0.1 * rewards['feasibility']
        + 0.5 * rewards['progress']
        + rewards.get('completion', 0)
        + rewards['efficiency']
    )
    return rewards
```

### Key Origami Theorems for Verification

These are the verifiable constraints, the "unit tests" of origami:

1. **Kawasaki's Theorem:** At any interior vertex of a flat-foldable crease pattern, the alternating sum of sector angles equals zero (equivalently, the odd-numbered sectors and the even-numbered sectors each sum to pi).
   This is a NECESSARY condition for flat-foldability, not a sufficient one.
2. **Maekawa's Theorem:** At any interior vertex, the number of mountain folds minus valley folds equals +/-2. |M - V| = 2.
3. **No self-intersection:** Faces cannot penetrate each other during folding.
4. **Euler's formula for planar graphs:** V - E + F = 2 when the unbounded outer face is counted (V - E + F = 1 if only paper faces are counted, as in FOLD's `faces_vertices`). Sanity check on graph structure.
5. **Huzita-Hatori axioms:** The 7 axioms defining all possible single-fold operations (point-to-point, point-to-line, line-to-line, etc.). These define the VALID action space.

### Curriculum Design

| Level | Folds | Examples | Complexity |
|-------|-------|----------|------------|
| 1 | 1 | Valley fold in half, mountain fold corner | Single fold validity |
| 2 | 2-3 | Paper airplane nose, triangle fold | Sequential dependency |
| 3 | 4-6 | Simple boat, fortune teller | Multi-step with symmetry |
| 4 | 7-12 | Paper airplane (full), jumping frog | Longer horizon planning |
| 5 | 13-20 | Crane, lily | Complex spatial tracking |

For the hackathon, focus on Levels 1-3. Even showing reward improvement on Level 1-2 is a strong result.

---

## Core Implementation: Python Geometry Engine

This is the MOST IMPORTANT piece. Pure Python, no JS dependencies.

```python
import numpy as np
from shapely.geometry import Polygon, LineString, MultiPolygon
from shapely.ops import split
from typing import List, Tuple, Dict
import json


class PaperState:
    """Represents the current state of the origami paper."""

    def __init__(self, size: float = 1.0):
        # Start with a flat size x size square (unit square by default)
        self.regions = [Polygon([(0, 0), (size, 0), (size, size), (0, size)])]
        self.fold_history = []
        self.crease_lines = []
        self.crease_assignments = []  # 'M' or 'V'
        self.crease_angles = []
        self.layer_order = [0]  # Stack order of regions

    def apply_fold(self, fold_line: LineString, angle: float, assignment: str) -> dict:
        """
        Apply a fold operation. Returns dict with validity info.
        fold_line: Shapely LineString defining the fold axis
        angle: fold angle in degrees (-180 to 180)
        assignment: 'M' (mountain) or 'V' (valley)
        """
        result = {'valid': True, 'errors': []}
        if assignment not in ('M', 'V') or fold_line.length == 0:
            return {'valid': False, 'errors': ['bad assignment or degenerate fold line']}

        # 1. Split regions by fold line
        new_regions = []
        for region in self.regions:
            if fold_line.intersects(region):
                parts = split(region, fold_line)
                new_regions.extend(parts.geoms)
            else:
                new_regions.append(region)

        # 2. Determine which side folds: regions to the left of the directed
        #    fold line move, everything else stays put
        folding_side = []
        staying_side = []
        for region in new_regions:
            centroid = region.centroid
            side = self._point_side(centroid, fold_line)
            if side > 0:
                folding_side.append(region)
            else:
                staying_side.append(region)

        # 3. Reflect folding regions across fold line
        reflected = [self._reflect_polygon(r, fold_line) for r in folding_side]

        # 4. Update state
        self.regions = staying_side + reflected
        self.crease_lines.append(fold_line)
        self.crease_assignments.append(assignment)
        self.crease_angles.append(angle)
        self.fold_history.append({
            'line': list(fold_line.coords),
            'angle': angle,
            'assignment': assignment
        })

        # 5. Update layer order
        self._update_layer_order(staying_side, reflected)
        return result

    def _reflect_polygon(self, poly: Polygon, line: LineString) -> Polygon:
        """Reflect a polygon across a line."""
        coords = list(poly.exterior.coords)
        reflected_coords = [self._reflect_point(p, line) for p in coords]
        return Polygon(reflected_coords)

    def _reflect_point(self, point: tuple, line: LineString) -> tuple:
        """Reflect a point across a line."""
        p = np.array(point[:2])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d = l2 - l1
        d = d / np.linalg.norm(d)
        # Reflection formula: p' = p - 2((p - l1) . n) n, where n is the unit normal to the line
        n = np.array([-d[1], d[0]])
        v = p - l1
        return tuple(p - 2 * np.dot(v, n) * n)

    def _point_side(self, point, line: LineString) -> float:
        """Returns positive if point is on left side of line, negative if right."""
        p = np.array([point.x, point.y])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d, v = l2 - l1, p - l1
        # 2D cross product written out (np.cross on 2D vectors is deprecated in NumPy 2.x)
        return float(d[0] * v[1] - d[1] * v[0])

    def _update_layer_order(self, staying, reflected):
        """Update the layer stacking order after a fold.

        Placeholder: staying regions keep the bottom slots, reflected regions go on top.
        """
        self.layer_order = list(range(len(staying))) + \
            list(range(len(staying), len(staying) + len(reflected)))

    def to_fold_json(self) -> dict:
        """Export current state as FOLD format JSON (crease edges only, no boundary edges)."""
        vertices = set()
        for line in self.crease_lines:
            for coord in line.coords:
                vertices.add(tuple(round(c, 10) for c in coord))
        # Add boundary vertices
        for region in self.regions:
            for coord in region.exterior.coords:
                vertices.add(tuple(round(c, 10) for c in coord[:2]))

        vertices = sorted(list(vertices))
        vertex_map = {v: i for i, v in enumerate(vertices)}

        edge_set = set()
        edges_list = []
        assignments_list = []
        angles_list = []

        # Add crease edges
        for i, line in enumerate(self.crease_lines):
            c = [tuple(round(x, 10) for x in coord) for coord in line.coords]
            edge = tuple(sorted([vertex_map[c[0]], vertex_map[c[1]]]))
            if edge not in edge_set:
                edge_set.add(edge)
                edges_list.append(list(edge))
                assignments_list.append(self.crease_assignments[i])
                angles_list.append(self.crease_angles[i])

        return {
            'vertices_coords': [list(v) for v in vertices],
            'edges_vertices': edges_list,
            'edges_assignment': assignments_list,
            'edges_foldAngle': angles_list,
        }


class OrigamiVerifier:
    """Verifiable reward functions based on origami theorems."""

    @staticmethod
    def check_kawasaki(state: PaperState) -> bool:
        """Kawasaki's theorem: alternating sum of angles at each interior vertex = 0."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']

        for v_idx in range(len(vertices)):
            v = vertices[v_idx]
            incident_edges = [e for e in edges if v_idx in e]
            if len(incident_edges) < 4:
                continue  # Need degree-4+ for Kawasaki

            # Direction of each incident crease, sorted counter-clockwise
            angles = []
            for e in incident_edges:
                other = e[1] if e[0] == v_idx else e[0]
                other_v = vertices[other]
                angle = np.arctan2(other_v[1] - v[1], other_v[0] - v[0])
                angles.append(angle)
            angles.sort()

            # Sector angles between consecutive creases (last one wraps around)
            sector_angles = []
            for i in range(len(angles) - 1):
                sector_angles.append(angles[i + 1] - angles[i])
            sector_angles.append(2 * np.pi - (angles[-1] - angles[0]))

            # Kawasaki: alternating sum should be ~0
            if len(sector_angles) >= 4:
                alt_sum = sum(sector_angles[::2]) - sum(sector_angles[1::2])
                if abs(alt_sum) > 0.01:
                    return False
        return True

    @staticmethod
    def check_maekawa(state: PaperState) -> bool:
        """Maekawa's theorem: |M - V| = 2 at each interior vertex."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']
        assignments = fold_json['edges_assignment']

        for v_idx in range(len(vertices)):
            incident = [(i, e) for i, e in enumerate(edges) if v_idx in e]
            m_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'M')
            v_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'V')
            if m_count + v_count >= 4:  # Interior vertex with folds
                if abs(m_count - v_count) != 2:
                    return False
        return True

    @staticmethod
    def crease_pattern_similarity(state: PaperState,
                                  target_fold_json: dict) -> float:
        """Compare current crease pattern to target. Returns 0-1 similarity.

        Coarse metric: compares crease counts and M/V counts, not crease positions.
        """
        current = state.to_fold_json()
        current_assignments = current.get('edges_assignment', [])
        target_assignments = target_fold_json.get('edges_assignment', [])

        c_m = current_assignments.count('M')
        c_v = current_assignments.count('V')
        t_m = target_assignments.count('M')
        t_v = target_assignments.count('V')

        # Count creases only: to_fold_json() exports no boundary ('B') edges,
        # while target files usually include them.
        n_current = c_m + c_v
        n_target = t_m + t_v
        if n_target == 0:
            return 1.0 if n_current == 0 else 0.0

        edge_count_sim = max(0.0, 1.0 - abs(n_current - n_target) / n_target)
        assign_sim = max(0.0, 1.0 - (abs(c_m - t_m) + abs(c_v - t_v)) / (2 * n_target))
        return 0.5 * edge_count_sim + 0.5 * assign_sim
```

---

## OpenEnv Environment Wrapper

```python
# origami_env/server.py
from openenv.core import Environment  # adjust to the import path of your installed OpenEnv version
from paper_engine import PaperState, OrigamiVerifier
from shapely.geometry import LineString
import json


class OrigamiEnvironment(Environment):

    def __init__(self, targets_dir="targets/", max_steps=20):
        self.targets_dir = targets_dir
        self.max_steps = max_steps
        self.paper = None
        self.target = None
        self.step_count = 0

    async def reset(self, target_id=None):
        self.paper = PaperState(size=1.0)
        self.target = self._load_target(target_id)
        self.step_count = 0
        return self._get_observation()

    async def step(self, action):
        self.step_count += 1

        # Parse action (any failure here counts as a format error)
        try:
            fold_line = LineString(action['fold_line'])
            angle = action['fold_angle']
            assignment = action['assignment']
        except Exception:
            reward = {'format': 0, 'total': -0.1}
            return self._get_observation(), reward, False, {'error': 'parse_failed'}

        # Apply fold
        result = self.paper.apply_fold(fold_line, angle, assignment)

        # Compute rewards
        reward = self._compute_reward(result)

        # Check termination
        done = (
            self.step_count >= self.max_steps or
            reward.get('completion', 0) > 0
        )
        return self._get_observation(), reward, done, {}

    async def state(self):
        return {
            'paper': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'fold_history': self.paper.fold_history
        }

    def _compute_reward(self, fold_result):
        rewards = {}
        rewards['format'] = 1.0

        kawasaki = OrigamiVerifier.check_kawasaki(self.paper)
        maekawa = OrigamiVerifier.check_maekawa(self.paper)
        rewards['validity'] = (float(kawasaki) + float(maekawa)) / 2.0

        rewards['progress'] = OrigamiVerifier.crease_pattern_similarity(
            self.paper, self.target
        )
        if rewards['progress'] > 0.95:
            rewards['completion'] = 10.0
        rewards['efficiency'] = -0.01

        rewards['total'] = (
            0.1 * rewards['format']
            + 0.2 * rewards['validity']
            + 0.6 * rewards['progress']
            + rewards.get('completion', 0)
            + rewards['efficiency']
        )
        return rewards

    def _get_observation(self):
        return {
            'paper_state': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'instruction_history': [str(f['line']) for f in self.paper.fold_history]
        }

    def _load_target(self, target_id):
        if target_id:
            with open(f"{self.targets_dir}/{target_id}.fold") as f:
                return json.load(f)
        # Default: simple valley fold in half (FOLD: valley = positive fold angle)
        return {
            'vertices_coords': [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
            'edges_vertices': [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
            'edges_assignment': ['B', 'B', 'B', 'B', 'V'],
            'edges_foldAngle': [0, 0, 0, 0, 180],
        }
```

---

## Training Script (Unsloth GRPO)

```python
# train.py
import json

from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from shapely.geometry import LineString

from paper_engine import PaperState, OrigamiVerifier

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Add LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# Level 1 target: one valley crease (only the field the similarity metric reads)
TARGET = {'edges_assignment': ['B', 'B', 'B', 'B', 'V']}


# Reward function
def origami_reward(completions, **kwargs):
    """Compute rewards for a batch of completions (one fold on a fresh sheet each).

    TRL also passes prompts and dataset columns as keyword arguments; unused here.
    Assumes each completion is a plain string containing only the JSON action.
    """
    rewards = []
    for completion in completions:
        try:
            action = json.loads(completion)
            paper = PaperState()
            paper.apply_fold(LineString(action['fold_line']),
                             action['fold_angle'], action['assignment'])
            validity = (OrigamiVerifier.check_kawasaki(paper)
                        + OrigamiVerifier.check_maekawa(paper)) / 2.0
            progress = OrigamiVerifier.crease_pattern_similarity(paper, TARGET)
            # Same weights as OrigamiEnvironment._compute_reward
            rewards.append(0.1 + 0.2 * validity + 0.6 * progress)
        except Exception:
            rewards.append(-0.1)  # Unparseable or geometrically unusable output
    return rewards


# GRPO Config
# Note: some TRL versions require the per-step batch to be divisible by num_generations.
config = GRPOConfig(
    output_dir="origami-grpo",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    max_completion_length=512,
    num_generations=8,
    temperature=1.0,
    logging_steps=1,
)

# load_origami_prompts() is not defined in this document: it still has to be
# written and should return a dataset with a "prompt" column.
dataset = load_origami_prompts()

trainer = GRPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    reward_funcs=[origami_reward],
    processing_class=tokenizer,  # older TRL releases call this argument `tokenizer`
)
trainer.train()
```

---

## Visualization (Demo Only, Not in Training Loop)

### Options

1. **Origami Simulator** (https://github.com/amandaghassaei/OrigamiSimulator): Three.js, accepts FOLD files, shows folding animation with strain visualization
2. **PackCAD** (https://packcad.com/): Web-based, SVG crease patterns, rigid folding simulation
3. **Custom Three.js**: Simpler but more control

### Demo UI Layout

```
+----------------------+----------------------+
| Instruction Stream   | 3D Fold Viewer       |
|                      |                      |
| Step 1: Valley fold  | [Three.js canvas]    |
| along center [OK]    |                      |
|                      | Paper animating      |
| Step 2: Fold top     | fold by fold         |
| corners to center    |                      |
|                      |                      |
+----------------------+----------------------+
| Reward Dashboard                            |
| Format:   ========== 1.0                    |
| Validity: ========.. 0.8                    |
| Progress: ======.... 0.6                    |
| Total:    =======... 0.72                   |
|                                             |
| [Reward curve over training steps]          |
+---------------------------------------------+
```

---

## Key Libraries and Resources

| Tool | Purpose | Link |
|------|---------|------|
| OpenEnv | Environment framework | https://github.com/meta-pytorch/OpenEnv |
| Unsloth | GRPO training | https://github.com/unslothai/unsloth |
| OpenPipe ART | Multi-turn RL trainer | https://github.com/OpenPipe/ART |
| FOLD format | Origami data structure | https://github.com/edemaine/fold |
| Rabbit Ear | JS origami library | https://github.com/rabbit-ear/rabbit-ear |
| Origami Simulator | 3D visualization | https://github.com/amandaghassaei/OrigamiSimulator |
| PackCAD | Folding simulation | https://packcad.com/ |
| Shapely | Python geometry | `pip install shapely` |
| rigid-origami gym | Reference gym env | https://github.com/belalugaX/rigid-origami |

### Papers to Cite

- OrigamiSpace: https://arxiv.org/abs/2511.18450
- GamiBench: https://arxiv.org/abs/2512.22207
- SpatialThinker: https://arxiv.org/abs/2511.07403
- Automating Rigid Origami Design: https://arxiv.org/abs/2211.13219
- FOLD format spec: https://github.com/edemaine/fold/blob/main/doc/spec.md

---

## Priority Build Order

1. **Python geometry engine**: PaperState class with fold operations and FOLD export
2. **Verifier functions**: Kawasaki, Maekawa, similarity metrics
3. **OpenEnv wrapper**: step/reset/state API
4. **Simple targets**: Hand-create 5-10 Level 1-2 targets as .fold files
5. **Training script**: Wire up Unsloth GRPO with reward function
6. **Run training**: Even on small model, get reward curves
7. **Three.js visualizer**: For demo only, not in training loop
8. **Before/after demo**: Show base model vs trained model outputs
9. **Polish presentation narrative**

---

## Narrative for Judges

**The story arc:**

1. "LLMs are great at text but terrible at spatial reasoning"
2. "Origami is the perfect testbed: it's sequential, physical, and verifiable"
3. "NeurIPS 2025 showed even GPT-5 fails at origami benchmarks, but nobody built a TRAINING environment"
4. "We built OrigamiRL, the first multi-turn RL environment for origami instruction generation"
5. "Our rewards come from math theorems, not vibes: Kawasaki's theorem is our unit test"
6. "Watch the model go from generating paper-tearing nonsense to valid fold sequences"
7. "This generalizes to any domain where LLMs need to output structured physical instructions"