docs/optigami_handoff.md (added in commit 2936d2e, parent fc71686, "docs/handoff", +767 lines)
# OrigamiRL: OpenEnv Hackathon Handoff Document

## TL;DR

Build the **first multi-turn RL environment where an LLM learns to generate origami folding instructions**, verified by a computational origami simulator. Target the OpenEnv Hackathon (March 7-8, 2026, SF; $100K+ in prizes). Use the OpenEnv spec plus Unsloth GRPO for training. Dense, verifiable rewards come from origami geometry theorems (Kawasaki, Maekawa), so no learned reward model is needed.

---

## Hackathon Context

- **Event:** OpenEnv Hackathon SF, hosted by Cerebral Valley + Shack15 + Meta/PyTorch
- **Date:** March 7-8, 2026 (happening NOW)
- **Prize:** $100K+ cash
- **Teams:** Up to 4 people
- **Format:** Build RL environments, post-train a base model

### Judging Criteria

| Category | Weight | What Matters |
|----------|--------|-------------|
| Environment Innovation | 40% | Novel, creative, challenging. Does it meaningfully test agent behavior? |
| Storytelling | 30% | Clear problem explanation, engaging demo, easy to follow |
| Training Script Showing Improvement | 20% | Observable reward curves, before/after behavior |
| Reward and Training Pipeline Setup | 10% | Coherent reward logic, meaningful improvement at inference |

### Key Sponsors to Impress

- **Meta/PyTorch**: OpenEnv creators; they want environments built on their spec
- **Unsloth AI**: GRPO training infra, ART (Agent Reinforcement Trainer). USE THEIR TOOLS.
- **OpenPipe**: ART trainer (frontend/backend split for GRPO). Also use.
- **Patronus AI**: Building "generative simulators" (auto-scaling RL environments). They care about curriculum difficulty scaling and verifiable rewards.
- **Snorkel AI**: "2026 is the year of environments." They care about data quality and environment diversity.
- **Hugging Face**: OpenEnv Hub; they want environments deployed there
- **Scale AI / Mercor**: Agent evaluation, structured task environments

---

## The Pitch (for judges)

> "Spatial reasoning is the next frontier for LLM training. NeurIPS 2025 papers like OrigamiSpace showed that even GPT-5 fails at multi-step origami reasoning, but those are benchmarks, not training environments. We built OrigamiRL: the first multi-turn RL environment where an LLM agent learns to fold paper by outputting instructions, receiving geometric feedback, and improving through GRPO. Our reward function is fully verifiable: fold validity is checked against computational origami axioms, not an LLM judge. We built it on OpenEnv + Unsloth with a natural curriculum from single folds to full cranes."

---

## Prior Work (What Exists, Where the Gaps Are)

### 1. OrigamiSpace (NeurIPS 2025 Spotlight)

- **Paper:** https://arxiv.org/abs/2511.18450
- **What it is:** Benchmark with 350 origami data instances (CP diagrams, folding processes, folded shapes). Four evaluation tasks: Pattern Prediction, Multi-step Spatial Reasoning, Spatial Relationship Prediction, End-to-End CP Code Generation.
- **Their compiler:** Outputs detailed flattened diagrams with crease locations and stacking relationships, supports interactive simulation with MLLMs, and provides comprehensive error feedback. Checks: syntax validity, geometric foldability, no self-intersections, Kawasaki's theorem, Maekawa's theorem.
- **Their reward metrics for code gen:** Hausdorff distance (shape similarity), dihedral angle distribution, bounding box aspect ratios, constraint satisfaction.
- **Difficulty levels:** Easy (3-9 steps), Medium (10-19 steps), Hard (20-30 steps)
- **Gap:** Single-turn only (the LLM generates complete CP code in one shot). They mention RL exploration, but it is not the focus. No multi-turn sequential folding.

### 2. GamiBench (Dec 2025)

- **Paper:** https://arxiv.org/abs/2512.22207
- **What it is:** 186 regular + 186 impossible 2D crease patterns with 3D folded shapes from 6 viewpoints. Three VQA tasks.
- **Gap:** Evaluation only, no training. Tests single-step spatial understanding.

### 3. SpatialThinker (NeurIPS 2025)

- **Paper:** https://arxiv.org/abs/2511.07403
- **What it is:** 3D-aware MLLM trained with RL using dense spatial rewards. Constructs scene graphs. Multi-objective reward with lexicographic gating.
- **Key architecture to steal:** Dense reward design with lexicographic ordering: format → count → accuracy → spatial. Nearly doubled RL training gains vs. sparse rewards. Needed only 7K training samples with GRPO.
- **Gap:** Static scene understanding (objects on a table), not sequential physical transformations.

### 4. rigid-origami Gym (IJCAI 2023)

- **Repo:** https://github.com/belalugaX/rigid-origami
- **Paper:** "Automating Rigid Origami Design" (https://arxiv.org/abs/2211.13219)
- **What it is:** Gym environment where an agent constructs crease pattern graphs on a board. Sparse rewards. Foldability is validated by triangle intersection tests plus a kinematic rigidity model. The game terminates on non-foldable states.
- **Gap:** Classical RL agents (discrete grid actions), NOT LLMs generating text. Rigid-origami tessellations only, not traditional origami. No natural language.

### 5. The Unique Gap We Fill

Nobody has built a model that reasons about **sequential 2D-to-3D geometric transformations with physical constraints** through **natural language instructions** in a **multi-turn RL training loop**. Origami is uniquely hard because it requires tracking how a flat sheet's topology changes through a sequence of folds: mental rotation, spatial visualization, and perspective-taking all at once.

---

## Environment Design

### Architecture Overview

```
+---------------------------------------------------+
|                  OpenEnv Server                   |
|  +-----------+  +----------+  +--------------+    |
|  | State     |  | Action   |  | Reward       |    |
|  | (FOLD JSON|  | (LLM     |  | (Dense,      |    |
|  |  + target)|  |  output) |  |  verifiable) |    |
|  +-----------+  +----------+  +--------------+    |
|        |             |              |             |
|        v             v              v             |
|  +-----------------------------------------------+|
|  | Paper Geometry Engine (Python)                ||
|  |  - Polygon state (Shapely)                    ||
|  |  - Fold operations (reflection across line)   ||
|  |  - Kawasaki/Maekawa constraint checks         ||
|  |  - Layer tracking                             ||
|  |  - FOLD format import/export                  ||
|  +-----------------------------------------------+|
|                        |                          |
|                        v                          |
|  +-----------------------------------------------+|
|  | Three.js Visualizer (Demo only)               ||
|  |  - 3D fold animation                          ||
|  |  - Strain heatmap                             ||
|  |  - Instruction stream                         ||
|  +-----------------------------------------------+|
+---------------------------------------------------+
                 |             ^
                 v             |
+---------------------------------------------------+
|            Unsloth ART / GRPO Trainer             |
|  - Qwen2.5-VL-7B or Qwen3-4B base model           |
|  - LoRA/QLoRA for efficient training              |
|  - Multi-turn rollouts                            |
+---------------------------------------------------+
```

### OpenEnv Spec Compliance

Must implement these APIs:

```python
class OrigamiEnv:
    async def reset(self) -> Observation   # New episode: flat paper + target
    async def step(self, action) -> (Observation, reward, done, info)
    async def state(self) -> State         # Current paper geometry
    async def close(self)                  # Cleanup
```

OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Install: `pip install -e .` then `openenv init origami_env`
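
The four methods above are the whole contract. As a sanity check, the loop a client (or trainer) runs against them can be sketched with a stand-in environment; `DummyEnv` and its reward shape below are illustrative stand-ins, not the real origami environment.

```python
import asyncio

class DummyEnv:
    """Stand-in implementing the four OpenEnv-style methods."""
    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.step_count = 0

    async def reset(self):
        self.step_count = 0
        return {'paper_state': 'flat', 'step': 0}

    async def step(self, action):
        self.step_count += 1
        done = self.step_count >= self.max_steps
        reward = {'total': 0.1 * self.step_count}
        return {'step': self.step_count}, reward, done, {}

    async def state(self):
        return {'step': self.step_count}

    async def close(self):
        pass

async def rollout(env, policy):
    """Drive one episode: reset, then step until done, accumulating reward."""
    obs = await env.reset()
    total = 0.0
    done = False
    while not done:
        action = policy(obs)  # in the real loop, an LLM call producing a fold action
        obs, reward, done, info = await env.step(action)
        total += reward['total']
    await env.close()
    return total

total = asyncio.run(rollout(DummyEnv(), policy=lambda obs: {'fold': 'noop'}))
print(total)  # ~0.6 (0.1 + 0.2 + 0.3)
```

The same driver works unchanged against any environment that honors the four-method contract.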

### State Space

```python
@dataclass
class OrigamiState:
    # Current paper geometry
    vertices: List[Tuple[float, float]]       # 2D vertex positions
    edges: List[Tuple[int, int]]              # Edge connectivity
    edges_assignment: List[str]               # 'M', 'V', 'B', 'F' (mountain/valley/boundary/flat)
    edges_foldAngle: List[float]              # -180 to 180 degrees
    faces: List[List[int]]                    # Face vertex indices
    layer_order: List[List[int]]              # Face stacking order

    # Episode context
    target_crease_pattern: dict               # Target FOLD JSON
    target_shape_image: Optional[np.ndarray]  # Target folded shape (for multimodal)
    instruction_history: List[str]            # Previous instructions
    step_count: int
    max_steps: int
```

This maps directly to the **FOLD format** (JSON-based, used by all origami software):

```json
{
  "vertices_coords": [[0,0], [1,0], [1,1], [0,1]],
  "edges_vertices": [[0,1], [1,2], [2,3], [3,0]],
  "edges_assignment": ["B", "B", "B", "B"],
  "edges_foldAngle": [0, 0, 0, 0],
  "faces_vertices": [[0, 1, 2, 3]]
}
```

FOLD spec: https://github.com/edemaine/fold
FOLD JS library: https://edemaine.github.io/fold/
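
Before feeding a target FOLD file to the environment, it is worth checking the format's basic invariants. A minimal sketch (`fold_sanity` is a hypothetical helper covering only index bounds and parallel-array lengths, not the full spec):

```python
def fold_sanity(fold: dict) -> bool:
    """Check edge indices are in range and parallel arrays agree in length."""
    e = fold['edges_vertices']
    n_v = len(fold['vertices_coords'])
    # Every edge endpoint must be a valid vertex index
    if not all(0 <= i < n_v for edge in e for i in edge):
        return False
    # edges_assignment / edges_foldAngle must parallel edges_vertices
    return len(fold.get('edges_assignment', e)) == len(e) and \
           len(fold.get('edges_foldAngle', e)) == len(e)

flat_square = {
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0]],
    "edges_assignment": ["B", "B", "B", "B"],
    "edges_foldAngle": [0, 0, 0, 0],
}
print(fold_sanity(flat_square))  # True
```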

### Action Space

The LLM outputs a JSON action:

```json
{
  "instruction": "Fold the top edge down to meet the bottom edge",
  "fold_line": [[0, 0.5], [1, 0.5]],
  "fold_angle": -180,
  "assignment": "V"
}
```

The `instruction` field is natural language (what we're training the model to produce well). The geometric fields are the verifiable representation. During training, the model outputs both; for the final demo, the NL instruction is the star.

Alternative, simpler action (for early iterations):

```json
{
  "instruction": "Valley fold along the horizontal center line",
  "fold_type": "valley",
  "fold_axis": "horizontal",
  "fold_position": 0.5
}
```
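
Server-side, the format gate implied by this schema can be sketched as follows; `parse_action` and its exact bounds checks are illustrative assumptions, not part of any library:

```python
import json

REQUIRED = ('instruction', 'fold_line', 'fold_angle', 'assignment')

def parse_action(raw: str):
    """Return the action dict, or None if the format gate fails."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(k in action for k in REQUIRED):
        return None
    if action['assignment'] not in ('M', 'V'):
        return None
    if not -180 <= action['fold_angle'] <= 180:
        return None
    return action

ok = parse_action('{"instruction": "Fold in half", '
                  '"fold_line": [[0, 0.5], [1, 0.5]], '
                  '"fold_angle": -180, "assignment": "V"}')
print(ok is not None)            # True
print(parse_action('not json'))  # None
```

Rejecting here, before any geometry runs, is what makes the format reward a clean gate for the levels below.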

### Reward Function: Dense, Multi-Objective, Lexicographically Gated

Inspired by SpatialThinker's design. Rewards are computed in order; later rewards only apply if earlier gates pass.

```python
def compute_reward(state, action, new_state, target) -> dict:
    rewards = {}

    # LEVEL 1: Format (gate for everything else)
    # Does the output parse into a valid fold operation?
    rewards['format'] = 1.0 if parseable(action) else 0.0
    if rewards['format'] == 0:
        return rewards  # Stop here

    # LEVEL 2: Local Geometric Validity
    # Kawasaki's theorem: alternating sum of sector angles at each interior vertex = 0
    kawasaki_valid = check_kawasaki(new_state)
    # Maekawa's theorem: |M - V| = 2 at each interior vertex
    maekawa_valid = check_maekawa(new_state)
    # No self-intersection
    no_intersection = check_no_self_intersection(new_state)
    rewards['validity'] = (kawasaki_valid + maekawa_valid + no_intersection) / 3.0
    if rewards['validity'] < 0.5:
        return rewards  # Stop here

    # LEVEL 3: Physical Feasibility
    # Can this fold actually be performed given the layer stack?
    layer_consistent = check_layer_ordering(new_state)
    fold_achievable = check_fold_angle_feasible(new_state)
    rewards['feasibility'] = (layer_consistent + fold_achievable) / 2.0

    # LEVEL 4: Progress Toward Target (Dense)
    # Crease pattern graph similarity
    cp_similarity = crease_pattern_similarity(new_state, target)
    # Fold angle distribution match
    angle_similarity = fold_angle_distribution_match(new_state, target)
    # Bounding box aspect ratio match
    bbox_similarity = bounding_box_similarity(new_state, target)
    rewards['progress'] = 0.4 * cp_similarity + 0.4 * angle_similarity + 0.2 * bbox_similarity

    # LEVEL 5: Completion Bonus
    if shape_matches_target(new_state, target, tolerance=0.05):
        rewards['completion'] = 10.0

    # LEVEL 6: Efficiency
    rewards['efficiency'] = -0.01  # Small step penalty to encourage fewer folds

    # Total
    rewards['total'] = (
        0.1 * rewards['format'] +
        0.2 * rewards['validity'] +
        0.1 * rewards['feasibility'] +
        0.5 * rewards['progress'] +
        rewards.get('completion', 0) +
        rewards['efficiency']
    )
    return rewards
```

### Key Origami Theorems for Verification

These are the verifiable constraints, the "unit tests" of origami:

1. **Kawasaki's Theorem:** At any interior vertex of a flat-foldable crease pattern, the alternating sum of sector angles equals zero (equivalently, the alternating angles on each side sum to pi). A NECESSARY condition for flat-foldability.

2. **Maekawa's Theorem:** At any interior vertex, the number of mountain folds minus the number of valley folds equals +/-2: |M - V| = 2.

3. **No self-intersection:** Faces cannot penetrate each other during folding.

4. **Euler's formula for planar graphs:** V - E + F = 2 (sanity check on graph structure).

5. **Huzita-Hatori axioms:** The 7 axioms defining all possible single-fold operations (point-to-point, point-to-line, line-to-line, etc.). These define the VALID action space.
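
The first two theorems reduce to a few lines for a single vertex. A standalone sketch (it assumes sector angles in degrees, listed in cyclic order around one interior vertex; `kawasaki_ok` and `maekawa_ok` are illustrative helpers):

```python
import math

def kawasaki_ok(sector_angles, tol=1e-9):
    """Kawasaki: alternating sum of the cyclic sector angles must be zero."""
    alt = sum(a if i % 2 == 0 else -a for i, a in enumerate(sector_angles))
    return math.isclose(alt, 0.0, abs_tol=tol)

def maekawa_ok(assignments):
    """Maekawa: mountain count minus valley count must be +/-2."""
    return abs(assignments.count('M') - assignments.count('V')) == 2

# A classic flat-foldable degree-4 vertex:
print(kawasaki_ok([90, 45, 90, 135]))    # True  (90 - 45 + 90 - 135 == 0)
print(maekawa_ok(['M', 'M', 'M', 'V']))  # True  (|3 - 1| == 2)
# Angles that sum to 360 but fail the alternating condition:
print(kawasaki_ok([100, 60, 90, 110]))   # False (100 - 60 + 90 - 110 == 20)
```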

### Curriculum Design

| Level | Folds | Examples | Complexity |
|-------|-------|----------|-----------|
| 1 | 1 | Valley fold in half, mountain fold corner | Single fold validity |
| 2 | 2-3 | Paper airplane nose, triangle fold | Sequential dependency |
| 3 | 4-6 | Simple boat, fortune teller | Multi-step with symmetry |
| 4 | 7-12 | Paper airplane (full), jumping frog | Longer-horizon planning |
| 5 | 13-20 | Crane, lily | Complex spatial tracking |

For the hackathon, focus on Levels 1-3. Even showing reward improvement on Levels 1-2 is a strong result.
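
Level progression can be automated with a simple success-rate gate. A sketch under stated assumptions: the promotion threshold, window size, and replay probability below are untuned placeholders, and `Curriculum` is a hypothetical class, not part of OpenEnv:

```python
from collections import deque
import random

class Curriculum:
    """Sample targets from the current level; promote on a rolling success rate."""
    def __init__(self, n_levels=5, window=50, promote_at=0.8):
        self.level = 1
        self.n_levels = n_levels
        self.results = deque(maxlen=window)
        self.promote_at = promote_at

    def sample_level(self):
        # Occasionally replay an easier level to avoid forgetting
        if self.level > 1 and random.random() < 0.2:
            return random.randint(1, self.level - 1)
        return self.level

    def record(self, success: bool):
        self.results.append(success)
        full = len(self.results) == self.results.maxlen
        if full and sum(self.results) / len(self.results) >= self.promote_at:
            if self.level < self.n_levels:
                self.level += 1
                self.results.clear()

cur = Curriculum()
for _ in range(50):
    cur.record(True)  # 50 straight successes on Level 1
print(cur.level)      # 2
```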

---

## Core Implementation: Python Geometry Engine

This is the MOST IMPORTANT piece. Pure Python, no JS dependencies.

```python
import numpy as np
from shapely.geometry import Polygon, LineString
from shapely.ops import split


class PaperState:
    """Represents the current state of the origami paper."""

    def __init__(self, size: float = 1.0):
        # Start with a unit square
        self.regions = [Polygon([(0, 0), (size, 0), (size, size), (0, size)])]
        self.fold_history = []
        self.crease_lines = []
        self.crease_assignments = []  # 'M' or 'V'
        self.crease_angles = []
        self.layer_order = [0]  # Stack order of regions

    def apply_fold(self, fold_line: LineString, angle: float, assignment: str) -> dict:
        """
        Apply a fold operation. Returns a dict with validity info.
        fold_line: Shapely LineString defining the fold axis
        angle: fold angle in degrees (-180 to 180)
        assignment: 'M' (mountain) or 'V' (valley)
        """
        result = {'valid': True, 'errors': []}

        # 1. Split regions by fold line
        new_regions = []
        for region in self.regions:
            if fold_line.intersects(region):
                parts = split(region, fold_line)
                new_regions.extend(parts.geoms)
            else:
                new_regions.append(region)

        # 2. Determine which side folds (based on assignment)
        folding_side = []
        staying_side = []
        for region in new_regions:
            centroid = region.centroid
            side = self._point_side(centroid, fold_line)
            if side > 0:
                folding_side.append(region)
            else:
                staying_side.append(region)

        # 3. Reflect folding regions across fold line
        reflected = [self._reflect_polygon(r, fold_line) for r in folding_side]

        # 4. Update state
        self.regions = staying_side + reflected
        self.crease_lines.append(fold_line)
        self.crease_assignments.append(assignment)
        self.crease_angles.append(angle)
        self.fold_history.append({
            'line': list(fold_line.coords),
            'angle': angle,
            'assignment': assignment
        })

        # 5. Update layer order
        self._update_layer_order(staying_side, reflected)

        return result

    def _reflect_polygon(self, poly: Polygon, line: LineString) -> Polygon:
        """Reflect a polygon across a line."""
        coords = list(poly.exterior.coords)
        reflected_coords = [self._reflect_point(p, line) for p in coords]
        return Polygon(reflected_coords)

    def _reflect_point(self, point: tuple, line: LineString) -> tuple:
        """Reflect a point across a line."""
        p = np.array(point[:2])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d = l2 - l1
        d = d / np.linalg.norm(d)
        # Reflection formula: p' = p - 2*((p - l1) . n) * n, where n is the unit normal
        n = np.array([-d[1], d[0]])
        v = p - l1
        return tuple(p - 2 * np.dot(v, n) * n)

    def _point_side(self, point, line: LineString) -> float:
        """Returns positive if the point is on the left side of the line, negative if right."""
        p = np.array([point.x, point.y])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        return float(np.cross(l2 - l1, p - l1))

    def _update_layer_order(self, staying, reflected):
        """Update the layer stacking order after a fold."""
        self.layer_order = list(range(len(staying))) + \
                           list(range(len(staying), len(staying) + len(reflected)))

    def to_fold_json(self) -> dict:
        """Export the current state as FOLD format JSON."""
        vertices = set()
        for line in self.crease_lines:
            for coord in line.coords:
                vertices.add(tuple(round(c, 10) for c in coord))
        # Add boundary vertices
        for region in self.regions:
            for coord in region.exterior.coords:
                vertices.add(tuple(round(c, 10) for c in coord[:2]))

        vertices = sorted(vertices)
        vertex_map = {v: i for i, v in enumerate(vertices)}

        edge_set = set()
        edges_list = []
        assignments_list = []
        angles_list = []

        # Add crease edges
        for i, line in enumerate(self.crease_lines):
            c = [tuple(round(x, 10) for x in coord) for coord in line.coords]
            edge = tuple(sorted([vertex_map[c[0]], vertex_map[c[1]]]))
            if edge not in edge_set:
                edge_set.add(edge)
                edges_list.append(list(edge))
                assignments_list.append(self.crease_assignments[i])
                angles_list.append(self.crease_angles[i])

        return {
            'vertices_coords': [list(v) for v in vertices],
            'edges_vertices': edges_list,
            'edges_assignment': assignments_list,
            'edges_foldAngle': angles_list,
        }


class OrigamiVerifier:
    """Verifiable reward functions based on origami theorems."""

    @staticmethod
    def check_kawasaki(state: PaperState) -> bool:
        """Kawasaki's theorem: alternating sum of sector angles at each interior vertex = 0."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']

        for v_idx in range(len(vertices)):
            v = vertices[v_idx]
            incident_edges = [e for e in edges if v_idx in e]
            if len(incident_edges) < 4:
                continue  # Need degree 4+ for Kawasaki

            # Calculate sector angles
            angles = []
            for e in incident_edges:
                other = e[1] if e[0] == v_idx else e[0]
                other_v = vertices[other]
                angle = np.arctan2(other_v[1] - v[1], other_v[0] - v[0])
                angles.append(angle)

            angles.sort()
            sector_angles = []
            for i in range(len(angles) - 1):
                sector_angles.append(angles[i + 1] - angles[i])
            sector_angles.append(2 * np.pi - (angles[-1] - angles[0]))

            # Kawasaki: alternating sum should be ~0
            if len(sector_angles) >= 4:
                alt_sum = sum(sector_angles[::2]) - sum(sector_angles[1::2])
                if abs(alt_sum) > 0.01:
                    return False
        return True

    @staticmethod
    def check_maekawa(state: PaperState) -> bool:
        """Maekawa's theorem: |M - V| = 2 at each interior vertex."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']
        assignments = fold_json['edges_assignment']

        for v_idx in range(len(vertices)):
            incident = [(i, e) for i, e in enumerate(edges) if v_idx in e]
            m_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'M')
            v_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'V')

            if m_count + v_count >= 4:  # Interior vertex with folds
                if abs(m_count - v_count) != 2:
                    return False
        return True

    @staticmethod
    def crease_pattern_similarity(state: PaperState, target_fold_json: dict) -> float:
        """Compare the current crease pattern to the target. Returns similarity in [0, 1]."""
        current = state.to_fold_json()

        n_current = len(current.get('edges_vertices', []))
        n_target = len(target_fold_json.get('edges_vertices', []))

        if n_target == 0:
            return 1.0 if n_current == 0 else 0.0

        edge_count_sim = 1.0 - abs(n_current - n_target) / max(n_target, 1)
        edge_count_sim = max(0.0, edge_count_sim)

        current_assignments = current.get('edges_assignment', [])
        target_assignments = target_fold_json.get('edges_assignment', [])

        c_m = current_assignments.count('M')
        c_v = current_assignments.count('V')
        t_m = target_assignments.count('M')
        t_v = target_assignments.count('V')

        total = max(t_m + t_v, 1)
        assign_sim = 1.0 - (abs(c_m - t_m) + abs(c_v - t_v)) / (2 * total)
        assign_sim = max(0.0, assign_sim)

        return 0.5 * edge_count_sim + 0.5 * assign_sim
```
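
The core geometric primitive here is reflection across the fold line. The math behind `_reflect_point` can be sanity-checked in isolation with the standard library only (no Shapely); `reflect_point` below is a standalone re-derivation, not an import from the engine:

```python
import math

def reflect_point(p, l1, l2):
    """Reflect point p across the infinite line through l1 and l2."""
    dx, dy = l2[0] - l1[0], l2[1] - l1[1]
    norm = math.hypot(dx, dy)
    dx, dy = dx / norm, dy / norm
    # Unit normal to the line
    nx, ny = -dy, dx
    # Signed distance from p to the line, measured along the normal
    dist = (p[0] - l1[0]) * nx + (p[1] - l1[1]) * ny
    return (p[0] - 2 * dist * nx, p[1] - 2 * dist * ny)

# Folding the unit square in half along y = 0.5 sends the top-left corner
# (0, 1) onto the bottom-left corner (0, 0):
print(reflect_point((0, 1), (0, 0.5), (1, 0.5)))  # (0.0, 0.0)
```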

---

## OpenEnv Environment Wrapper

```python
# origami_env/server.py
from openenv.core import Environment
from paper_engine import PaperState, OrigamiVerifier
from shapely.geometry import LineString
import json


class OrigamiEnvironment(Environment):

    def __init__(self, targets_dir="targets/", max_steps=20):
        self.targets_dir = targets_dir
        self.max_steps = max_steps
        self.paper = None
        self.target = None
        self.step_count = 0

    async def reset(self, target_id=None):
        self.paper = PaperState(size=1.0)
        self.target = self._load_target(target_id)
        self.step_count = 0
        return self._get_observation()

    async def step(self, action):
        self.step_count += 1

        # Parse action
        try:
            fold_line = LineString(action['fold_line'])
            angle = action['fold_angle']
            assignment = action['assignment']
        except (KeyError, TypeError, ValueError):
            reward = {'format': 0, 'total': -0.1}
            return self._get_observation(), reward, False, {'error': 'parse_failed'}

        # Apply fold
        result = self.paper.apply_fold(fold_line, angle, assignment)

        # Compute rewards
        reward = self._compute_reward(result)

        # Check termination
        done = (
            self.step_count >= self.max_steps or
            reward.get('completion', 0) > 0
        )

        return self._get_observation(), reward, done, {}

    async def state(self):
        return {
            'paper': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'fold_history': self.paper.fold_history
        }

    def _compute_reward(self, fold_result):
        rewards = {}
        rewards['format'] = 1.0

        kawasaki = OrigamiVerifier.check_kawasaki(self.paper)
        maekawa = OrigamiVerifier.check_maekawa(self.paper)
        rewards['validity'] = (float(kawasaki) + float(maekawa)) / 2.0

        rewards['progress'] = OrigamiVerifier.crease_pattern_similarity(
            self.paper, self.target
        )

        if rewards['progress'] > 0.95:
            rewards['completion'] = 10.0

        rewards['efficiency'] = -0.01

        rewards['total'] = (
            0.1 * rewards['format'] +
            0.2 * rewards['validity'] +
            0.6 * rewards['progress'] +
            rewards.get('completion', 0) +
            rewards['efficiency']
        )
        return rewards

    def _get_observation(self):
        return {
            'paper_state': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'instruction_history': [str(f['line']) for f in self.paper.fold_history]
        }

    def _load_target(self, target_id):
        if target_id:
            with open(f"{self.targets_dir}/{target_id}.fold") as f:
                return json.load(f)
        # Default: simple valley fold in half
        return {
            'vertices_coords': [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
            'edges_vertices': [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
            'edges_assignment': ['B', 'B', 'B', 'B', 'V'],
            'edges_foldAngle': [0, 0, 0, 0, -180],
        }
```
|
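
For reference, `step()` expects an action dict shaped like the example below. The `validate_action` helper is a hypothetical convenience for pre-checking model output, not part of the environment API:

```python
# Hypothetical example of the action payload that step() parses.
example_action = {
    "fold_line": [[0.0, 0.5], [1.0, 0.5]],  # two endpoints, consumed by shapely's LineString
    "fold_angle": -180,                      # degrees; the default target pairs 'V' with -180
    "assignment": "V",                       # 'V' (valley) or 'M' (mountain)
}

def validate_action(action):
    """Cheap structural check before handing the dict to env.step()."""
    required = {"fold_line", "fold_angle", "assignment"}
    if not required <= set(action):
        return False
    if action["assignment"] not in ("V", "M"):
        return False
    return len(action["fold_line"]) == 2

print(validate_action(example_action))  # True
```

Running this check client-side avoids spending an env step on actions that would only earn the `parse_failed` penalty.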

---

## Training Script (Unsloth GRPO)

```python
# train.py
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
import torch

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Add LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# Reward function (TRL passes prompts/completions and extra columns as kwargs)
def origami_reward(completions, **kwargs):
    """Compute rewards for a batch of completions."""
    rewards = []
    for completion in completions:
        try:
            action = parse_fold_action(completion)
            paper = PaperState()
            paper.apply_fold(action['fold_line'], action['angle'], action['assignment'])
            r = compute_reward(paper, target)  # `target` must be in scope here
            rewards.append(r['total'])
        except Exception:
            rewards.append(-0.1)
    return rewards

# GRPO Config
config = GRPOConfig(
    output_dir="origami-grpo",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    max_completion_length=512,
    num_generations=8,
    temperature=1.0,
    logging_steps=1,
)

dataset = load_origami_prompts()

trainer = GRPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    reward_funcs=[origami_reward],
    processing_class=tokenizer,
)

trainer.train()
```
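
`parse_fold_action` is referenced above but not defined in this excerpt. A minimal sketch, assuming the model is prompted to answer with a single JSON object carrying `fold_line`, `fold_angle`, and `assignment` keys:

```python
import json
import re

def parse_fold_action(completion: str) -> dict:
    """Pull the first {...} JSON object out of a completion and check its keys.

    Assumption: the prompt instructs the model to emit one JSON object;
    anything else (prose only, missing keys, bad JSON) raises, which the
    reward function converts into the -0.1 penalty.
    """
    match = re.search(r"\{.*\}", completion, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in completion")
    action = json.loads(match.group(0))
    for key in ("fold_line", "fold_angle", "assignment"):
        if key not in action:
            raise KeyError(key)
    return action
```

The greedy `{.*}` match spans from the first brace to the last, which tolerates nested objects but assumes one JSON payload per completion.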

---

## Visualization (Demo Only – Not in Training Loop)

### Options

1. **Origami Simulator** – https://github.com/amandaghassaei/OrigamiSimulator – Three.js, accepts FOLD files, shows folding animation with strain visualization
2. **PackCAD** – https://packcad.com/ – Web-based, SVG crease patterns, rigid folding simulation
3. **Custom Three.js** – Simpler to integrate, and gives full control over the rendering

### Demo UI Layout

```
+----------------------+----------------------+
| Instruction Stream   | 3D Fold Viewer       |
|                      |                      |
| Step 1: Valley fold  | [Three.js canvas]    |
| along center [OK]    |                      |
|                      | Paper animating      |
| Step 2: Fold top     | fold by fold         |
| corners to center    |                      |
|                      |                      |
+----------------------+----------------------+
| Reward Dashboard                            |
| Format:    ========== 1.0                   |
| Validity:  ========.. 0.8                   |
| Progress:  ======.... 0.6                   |
| Total:     =======... 0.72                  |
|                                             |
| [Reward curve over training steps]          |
+---------------------------------------------+
```
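
To hand frames to either viewer, it is enough to dump the environment's paper state to a `.fold` file. A sketch using the default half-fold target from `_load_target()`; the filename and the `frame_classes` choice are illustrative:

```python
import json

# Minimal FOLD document that a FOLD-aware viewer can load; the geometry is
# the default half-fold target defined in _load_target().
fold_doc = {
    "file_spec": 1.1,
    "file_creator": "origami-rl-demo",
    "frame_classes": ["creasePattern"],
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
    "edges_assignment": ["B", "B", "B", "B", "V"],
    "edges_foldAngle": [0, 0, 0, 0, -180],
}

with open("demo_target.fold", "w") as f:
    json.dump(fold_doc, f, indent=2)
```

Since FOLD is plain JSON, the same dump can be produced per step from `paper.to_fold_json()` to drive the fold-by-fold animation.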

---

## Key Libraries and Resources

| Tool | Purpose | Link |
|------|---------|------|
| OpenEnv | Environment framework | https://github.com/meta-pytorch/OpenEnv |
| Unsloth | GRPO training | https://github.com/unslothai/unsloth |
| OpenPipe ART | Multi-turn RL trainer | https://github.com/OpenPipe/ART |
| FOLD format | Origami data structure | https://github.com/edemaine/fold |
| Rabbit Ear | JS origami library | https://github.com/rabbit-ear/rabbit-ear |
| Origami Simulator | 3D visualization | https://github.com/amandaghassaei/OrigamiSimulator |
| PackCAD | Folding simulation | https://packcad.com/ |
| Shapely | Python geometry | `pip install shapely` |
| rigid-origami gym | Reference gym env | https://github.com/belalugaX/rigid-origami |
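
Shapely carries most of the geometric load: the core operation, splitting the paper polygon along a fold line, is a one-liner. A small sketch (the fold line deliberately overshoots the paper edge, since `shapely.ops.split` needs the splitter to cross the polygon boundary):

```python
from shapely.geometry import LineString, Polygon
from shapely.ops import split

# Unit-square paper and a horizontal fold line; the line extends slightly
# past the paper so split() sees it crossing the boundary.
paper = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
fold_line = LineString([(-0.1, 0.5), (1.1, 0.5)])

pieces = split(paper, fold_line)
print(len(pieces.geoms))                         # 2
print([round(p.area, 3) for p in pieces.geoms])  # [0.5, 0.5]
```

Reflecting one of the resulting pieces across the fold line then gives the folded layer geometry.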

### Papers to Cite

- OrigamiSpace: https://arxiv.org/abs/2511.18450
- GamiBench: https://arxiv.org/abs/2512.22207
- SpatialThinker: https://arxiv.org/abs/2511.07403
- Automating Rigid Origami Design: https://arxiv.org/abs/2211.13219
- FOLD format spec: https://github.com/edemaine/fold/blob/main/doc/spec.md


---

## Priority Build Order

1. **Python geometry engine** – PaperState class with fold operations and FOLD export
2. **Verifier functions** – Kawasaki, Maekawa, similarity metrics
3. **OpenEnv wrapper** – step/reset/state API
4. **Simple targets** – Hand-create 5-10 Level 1-2 targets as .fold files
5. **Training script** – Wire up Unsloth GRPO with the reward function
6. **Run training** – Even on a small model, get reward curves
7. **Three.js visualizer** – For demo only, not in the training loop
8. **Before/after demo** – Show base-model vs. trained-model outputs
9. **Polish the presentation narrative**

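Step 4 can be partially scripted: a single-fold target is parameterized by the fold height, so a handful of Level-1 `.fold` files can be generated rather than hand-written. A sketch (the `targets/` layout matches the env default; filenames are illustrative):

```python
import json
import os

def make_horizontal_fold_target(y):
    """Level-1 target: a single valley fold across the unit square at height y."""
    return {
        "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1], [0, y], [1, y]],
        "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
        "edges_assignment": ["B", "B", "B", "B", "V"],
        "edges_foldAngle": [0, 0, 0, 0, -180],
    }

os.makedirs("targets", exist_ok=True)
for i, y in enumerate([0.25, 0.5, 0.75]):
    with open(f"targets/level1_{i}.fold", "w") as f:
        json.dump(make_horizontal_fold_target(y), f)
```

Level-2 targets (two folds) need real fold composition and are probably still best authored by hand or in Origami Simulator.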

---

## Narrative for Judges

**The story arc:**

1. "LLMs are great at text but terrible at spatial reasoning"
2. "Origami is the perfect testbed – it's sequential, physical, and verifiable"
3. "NeurIPS 2025 showed even GPT-5 fails at origami benchmarks, but nobody built a TRAINING environment"
4. "We built OrigamiRL – the first multi-turn RL environment for origami instruction generation"
5. "Our rewards come from math theorems, not vibes – Kawasaki's theorem is our unit test"
6. "Watch the model go from generating paper-tearing nonsense to valid fold sequences"
7. "This generalizes to any domain where LLMs need to output structured physical instructions"