# OrigamiRL: OpenEnv Hackathon Handoff Document
## TL;DR
Build the **first multi-turn RL environment where an LLM learns to generate origami folding instructions**, verified by a computational origami simulator. Target the OpenEnv Hackathon (March 7-8, 2026, SF; $100K+ in prizes). Use the OpenEnv spec + Unsloth GRPO for training. Dense verifiable rewards come from origami geometry theorems (Kawasaki, Maekawa). No learned reward model needed.
---
## Hackathon Context
- **Event:** OpenEnv Hackathon SF, hosted by Cerebral Valley + Shack15 + Meta/PyTorch
- **Date:** March 7-8, 2026 (happening NOW)
- **Prize:** $100K+ cash
- **Teams:** Up to 4 people
- **Format:** Build RL environments, post-train a base model
### Judging Criteria
| Category | Weight | What Matters |
|----------|--------|-------------|
| Environment Innovation | 40% | Novel, creative, challenging. Does it meaningfully test agent behavior? |
| Storytelling | 30% | Clear problem explanation, engaging demo, easy to follow |
| Training Script Showing Improvement | 20% | Observable reward curves, before/after behavior |
| Reward and Training Pipeline Setup | 10% | Coherent reward logic, meaningful improvement in inference |
### Key Sponsors to Impress
- **Meta/PyTorch**: OpenEnv creators, want environments using their spec
- **Unsloth AI**: GRPO training infra, ART (Agent Reinforcement Trainer). USE THEIR TOOLS.
- **OpenPipe**: ART trainer (frontend/backend split for GRPO). Also use.
- **Patronus AI**: Building "generative simulators" (auto-scaling RL environments). They care about curriculum difficulty scaling and verifiable rewards.
- **Snorkel AI**: "2026 is the year of environments." They care about data quality and environment diversity.
- **Hugging Face**: OpenEnv Hub, want environments deployed there
- **Scale AI / Mercor**: Agent evaluation, structured task environments
---
## The Pitch (for judges)
> "Spatial reasoning is the next frontier for LLM training: NeurIPS 2025 papers like OrigamiSpace showed that even GPT-5 fails at multi-step origami reasoning. But those are benchmarks, not training environments. We built OrigamiRL: the first multi-turn RL environment where an LLM agent learns to fold paper by outputting instructions, receiving geometric feedback, and improving through GRPO. Our reward function is fully verifiable: fold validity is checked against computational origami axioms, not an LLM judge. We built it on OpenEnv + Unsloth with a natural curriculum from single folds to full cranes."
---
## Prior Work (What Exists, Where the Gaps Are)
### 1. OrigamiSpace (NeurIPS 2025 Spotlight)
- **Paper:** https://arxiv.org/abs/2511.18450
- **What it is:** Benchmark with 350 origami data instances (CP diagrams, folding processes, folded shapes). 4 evaluation tasks: Pattern Prediction, Multi-step Spatial Reasoning, Spatial Relationship Prediction, End-to-End CP Code Generation.
- **Their compiler:** Outputs detailed flattened diagrams with crease locations and stacking relationships, supports interactive simulation with MLLMs, provides comprehensive error feedback. Checks: syntax validity, geometric foldability, no self-intersections, Kawasaki's theorem, Maekawa's theorem.
- **Their reward metrics for code gen:** Hausdorff distance (shape similarity), dihedral angle distribution, bounding box aspect ratios, constraint satisfaction.
- **Difficulty levels:** Easy (3-9 steps), Medium (10-19 steps), Hard (20-30 steps)
- **Gap:** Single-turn only (LLM generates complete CP code in one shot). They mention RL exploration but it's not the focus. No multi-turn sequential folding.
### 2. GamiBench (Dec 2025)
- **Paper:** https://arxiv.org/abs/2512.22207
- **What it is:** 186 regular + 186 impossible 2D crease patterns with 3D folded shapes from 6 viewpoints. 3 VQA tasks.
- **Gap:** Evaluation-only, no training. Tests single-step spatial understanding.
### 3. SpatialThinker (NeurIPS 2025)
- **Paper:** https://arxiv.org/abs/2511.07403
- **What it is:** 3D-aware MLLM trained with RL using dense spatial rewards. Constructs scene graphs. Multi-objective reward with lexicographic gating.
- **Key architecture to steal:** Dense reward design with lexicographic ordering: format → count → accuracy → spatial. Nearly doubled RL training gains vs sparse rewards. Only needed 7K training samples with GRPO.
- **Gap:** Static scene understanding (objects on a table), not sequential physical transformations.
### 4. rigid-origami Gym (IJCAI 2023)
- **Repo:** https://github.com/belalugaX/rigid-origami
- **Paper:** "Automating Rigid Origami Design" (https://arxiv.org/abs/2211.13219)
- **What it is:** Gym environment where agent constructs crease pattern graphs on a board. Sparse rewards. Foldability validated by triangle intersection tests + kinematic rigidity model. Game terminates on non-foldable states.
- **Gap:** Classical RL agents (discrete grid actions), NOT LLMs generating text. Rigid-origami tessellations only, not traditional origami. No natural language.
### 5. The Unique Gap We Fill
Nobody has built a model that reasons about **sequential 2D-to-3D geometric transformations with physical constraints** through **natural language instructions** in a **multi-turn RL training loop**. Origami is uniquely hard because it requires tracking how a flat sheet's topology changes through a sequence of folds: mental rotation, spatial visualization, and perspective-taking all at once.
---
## Environment Design
### Architecture Overview
```
+---------------------------------------------------+
|                  OpenEnv Server                   |
|  +-----------+   +----------+   +--------------+  |
|  | State     |   | Action   |   | Reward       |  |
|  | (FOLD JSON|   | (LLM     |   | (Dense,      |  |
|  |  + target)|   | output)  |   | verifiable)  |  |
|  +-----------+   +----------+   +--------------+  |
|        |              |                |          |
|        v              v                v          |
|  +-----------------------------------------------+|
|  | Paper Geometry Engine (Python)                ||
|  | - Polygon state (Shapely)                     ||
|  | - Fold operations (reflection across line)    ||
|  | - Kawasaki/Maekawa constraint checks          ||
|  | - Layer tracking                              ||
|  | - FOLD format import/export                   ||
|  +-----------------------------------------------+|
|                         |                         |
|                         v                         |
|  +-----------------------------------------------+|
|  | Three.js Visualizer (Demo only)               ||
|  | - 3D fold animation                           ||
|  | - Strain heatmap                              ||
|  | - Instruction stream                          ||
|  +-----------------------------------------------+|
+---------------------------------------------------+
                    |           ^
                    v           |
+---------------------------------------------------+
|            Unsloth ART / GRPO Trainer             |
|  - Qwen2.5-VL-7B or Qwen3-4B base model           |
|  - LoRA/QLoRA for efficient training              |
|  - Multi-turn rollouts                            |
+---------------------------------------------------+
```
### OpenEnv Spec Compliance
Must implement these APIs:
```python
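class OrigamiEnv:
    # Interface stubs only; the annotations are strings because the types live elsewhere.
    async def reset(self) -> "Observation": ...  # New episode: flat paper + target
    async def step(self, action) -> "tuple": ...  # (Observation, reward, done, info)
    async def state(self) -> "State": ...  # Current paper geometry
    async def close(self) -> None: ...  # Cleanup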
```
OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Install: `pip install -e .` then `openenv init origami_env`
### State Space
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class OrigamiState:
    # Current paper geometry
    vertices: List[Tuple[float, float]]  # 2D vertex positions
    edges: List[Tuple[int, int]]  # Edge connectivity
    edges_assignment: List[str]  # 'M', 'V', 'B', 'F' (mountain/valley/boundary/flat)
    edges_foldAngle: List[float]  # -180 to 180 degrees (FOLD: valley > 0, mountain < 0)
    faces: List[List[int]]  # Face vertex indices
    layer_order: List[List[int]]  # Face stacking order
    # Episode context
    target_crease_pattern: dict  # Target FOLD JSON
    target_shape_image: Optional[np.ndarray]  # Target folded shape (for multimodal)
    instruction_history: List[str]  # Previous instructions
    step_count: int
    max_steps: int
```
This maps directly to the **FOLD format** (JSON-based and widely used by computational origami software):
```json
{
  "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1]],
  "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0]],
  "edges_assignment": ["B", "B", "B", "B"],
  "edges_foldAngle": [0, 0, 0, 0],
  "faces_vertices": [[0, 1, 2, 3]]
}
```
FOLD spec: https://github.com/edemaine/fold
FOLD JS library: https://edemaine.github.io/fold/
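Before a FOLD dictionary enters the environment, it helps to check that its parallel arrays agree with each other. The following is a minimal sketch of such a check, not part of the FOLD library: the function name `validate_fold` and the accepted assignment set are assumptions made for illustration, and only the keys used in this document are inspected.

```python
from typing import Any, Dict, List

# Assignment letters used in this document, plus "U" (unassigned).
# Assumption: newer FOLD spec versions define additional letters not handled here.
VALID_ASSIGNMENTS = {"M", "V", "B", "F", "U"}


def validate_fold(fold: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    n_vertices = len(fold.get("vertices_coords", []))
    edges = fold.get("edges_vertices", [])

    # Every edge must join two existing vertices.
    for i, edge in enumerate(edges):
        if len(edge) != 2 or not all(0 <= v < n_vertices for v in edge):
            problems.append(f"edge {i} has invalid vertex indices: {edge}")

    # Per-edge arrays must have one entry per edge.
    for key in ("edges_assignment", "edges_foldAngle"):
        if key in fold and len(fold[key]) != len(edges):
            problems.append(f"{key} has {len(fold[key])} entries for {len(edges)} edges")

    for i, assignment in enumerate(fold.get("edges_assignment", [])):
        if assignment not in VALID_ASSIGNMENTS:
            problems.append(f"edge {i} has unknown assignment {assignment!r}")

    # Every face must reference existing vertices.
    for i, face in enumerate(fold.get("faces_vertices", [])):
        if not all(0 <= v < n_vertices for v in face):
            problems.append(f"face {i} has invalid vertex indices: {face}")

    return problems


unit_square = {
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0]],
    "edges_assignment": ["B", "B", "B", "B"],
    "edges_foldAngle": [0, 0, 0, 0],
    "faces_vertices": [[0, 1, 2, 3]],
}
print(validate_fold(unit_square))
```

Running the same check on every target file at load time would catch hand-editing mistakes before they show up as confusing reward values.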
### Action Space
The LLM outputs a JSON action:
```json
{
  "instruction": "Fold the top edge down to meet the bottom edge",
  "fold_line": [[0, 0.5], [1, 0.5]],
  "fold_angle": 180,
  "assignment": "V"
}
```
The `instruction` field is natural language (what we're training the model to produce well). The geometric fields are the verifiable representation; `fold_angle` follows the FOLD convention, where valley folds are positive and mountain folds are negative. During training, the model outputs both; for the final demo, the NL instruction is the star.
Alternative simpler action (for early iterations):
```json
{
  "instruction": "Valley fold along the horizontal center line",
  "fold_type": "valley",
  "fold_axis": "horizontal",
  "fold_position": 0.5
}
```
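The simpler action can be expanded into the full geometric action before it reaches the geometry engine. The sketch below assumes that `fold_position` is a fraction of the paper size and that a full flat fold is intended; the function name `simple_action_to_geometric` is hypothetical. Angles follow the FOLD convention (valley positive, mountain negative).

```python
def simple_action_to_geometric(action: dict, size: float = 1.0) -> dict:
    """Expand a {fold_type, fold_axis, fold_position} action into fold_line form."""
    fold_type = action["fold_type"]
    axis = action["fold_axis"]
    position = float(action["fold_position"])

    if fold_type not in ("valley", "mountain"):
        raise ValueError(f"unknown fold_type: {fold_type!r}")
    if not 0.0 < position < 1.0:
        raise ValueError("fold_position must lie strictly between 0 and 1")

    offset = position * size
    if axis == "horizontal":
        fold_line = [[0.0, offset], [size, offset]]
    elif axis == "vertical":
        fold_line = [[offset, 0.0], [offset, size]]
    else:
        raise ValueError(f"unknown fold_axis: {axis!r}")

    is_valley = fold_type == "valley"
    return {
        "instruction": action.get("instruction", ""),
        "fold_line": fold_line,
        "fold_angle": 180 if is_valley else -180,  # Assumption: always a full flat fold
        "assignment": "V" if is_valley else "M",
    }


simple = {
    "instruction": "Valley fold along the horizontal center line",
    "fold_type": "valley",
    "fold_axis": "horizontal",
    "fold_position": 0.5,
}
print(simple_action_to_geometric(simple))
```

Keeping this conversion on the environment side means the reward code only ever sees one action shape, whichever format the model is prompted to emit.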
### Reward Function: Dense, Multi-Objective, Lexicographically Gated
Inspired by SpatialThinker's design. Rewards are computed in order; later rewards only apply if earlier gates pass.
```python
def compute_reward(state, action, new_state, target) -> dict:
    # Helpers such as parseable() and the check_*() functions are assumed to be
    # defined elsewhere and to return a bool or a score in [0, 1].
    rewards = {}

    def finalize() -> dict:
        # Weighted total over the levels computed so far. Gated levels count as 0,
        # so every return path carries a 'total' key.
        rewards['total'] = (
            0.1 * rewards.get('format', 0.0) +
            0.2 * rewards.get('validity', 0.0) +
            0.1 * rewards.get('feasibility', 0.0) +
            0.5 * rewards.get('progress', 0.0) +
            rewards.get('completion', 0.0) +
            rewards.get('efficiency', 0.0)
        )
        return rewards

    # LEVEL 1: Format (gate for everything else)
    # Does the output parse into a valid fold operation?
    rewards['format'] = 1.0 if parseable(action) else 0.0
    if rewards['format'] == 0:
        return finalize()  # Stop here

    # LEVEL 2: Local Geometric Validity
    # Kawasaki's theorem: alternating sum of sector angles at each interior vertex is 0
    kawasaki_valid = check_kawasaki(new_state)
    # Maekawa's theorem: |M - V| = 2 at each interior vertex
    maekawa_valid = check_maekawa(new_state)
    # No self-intersection
    no_intersection = check_no_self_intersection(new_state)
    rewards['validity'] = (kawasaki_valid + maekawa_valid + no_intersection) / 3.0
    if rewards['validity'] < 0.5:
        return finalize()  # Stop here

    # LEVEL 3: Physical Feasibility
    # Can this fold actually be performed given layer stack?
    layer_consistent = check_layer_ordering(new_state)
    fold_achievable = check_fold_angle_feasible(new_state)
    rewards['feasibility'] = (layer_consistent + fold_achievable) / 2.0

    # LEVEL 4: Progress Toward Target (Dense)
    # Crease pattern graph similarity
    cp_similarity = crease_pattern_similarity(new_state, target)
    # Fold angle distribution match
    angle_similarity = fold_angle_distribution_match(new_state, target)
    # Bounding box aspect ratio match
    bbox_similarity = bounding_box_similarity(new_state, target)
    rewards['progress'] = 0.4 * cp_similarity + 0.4 * angle_similarity + 0.2 * bbox_similarity

    # LEVEL 5: Completion Bonus
    if shape_matches_target(new_state, target, tolerance=0.05):
        rewards['completion'] = 10.0

    # LEVEL 6: Efficiency
    rewards['efficiency'] = -0.01  # Small step penalty to encourage fewer folds

    return finalize()
```
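To get a feel for the scale of these numbers, the weighted total can be worked through for a few concrete cases. This is a sketch using the weights from the function above, under the assumption that a gated step is scored by the weighted sum of the levels computed so far; `gated_total` is a hypothetical helper, not part of the environment.

```python
WEIGHTS = {"format": 0.1, "validity": 0.2, "feasibility": 0.1, "progress": 0.5}
STEP_PENALTY = -0.01
COMPLETION_BONUS = 10.0


def gated_total(format_ok: bool, validity: float = 0.0, feasibility: float = 0.0,
                progress: float = 0.0, completed: bool = False) -> float:
    """Weighted reward with the same gates as compute_reward (assumed scoring at gates)."""
    if not format_ok:
        return 0.0  # Gate 1: nothing else is evaluated
    total = WEIGHTS["format"] + WEIGHTS["validity"] * validity
    if validity < 0.5:
        return total  # Gate 2: feasibility and progress are skipped
    total += WEIGHTS["feasibility"] * feasibility
    total += WEIGHTS["progress"] * progress
    if completed:
        total += COMPLETION_BONUS
    return total + STEP_PENALTY


# Unparseable output
print(round(gated_total(False), 4))
# Parseable, but only one of the three validity checks passes
print(round(gated_total(True, validity=1 / 3), 4))
# Valid and feasible fold that is 60% of the way to the target
print(round(gated_total(True, 1.0, 1.0, 0.6), 4))
# Best possible step without completion
print(round(gated_total(True, 1.0, 1.0, 1.0), 4))
```

With these weights the best non-completing step scores 0.89, so the completion bonus of 10.0 dominates the episode return. Whether that ratio is desirable is a tuning decision worth revisiting once reward curves are available.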
### Key Origami Theorems for Verification
These are the verifiable constraints, the "unit tests" of origami:
1. **Kawasaki's Theorem:** At any interior vertex of a flat-foldable crease pattern, the alternating sum of sector angles equals zero (equivalently, the odd-numbered and the even-numbered sector angles each sum to pi). This is a NECESSARY condition for flat-foldability, checked vertex by vertex; it does not by itself guarantee that the whole pattern folds flat.
2. **Maekawa's Theorem:** At any interior vertex of a flat-foldable pattern, the number of mountain folds minus the number of valley folds equals +/-2: |M - V| = 2.
3. **No self-intersection:** Faces cannot penetrate each other during folding.
4. **Euler's formula for planar graphs:** V - E + F = 2 when F counts the unbounded outer face. With bounded faces only (as listed in FOLD's `faces_vertices`), a connected crease pattern satisfies V - E + F = 1. Use this as a sanity check on graph structure.
5. **Huzita-Hatori axioms:** The 7 axioms defining all possible single-fold operations (point-to-point, point-to-line, line-to-line, etc.). These define the VALID action space.
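The first two theorems reduce to a few lines of arithmetic at a single vertex. The sketch below checks them on explicit inputs: sector angles listed in cyclic order around the vertex, and the M/V assignments of the incident creases. The function names are illustrative, and the checks are per-vertex necessary conditions only.

```python
import math
from typing import Sequence


def kawasaki_holds(sector_angles: Sequence[float], tol: float = 1e-9) -> bool:
    """Alternating sum of the sector angles (radians, cyclic order) must be zero."""
    if len(sector_angles) % 2 != 0:
        return False  # A flat-foldable interior vertex has even degree
    if abs(sum(sector_angles) - 2 * math.pi) > tol:
        return False  # Angles around an interior vertex must fill the full turn
    alternating = sum(sector_angles[0::2]) - sum(sector_angles[1::2])
    return abs(alternating) <= tol


def maekawa_holds(assignments: Sequence[str]) -> bool:
    """Mountain and valley counts at the vertex must differ by exactly two."""
    mountains = sum(1 for a in assignments if a == "M")
    valleys = sum(1 for a in assignments if a == "V")
    return abs(mountains - valleys) == 2


pi = math.pi
# Four creases meeting at right angles: each alternating pair sums to pi.
print(kawasaki_holds([pi / 2, pi / 2, pi / 2, pi / 2]))
# Same total angle, but the alternating pairs sum to 5*pi/6 and 7*pi/6.
print(kawasaki_holds([pi / 2, pi / 2, pi / 3, 2 * pi / 3]))
# Three mountains and one valley satisfy Maekawa; two and two do not.
print(maekawa_holds(["M", "M", "M", "V"]), maekawa_holds(["M", "V", "M", "V"]))
```

Note that passing both checks at every vertex still leaves global layer ordering unverified, which is why the reward function treats self-intersection and feasibility as separate terms.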
### Curriculum Design
| Level | Folds | Examples | Complexity |
|-------|-------|----------|-----------|
| 1 | 1 | Valley fold in half, mountain fold corner | Single fold validity |
| 2 | 2-3 | Paper airplane nose, triangle fold | Sequential dependency |
| 3 | 4-6 | Simple boat, fortune teller | Multi-step with symmetry |
| 4 | 7-12 | Paper airplane (full), jumping frog | Longer horizon planning |
| 5 | 13-20 | Crane, lily | Complex spatial tracking |
For the hackathon, focus on Levels 1-3. Even showing reward improvement on Level 1-2 is a strong result.
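One simple way to move through the levels is to promote the agent once its recent success rate at the current level clears a threshold. The following is a minimal sketch of that idea; the class name, window size, and threshold are all assumptions chosen for illustration and would need tuning against real training runs.

```python
from collections import deque


class CurriculumScheduler:
    """Promote to the next level once the recent success rate clears a threshold."""

    def __init__(self, max_level: int = 5, window: int = 50, promote_at: float = 0.8):
        self.level = 1
        self.max_level = max_level
        self.promote_at = promote_at
        self.recent = deque(maxlen=window)  # Outcomes of the latest episodes

    def record(self, success: bool) -> int:
        """Log one finished episode and return the level to use for the next one."""
        self.recent.append(1.0 if success else 0.0)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and self.level < self.max_level:
            if sum(self.recent) / len(self.recent) >= self.promote_at:
                self.level += 1
                self.recent.clear()  # Start a fresh window at the new level
        return self.level


scheduler = CurriculumScheduler(max_level=3, window=2, promote_at=1.0)
levels = [scheduler.record(ok) for ok in (True, True, False, True, True)]
print(levels)
```

"Success" here could mean the completion bonus fired, or that the episode return passed a per-level cutoff; either definition plugs into `record` unchanged.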
---
## Core Implementation: Python Geometry Engine
This is the MOST IMPORTANT piece. Pure Python, no JS dependencies.
```python
import numpy as np
from shapely.geometry import LineString, Polygon
from shapely.ops import split


class PaperState:
    """Represents the current state of the origami paper."""

    def __init__(self, size: float = 1.0):
        # Start with a unit square
        self.regions = [Polygon([(0, 0), (size, 0), (size, size), (0, size)])]
        self.fold_history = []
        self.crease_lines = []
        self.crease_assignments = []  # 'M' or 'V'
        self.crease_angles = []
        self.layer_order = [0]  # Stack order of regions

    def apply_fold(self, fold_line: LineString, angle: float, assignment: str) -> dict:
        """
        Apply a fold operation. Returns dict with validity info.
        fold_line: Shapely LineString defining the fold axis
        angle: fold angle in degrees (-180 to 180)
        assignment: 'M' (mountain) or 'V' (valley)
        """
        result = {'valid': True, 'errors': []}

        # 1. Split regions by fold line
        new_regions = []
        for region in self.regions:
            if fold_line.intersects(region):
                parts = split(region, fold_line)
                new_regions.extend(parts.geoms)
            else:
                new_regions.append(region)

        # 2. Determine which side folds: regions left of the directed fold line
        #    move, regions on or right of it stay put
        folding_side = []
        staying_side = []
        for region in new_regions:
            centroid = region.centroid
            side = self._point_side(centroid, fold_line)
            if side > 0:
                folding_side.append(region)
            else:
                staying_side.append(region)

        # 3. Reflect folding regions across fold line
        reflected = [self._reflect_polygon(r, fold_line) for r in folding_side]

        # 4. Update state
        self.regions = staying_side + reflected
        self.crease_lines.append(fold_line)
        self.crease_assignments.append(assignment)
        self.crease_angles.append(angle)
        self.fold_history.append({
            'line': list(fold_line.coords),
            'angle': angle,
            'assignment': assignment
        })

        # 5. Update layer order
        self._update_layer_order(staying_side, reflected)
        return result

    def _reflect_polygon(self, poly: Polygon, line: LineString) -> Polygon:
        """Reflect a polygon across a line."""
        coords = list(poly.exterior.coords)
        reflected_coords = [self._reflect_point(p, line) for p in coords]
        return Polygon(reflected_coords)

    def _reflect_point(self, point: tuple, line: LineString) -> tuple:
        """Reflect a point across a line."""
        p = np.array(point[:2])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d = l2 - l1
        d = d / np.linalg.norm(d)
        # Reflection formula: p' = p - 2((p - l1) . n) n, where n is the unit normal
        n = np.array([-d[1], d[0]])
        v = p - l1
        return tuple(p - 2 * np.dot(v, n) * n)

    def _point_side(self, point, line: LineString) -> float:
        """Returns positive if point is on left side of line, negative if right."""
        p = np.array([point.x, point.y])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d, v = l2 - l1, p - l1
        # Explicit 2D cross product (np.cross on 2D vectors is deprecated in NumPy 2.x)
        return float(d[0] * v[1] - d[1] * v[0])

    def _update_layer_order(self, staying, reflected):
        """Update the layer stacking order after a fold."""
        # Placeholder ordering: staying regions first, reflected regions on top
        self.layer_order = list(range(len(staying))) + \
            list(range(len(staying), len(staying) + len(reflected)))

    def to_fold_json(self) -> dict:
        """Export current state as FOLD format JSON (crease edges only)."""
        vertices = set()
        for line in self.crease_lines:
            for coord in line.coords:
                vertices.add(tuple(round(c, 10) for c in coord))
        # Add boundary vertices
        for region in self.regions:
            for coord in region.exterior.coords:
                vertices.add(tuple(round(c, 10) for c in coord[:2]))
        vertices = sorted(vertices)
        vertex_map = {v: i for i, v in enumerate(vertices)}

        edge_set = set()
        edges_list = []
        assignments_list = []
        angles_list = []
        # Add crease edges
        for i, line in enumerate(self.crease_lines):
            c = [tuple(round(x, 10) for x in coord) for coord in line.coords]
            edge = tuple(sorted([vertex_map[c[0]], vertex_map[c[1]]]))
            if edge not in edge_set:
                edge_set.add(edge)
                edges_list.append(list(edge))
                assignments_list.append(self.crease_assignments[i])
                angles_list.append(self.crease_angles[i])

        return {
            'vertices_coords': [list(v) for v in vertices],
            'edges_vertices': edges_list,
            'edges_assignment': assignments_list,
            'edges_foldAngle': angles_list,
        }


class OrigamiVerifier:
    """Verifiable reward functions based on origami theorems."""

    @staticmethod
    def check_kawasaki(state: PaperState) -> bool:
        """Kawasaki's theorem: alternating sum of angles at each interior vertex = 0."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']
        for v_idx in range(len(vertices)):
            v = vertices[v_idx]
            incident_edges = [e for e in edges if v_idx in e]
            if len(incident_edges) < 4:
                continue  # Need degree-4+ for Kawasaki
            # Direction of each incident crease, sorted counter-clockwise
            angles = []
            for e in incident_edges:
                other = e[1] if e[0] == v_idx else e[0]
                other_v = vertices[other]
                angle = np.arctan2(other_v[1] - v[1], other_v[0] - v[0])
                angles.append(angle)
            angles.sort()
            # Sector angles between consecutive creases, including the wrap-around
            sector_angles = []
            for i in range(len(angles) - 1):
                sector_angles.append(angles[i + 1] - angles[i])
            sector_angles.append(2 * np.pi - (angles[-1] - angles[0]))
            # Kawasaki: alternating sum should be ~0
            if len(sector_angles) >= 4:
                alt_sum = sum(sector_angles[::2]) - sum(sector_angles[1::2])
                if abs(alt_sum) > 0.01:
                    return False
        return True

    @staticmethod
    def check_maekawa(state: PaperState) -> bool:
        """Maekawa's theorem: |M - V| = 2 at each interior vertex."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']
        assignments = fold_json['edges_assignment']
        for v_idx in range(len(vertices)):
            incident = [(i, e) for i, e in enumerate(edges) if v_idx in e]
            m_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'M')
            v_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'V')
            if m_count + v_count >= 4:  # Interior vertex with folds
                if abs(m_count - v_count) != 2:
                    return False
        return True

    @staticmethod
    def crease_pattern_similarity(state: PaperState, target_fold_json: dict) -> float:
        """Compare current crease pattern to target. Returns 0-1 similarity."""
        current = state.to_fold_json()
        current_assignments = current.get('edges_assignment', [])
        target_assignments = target_fold_json.get('edges_assignment', [])
        c_m = current_assignments.count('M')
        c_v = current_assignments.count('V')
        t_m = target_assignments.count('M')
        t_v = target_assignments.count('V')
        # Count creases only: to_fold_json() exports no boundary edges, so counting
        # the target's 'B' edges would cap the score below 1.0 even on an exact match.
        n_current = c_m + c_v
        n_target = t_m + t_v
        if n_target == 0:
            return 1.0 if n_current == 0 else 0.0
        edge_count_sim = max(0.0, 1.0 - abs(n_current - n_target) / n_target)
        assign_sim = max(0.0, 1.0 - (abs(c_m - t_m) + abs(c_v - t_v)) / (2 * n_target))
        return 0.5 * edge_count_sim + 0.5 * assign_sim
```
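The reflection step is the geometric core of the engine, so it is worth checking by hand on a case with a known answer. Below is a dependency-free sketch of the same formula used in `_reflect_point` (p' = p - 2((p - a) . n) n, with n the unit normal of the line through a and b). The standalone function `reflect_point` is written for illustration and is not part of the engine.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def reflect_point(p: Point, a: Point, b: Point) -> Point:
    """Reflect point p across the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("fold line endpoints must be distinct")
    # Unit normal to the line (the direction vector rotated by 90 degrees)
    nx, ny = -dy / length, dx / length
    # Signed distance from p to the line, measured along the normal
    dist = (p[0] - a[0]) * nx + (p[1] - a[1]) * ny
    return (p[0] - 2 * dist * nx, p[1] - 2 * dist * ny)


# Folding the unit square in half along y = 0.5 sends the top edge to the bottom edge.
top_left = reflect_point((0.0, 1.0), (0.0, 0.5), (1.0, 0.5))
inner = reflect_point((0.25, 0.75), (0.0, 0.5), (1.0, 0.5))
# Folding along the main diagonal swaps the x and y coordinates.
corner = reflect_point((1.0, 0.0), (0.0, 0.0), (1.0, 1.0))
print(top_left, inner, corner)
```

Reflecting twice across the same line returns the original point, which makes a cheap property test for the engine's own implementation.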
---
## OpenEnv Environment Wrapper
```python
# origami_env/server.py
import json
import os

from openenv.core import Environment
from shapely.geometry import LineString

from paper_engine import PaperState, OrigamiVerifier


class OrigamiEnvironment(Environment):
    def __init__(self, targets_dir="targets/", max_steps=20):
        self.targets_dir = targets_dir
        self.max_steps = max_steps
        self.paper = None
        self.target = None
        self.step_count = 0

    async def reset(self, target_id=None):
        self.paper = PaperState(size=1.0)
        self.target = self._load_target(target_id)
        self.step_count = 0
        return self._get_observation()

    async def step(self, action):
        self.step_count += 1
        # Parse action
        try:
            fold_line = LineString(action['fold_line'])
            angle = action['fold_angle']
            assignment = action['assignment']
        except Exception:  # Missing keys or malformed coordinates
            reward = {'format': 0, 'total': -0.1}
            # Still respect the step budget so malformed output cannot run forever
            done = self.step_count >= self.max_steps
            return self._get_observation(), reward, done, {'error': 'parse_failed'}

        # Apply fold
        result = self.paper.apply_fold(fold_line, angle, assignment)
        # Compute rewards
        reward = self._compute_reward(result)
        # Check termination
        done = (
            self.step_count >= self.max_steps or
            reward.get('completion', 0) > 0
        )
        return self._get_observation(), reward, done, {}

    async def state(self):
        return {
            'paper': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'fold_history': self.paper.fold_history
        }

    def _compute_reward(self, fold_result):
        rewards = {}
        rewards['format'] = 1.0
        kawasaki = OrigamiVerifier.check_kawasaki(self.paper)
        maekawa = OrigamiVerifier.check_maekawa(self.paper)
        rewards['validity'] = (float(kawasaki) + float(maekawa)) / 2.0
        rewards['progress'] = OrigamiVerifier.crease_pattern_similarity(
            self.paper, self.target
        )
        if rewards['progress'] > 0.95:
            rewards['completion'] = 10.0
        rewards['efficiency'] = -0.01
        rewards['total'] = (
            0.1 * rewards['format'] +
            0.2 * rewards['validity'] +
            0.6 * rewards['progress'] +
            rewards.get('completion', 0) +
            rewards['efficiency']
        )
        return rewards

    def _get_observation(self):
        return {
            'paper_state': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'instruction_history': [str(f['line']) for f in self.paper.fold_history]
        }

    def _load_target(self, target_id):
        if target_id:
            with open(os.path.join(self.targets_dir, f"{target_id}.fold")) as f:
                return json.load(f)
        # Default: simple valley fold in half
        # (FOLD convention: valley fold angles are positive)
        return {
            'vertices_coords': [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
            'edges_vertices': [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
            'edges_assignment': ['B', 'B', 'B', 'B', 'V'],
            'edges_foldAngle': [0, 0, 0, 0, 180],
        }
```
---
## Training Script (Unsloth GRPO)
```python
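# train.py
from unsloth import FastLanguageModel  # Import unsloth before trl so its patches apply
from trl import GRPOConfig, GRPOTrainer
from shapely.geometry import LineString

from paper_engine import PaperState

# Assumed to be defined elsewhere in the project: parse_fold_action(),
# compute_reward(), load_origami_prompts(), and `target` (the target FOLD dict).

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Add LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)


# Reward function
def origami_reward(prompts, completions, **kwargs):
    """Compute rewards for a batch of completions.

    TRL passes prompts, completions, and any extra dataset columns as keyword
    arguments, hence **kwargs. Completions are assumed to be plain strings.
    """
    rewards = []
    for completion in completions:
        try:
            action = parse_fold_action(completion)
            paper = PaperState()
            # apply_fold expects a Shapely LineString, not a list of points
            paper.apply_fold(LineString(action['fold_line']), action['angle'], action['assignment'])
            r = compute_reward(paper, target)
            rewards.append(r['total'])
        except Exception:
            rewards.append(-0.1)  # Unparseable or invalid action
    return rewards


# GRPO Config
# Note: depending on the TRL version, the per-step batch may need to be a
# multiple of num_generations.
config = GRPOConfig(
    output_dir="origami-grpo",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    max_completion_length=512,
    num_generations=8,
    temperature=1.0,
    logging_steps=1,
)

dataset = load_origami_prompts()

trainer = GRPOTrainer(
    model=model,
    args=config,  # GRPOTrainer takes the config as `args`
    train_dataset=dataset,
    reward_funcs=[origami_reward],
    processing_class=tokenizer,  # GRPOTrainer's name for the tokenizer argument
)
trainer.train()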
```
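The training script calls `parse_fold_action` without defining it. One possible version is sketched below, assuming the model emits the JSON action from the Action Space section somewhere in its completion. The returned key names (`fold_line`, `angle`, `assignment`) match what the reward function above reads; the validation rules are illustrative choices, not a fixed spec.

```python
import json
import re


def parse_fold_action(completion: str) -> dict:
    """Extract and validate the JSON fold action embedded in a model completion."""
    # Take everything from the first '{' to the last '}' (assumes one JSON object).
    match = re.search(r"\{.*\}", completion, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in completion")
    raw = json.loads(match.group(0))  # json.JSONDecodeError is a ValueError

    line = raw.get("fold_line")
    if not (isinstance(line, list) and len(line) == 2
            and all(isinstance(pt, list) and len(pt) == 2 for pt in line)):
        raise ValueError("fold_line must be two [x, y] points")
    fold_line = [(float(x), float(y)) for x, y in line]
    if fold_line[0] == fold_line[1]:
        raise ValueError("fold_line endpoints must be distinct")

    angle = float(raw.get("fold_angle", 0.0))
    if not -180.0 <= angle <= 180.0:
        raise ValueError("fold_angle must be within [-180, 180]")

    assignment = raw.get("assignment")
    if assignment not in ("M", "V"):
        raise ValueError("assignment must be 'M' or 'V'")

    return {
        "instruction": str(raw.get("instruction", "")),
        "fold_line": fold_line,
        "angle": angle,
        "assignment": assignment,
    }


sample = (
    "Here is my fold:\n"
    '{"instruction": "Fold the top edge down", "fold_line": [[0, 0.5], [1, 0.5]], '
    '"fold_angle": 180, "assignment": "V"}'
)
print(parse_fold_action(sample))
```

Raising `ValueError` on every failure path lets the reward function's `except` branch assign the format penalty without extra bookkeeping.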
---
## Visualization (Demo Only, Not in Training Loop)
### Options
1. **Origami Simulator** (https://github.com/amandaghassaei/OrigamiSimulator): Three.js, accepts FOLD files, shows folding animation with strain visualization
2. **PackCAD** (https://packcad.com/): Web-based, SVG crease patterns, rigid folding simulation
3. **Custom Three.js**: Simpler but more control
### Demo UI Layout
```
+----------------------+----------------------+
| Instruction Stream   | 3D Fold Viewer       |
|                      |                      |
| Step 1: Valley fold  | [Three.js canvas]    |
| along center [OK]    |                      |
|                      | Paper animating      |
| Step 2: Fold top     | fold by fold         |
| corners to center    |                      |
|                      |                      |
+----------------------+----------------------+
| Reward Dashboard                            |
| Format:   ========== 1.0                    |
| Validity: ========.. 0.8                    |
| Progress: ======.... 0.6                    |
| Total:    =======... 0.72                   |
|                                             |
| [Reward curve over training steps]          |
+---------------------------------------------+
```
---
## Key Libraries and Resources
| Tool | Purpose | Link |
|------|---------|------|
| OpenEnv | Environment framework | https://github.com/meta-pytorch/OpenEnv |
| Unsloth | GRPO training | https://github.com/unslothai/unsloth |
| OpenPipe ART | Multi-turn RL trainer | https://github.com/OpenPipe/ART |
| FOLD format | Origami data structure | https://github.com/edemaine/fold |
| Rabbit Ear | JS origami library | https://github.com/rabbit-ear/rabbit-ear |
| Origami Simulator | 3D visualization | https://github.com/amandaghassaei/OrigamiSimulator |
| PackCAD | Folding simulation | https://packcad.com/ |
| Shapely | Python geometry | pip install shapely |
| rigid-origami gym | Reference gym env | https://github.com/belalugaX/rigid-origami |
### Papers to Cite
- OrigamiSpace: https://arxiv.org/abs/2511.18450
- GamiBench: https://arxiv.org/abs/2512.22207
- SpatialThinker: https://arxiv.org/abs/2511.07403
- Automating Rigid Origami Design: https://arxiv.org/abs/2211.13219
- FOLD format spec: https://github.com/edemaine/fold/blob/main/doc/spec.md
---
## Priority Build Order
1. **Python geometry engine**: PaperState class with fold operations and FOLD export
2. **Verifier functions**: Kawasaki, Maekawa, similarity metrics
3. **OpenEnv wrapper**: step/reset/state API
4. **Simple targets**: Hand-create 5-10 Level 1-2 targets as .fold files
5. **Training script**: Wire up Unsloth GRPO with reward function
6. **Run training**: Even on small model, get reward curves
7. **Three.js visualizer**: For demo only, not in training loop
8. **Before/after demo**: Show base model vs trained model outputs
9. **Polish presentation narrative**
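For step 4, targets can be generated by a short script rather than typed by hand. The sketch below writes a Level 1 target (valley fold in half) as a `.fold` file. It splits the boundary edges at the crease endpoints so that no vertex sits in the middle of an edge, and it uses the FOLD sign convention (valley angles positive). The function names and the `file_spec` value are choices made for this example.

```python
import json
from pathlib import Path


def make_half_fold_target() -> dict:
    """Unit square with one horizontal valley crease at y = 0.5."""
    return {
        "file_spec": 1.1,
        # Vertices counter-clockwise from the origin; 2 and 5 are the crease endpoints.
        "vertices_coords": [[0, 0], [1, 0], [1, 0.5], [1, 1], [0, 1], [0, 0.5]],
        # Six boundary segments followed by the crease.
        "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 0], [5, 2]],
        "edges_assignment": ["B", "B", "B", "B", "B", "B", "V"],
        "edges_foldAngle": [0, 0, 0, 0, 0, 0, 180],
        # Bottom half and top half of the sheet.
        "faces_vertices": [[0, 1, 2, 5], [5, 2, 3, 4]],
    }


def save_target(target: dict, targets_dir: str, target_id: str) -> Path:
    """Write the target as <targets_dir>/<target_id>.fold and return the path."""
    directory = Path(targets_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{target_id}.fold"
    path.write_text(json.dumps(target, indent=2))
    return path


target = make_half_fold_target()
n_v = len(target["vertices_coords"])
n_e = len(target["edges_vertices"])
n_f = len(target["faces_vertices"])
# Euler check with bounded faces only: V - E + F should equal 1.
print(n_v - n_e + n_f)
```

Calling `save_target(make_half_fold_target(), "targets", "half_fold_valley")` would produce a file that the environment's `_load_target("half_fold_valley")` can read back.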
---
## Narrative for Judges
**The story arc:**
1. "LLMs are great at text but terrible at spatial reasoning"
2. "Origami is the perfect testbed: it's sequential, physical, and verifiable"
3. "NeurIPS 2025 showed even GPT-5 fails at origami benchmarks, but nobody built a TRAINING environment"
4. "We built OrigamiRL, the first multi-turn RL environment for origami instruction generation"
5. "Our rewards come from math theorems, not vibes: Kawasaki's theorem is our unit test"
6. "Watch the model go from generating paper-tearing nonsense to valid fold sequences"
7. "This generalizes to any domain where LLMs need to output structured physical instructions"