docs/optigami_handoff.md (added in commit 2936d2e, parent fc71686, "docs/handoff", +767 lines)
# OrigamiRL: OpenEnv Hackathon Handoff Document

## TL;DR

Build the **first multi-turn RL environment where an LLM learns to generate origami folding instructions**, verified by a computational origami simulator. Target the OpenEnv Hackathon (March 7-8, 2026, SF; $100K+ in prizes). Use the OpenEnv spec plus Unsloth GRPO for training. Dense, verifiable rewards come from origami geometry theorems (Kawasaki, Maekawa), so no learned reward model is needed.

---

## Hackathon Context

- **Event:** OpenEnv Hackathon SF, hosted by Cerebral Valley + Shack15 + Meta/PyTorch
- **Date:** March 7-8, 2026 (happening NOW)
- **Prize:** $100K+ cash
- **Teams:** Up to 4 people
- **Format:** Build RL environments, post-train a base model

### Judging Criteria

| Category | Weight | What Matters |
|----------|--------|-------------|
| Environment Innovation | 40% | Novel, creative, challenging. Does it meaningfully test agent behavior? |
| Storytelling | 30% | Clear problem explanation, engaging demo, easy to follow |
| Training Script Showing Improvement | 20% | Observable reward curves, before/after behavior |
| Reward and Training Pipeline Setup | 10% | Coherent reward logic, meaningful improvement at inference |

### Key Sponsors to Impress

- **Meta/PyTorch**: OpenEnv creators; they want environments built on their spec
- **Unsloth AI**: GRPO training infra, ART (Agent Reinforcement Trainer). USE THEIR TOOLS.
- **OpenPipe**: ART trainer (frontend/backend split for GRPO). Also use.
- **Patronus AI**: Building "generative simulators" (auto-scaling RL environments). They care about curriculum difficulty scaling and verifiable rewards.
- **Snorkel AI**: "2026 is the year of environments." They care about data quality and environment diversity.
- **Hugging Face**: OpenEnv Hub; they want environments deployed there
- **Scale AI / Mercor**: Agent evaluation, structured task environments

---

## The Pitch (for judges)

> "Spatial reasoning is the next frontier for LLM training. NeurIPS 2025 papers like OrigamiSpace showed that even GPT-5 fails at multi-step origami reasoning, but those are benchmarks, not training environments. We built OrigamiRL: the first multi-turn RL environment where an LLM agent learns to fold paper by outputting instructions, receiving geometric feedback, and improving through GRPO. Our reward function is fully verifiable: fold validity is checked against computational origami axioms, not an LLM judge. We built it on OpenEnv + Unsloth with a natural curriculum from single folds to full cranes."

---

## Prior Work (What Exists, Where the Gaps Are)

### 1. OrigamiSpace (NeurIPS 2025 Spotlight)

- **Paper:** https://arxiv.org/abs/2511.18450
- **What it is:** Benchmark with 350 origami data instances (CP diagrams, folding processes, folded shapes). Four evaluation tasks: Pattern Prediction, Multi-step Spatial Reasoning, Spatial Relationship Prediction, End-to-End CP Code Generation.
- **Their compiler:** Outputs detailed flattened diagrams with crease locations and stacking relationships, supports interactive simulation with MLLMs, and provides comprehensive error feedback. Checks: syntax validity, geometric foldability, no self-intersections, Kawasaki's theorem, Maekawa's theorem.
- **Their reward metrics for code gen:** Hausdorff distance (shape similarity), dihedral angle distribution, bounding box aspect ratios, constraint satisfaction.
- **Difficulty levels:** Easy (3-9 steps), Medium (10-19 steps), Hard (20-30 steps)
- **Gap:** Single-turn only (the LLM generates complete CP code in one shot). They mention RL exploration, but it is not the focus. No multi-turn sequential folding.

### 2. GamiBench (Dec 2025)

- **Paper:** https://arxiv.org/abs/2512.22207
- **What it is:** 186 regular + 186 impossible 2D crease patterns with 3D folded shapes from 6 viewpoints. Three VQA tasks.
- **Gap:** Evaluation only, no training. Tests single-step spatial understanding.

### 3. SpatialThinker (NeurIPS 2025)

- **Paper:** https://arxiv.org/abs/2511.07403
- **What it is:** 3D-aware MLLM trained with RL using dense spatial rewards. Constructs scene graphs. Multi-objective reward with lexicographic gating.
- **Key architecture to steal:** Dense reward design with lexicographic ordering: format → count → accuracy → spatial. Nearly doubled RL training gains vs. sparse rewards. Needed only 7K training samples with GRPO.
- **Gap:** Static scene understanding (objects on a table), not sequential physical transformations.

### 4. rigid-origami Gym (IJCAI 2023)

- **Repo:** https://github.com/belalugaX/rigid-origami
- **Paper:** "Automating Rigid Origami Design" (https://arxiv.org/abs/2211.13219)
- **What it is:** Gym environment where an agent constructs crease pattern graphs on a board. Sparse rewards. Foldability is validated by triangle intersection tests plus a kinematic rigidity model. The game terminates on non-foldable states.
- **Gap:** Classical RL agents (discrete grid actions), NOT LLMs generating text. Rigid-origami tessellations only, not traditional origami. No natural language.

### 5. The Unique Gap We Fill

Nobody has built a model that reasons about **sequential 2D-to-3D geometric transformations with physical constraints** through **natural language instructions** in a **multi-turn RL training loop**. Origami is uniquely hard because it requires tracking how a flat sheet's topology changes through a sequence of folds: mental rotation, spatial visualization, and perspective-taking all at once.

---

## Environment Design

### Architecture Overview

```
+---------------------------------------------------+
|                  OpenEnv Server                   |
|  +-----------+  +----------+  +--------------+    |
|  | State     |  | Action   |  | Reward       |    |
|  | (FOLD JSON|  | (LLM     |  | (Dense,      |    |
|  |  + target)|  |  output) |  |  verifiable) |    |
|  +-----------+  +----------+  +--------------+    |
|        |             |              |             |
|        v             v              v             |
|  +-----------------------------------------------+|
|  | Paper Geometry Engine (Python)                ||
|  |  - Polygon state (Shapely)                    ||
|  |  - Fold operations (reflection across line)   ||
|  |  - Kawasaki/Maekawa constraint checks         ||
|  |  - Layer tracking                             ||
|  |  - FOLD format import/export                  ||
|  +-----------------------------------------------+|
|                        |                          |
|                        v                          |
|  +-----------------------------------------------+|
|  | Three.js Visualizer (Demo only)               ||
|  |  - 3D fold animation                          ||
|  |  - Strain heatmap                             ||
|  |  - Instruction stream                         ||
|  +-----------------------------------------------+|
+---------------------------------------------------+
                 |             ^
                 v             |
+---------------------------------------------------+
|            Unsloth ART / GRPO Trainer             |
|  - Qwen2.5-VL-7B or Qwen3-4B base model           |
|  - LoRA/QLoRA for efficient training              |
|  - Multi-turn rollouts                            |
+---------------------------------------------------+
```

### OpenEnv Spec Compliance

Must implement these APIs:

```python
class OrigamiEnv:
    async def reset(self) -> Observation   # New episode: flat paper + target
    async def step(self, action) -> (Observation, reward, done, info)
    async def state(self) -> State         # Current paper geometry
    async def close(self)                  # Cleanup
```

OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Install: `pip install -e .` then `openenv init origami_env`
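
The four methods above are the whole contract. As a sanity check, the loop a client (or trainer) runs against them can be sketched with a stand-in environment; `DummyEnv` and its reward shape below are illustrative stand-ins, not the real origami environment.

```python
import asyncio

class DummyEnv:
    """Stand-in implementing the four OpenEnv-style methods."""
    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.step_count = 0

    async def reset(self):
        self.step_count = 0
        return {'paper_state': 'flat', 'step': 0}

    async def step(self, action):
        self.step_count += 1
        done = self.step_count >= self.max_steps
        reward = {'total': 0.1 * self.step_count}
        return {'step': self.step_count}, reward, done, {}

    async def state(self):
        return {'step': self.step_count}

    async def close(self):
        pass

async def rollout(env, policy):
    """Drive one episode: reset, then step until done, accumulating reward."""
    obs = await env.reset()
    total = 0.0
    done = False
    while not done:
        action = policy(obs)  # in the real loop, an LLM call producing a fold action
        obs, reward, done, info = await env.step(action)
        total += reward['total']
    await env.close()
    return total

total = asyncio.run(rollout(DummyEnv(), policy=lambda obs: {'fold': 'noop'}))
print(total)  # ~0.6 (0.1 + 0.2 + 0.3)
```

The same driver works unchanged against any environment that honors the four-method contract.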

### State Space

```python
@dataclass
class OrigamiState:
    # Current paper geometry
    vertices: List[Tuple[float, float]]       # 2D vertex positions
    edges: List[Tuple[int, int]]              # Edge connectivity
    edges_assignment: List[str]               # 'M', 'V', 'B', 'F' (mountain/valley/boundary/flat)
    edges_foldAngle: List[float]              # -180 to 180 degrees
    faces: List[List[int]]                    # Face vertex indices
    layer_order: List[List[int]]              # Face stacking order

    # Episode context
    target_crease_pattern: dict               # Target FOLD JSON
    target_shape_image: Optional[np.ndarray]  # Target folded shape (for multimodal)
    instruction_history: List[str]            # Previous instructions
    step_count: int
    max_steps: int
```

This maps directly to the **FOLD format** (JSON-based, used by all origami software):

```json
{
  "vertices_coords": [[0,0], [1,0], [1,1], [0,1]],
  "edges_vertices": [[0,1], [1,2], [2,3], [3,0]],
  "edges_assignment": ["B", "B", "B", "B"],
  "edges_foldAngle": [0, 0, 0, 0],
  "faces_vertices": [[0, 1, 2, 3]]
}
```

FOLD spec: https://github.com/edemaine/fold
FOLD JS library: https://edemaine.github.io/fold/
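
Before feeding a target FOLD file to the environment, it is worth checking the format's basic invariants. A minimal sketch (`fold_sanity` is a hypothetical helper covering only index bounds and parallel-array lengths, not the full spec):

```python
def fold_sanity(fold: dict) -> bool:
    """Check edge indices are in range and parallel arrays agree in length."""
    e = fold['edges_vertices']
    n_v = len(fold['vertices_coords'])
    # Every edge endpoint must be a valid vertex index
    if not all(0 <= i < n_v for edge in e for i in edge):
        return False
    # edges_assignment / edges_foldAngle must parallel edges_vertices
    return len(fold.get('edges_assignment', e)) == len(e) and \
           len(fold.get('edges_foldAngle', e)) == len(e)

flat_square = {
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0]],
    "edges_assignment": ["B", "B", "B", "B"],
    "edges_foldAngle": [0, 0, 0, 0],
}
print(fold_sanity(flat_square))  # True
```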

### Action Space

The LLM outputs a JSON action:

```json
{
  "instruction": "Fold the top edge down to meet the bottom edge",
  "fold_line": [[0, 0.5], [1, 0.5]],
  "fold_angle": -180,
  "assignment": "V"
}
```

The `instruction` field is natural language (what we're training the model to produce well). The geometric fields are the verifiable representation. During training, the model outputs both; for the final demo, the NL instruction is the star.

Alternative, simpler action (for early iterations):

```json
{
  "instruction": "Valley fold along the horizontal center line",
  "fold_type": "valley",
  "fold_axis": "horizontal",
  "fold_position": 0.5
}
```
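
Server-side, the format gate implied by this schema can be sketched as follows; `parse_action` and its exact bounds checks are illustrative assumptions, not part of any library:

```python
import json

REQUIRED = ('instruction', 'fold_line', 'fold_angle', 'assignment')

def parse_action(raw: str):
    """Return the action dict, or None if the format gate fails."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(k in action for k in REQUIRED):
        return None
    if action['assignment'] not in ('M', 'V'):
        return None
    if not -180 <= action['fold_angle'] <= 180:
        return None
    return action

ok = parse_action('{"instruction": "Fold in half", '
                  '"fold_line": [[0, 0.5], [1, 0.5]], '
                  '"fold_angle": -180, "assignment": "V"}')
print(ok is not None)            # True
print(parse_action('not json'))  # None
```

Rejecting here, before any geometry runs, is what makes the format reward a clean gate for the levels below.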

### Reward Function: Dense, Multi-Objective, Lexicographically Gated

Inspired by SpatialThinker's design. Rewards are computed in order; later rewards only apply if earlier gates pass.

```python
def compute_reward(state, action, new_state, target) -> dict:
    rewards = {}

    # LEVEL 1: Format (gate for everything else)
    # Does the output parse into a valid fold operation?
    rewards['format'] = 1.0 if parseable(action) else 0.0
    if rewards['format'] == 0:
        return rewards  # Stop here

    # LEVEL 2: Local Geometric Validity
    # Kawasaki's theorem: alternating sum of sector angles at each interior vertex = 0
    kawasaki_valid = check_kawasaki(new_state)
    # Maekawa's theorem: |M - V| = 2 at each interior vertex
    maekawa_valid = check_maekawa(new_state)
    # No self-intersection
    no_intersection = check_no_self_intersection(new_state)
    rewards['validity'] = (kawasaki_valid + maekawa_valid + no_intersection) / 3.0
    if rewards['validity'] < 0.5:
        return rewards  # Stop here

    # LEVEL 3: Physical Feasibility
    # Can this fold actually be performed given the layer stack?
    layer_consistent = check_layer_ordering(new_state)
    fold_achievable = check_fold_angle_feasible(new_state)
    rewards['feasibility'] = (layer_consistent + fold_achievable) / 2.0

    # LEVEL 4: Progress Toward Target (Dense)
    # Crease pattern graph similarity
    cp_similarity = crease_pattern_similarity(new_state, target)
    # Fold angle distribution match
    angle_similarity = fold_angle_distribution_match(new_state, target)
    # Bounding box aspect ratio match
    bbox_similarity = bounding_box_similarity(new_state, target)
    rewards['progress'] = 0.4 * cp_similarity + 0.4 * angle_similarity + 0.2 * bbox_similarity

    # LEVEL 5: Completion Bonus
    if shape_matches_target(new_state, target, tolerance=0.05):
        rewards['completion'] = 10.0

    # LEVEL 6: Efficiency
    rewards['efficiency'] = -0.01  # Small step penalty to encourage fewer folds

    # Total
    rewards['total'] = (
        0.1 * rewards['format'] +
        0.2 * rewards['validity'] +
        0.1 * rewards['feasibility'] +
        0.5 * rewards['progress'] +
        rewards.get('completion', 0) +
        rewards['efficiency']
    )
    return rewards
```

### Key Origami Theorems for Verification

These are the verifiable constraints, the "unit tests" of origami:

1. **Kawasaki's Theorem:** At any interior vertex of a flat-foldable crease pattern, the alternating sum of sector angles equals zero (equivalently, the alternating angles on each side sum to pi). A NECESSARY condition for flat-foldability.

2. **Maekawa's Theorem:** At any interior vertex, the number of mountain folds minus the number of valley folds equals +/-2: |M - V| = 2.

3. **No self-intersection:** Faces cannot penetrate each other during folding.

4. **Euler's formula for planar graphs:** V - E + F = 2 (sanity check on graph structure).

5. **Huzita-Hatori axioms:** The 7 axioms defining all possible single-fold operations (point-to-point, point-to-line, line-to-line, etc.). These define the VALID action space.
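
The first two theorems reduce to a few lines for a single vertex. A standalone sketch (it assumes sector angles in degrees, listed in cyclic order around one interior vertex; `kawasaki_ok` and `maekawa_ok` are illustrative helpers):

```python
import math

def kawasaki_ok(sector_angles, tol=1e-9):
    """Kawasaki: alternating sum of the cyclic sector angles must be zero."""
    alt = sum(a if i % 2 == 0 else -a for i, a in enumerate(sector_angles))
    return math.isclose(alt, 0.0, abs_tol=tol)

def maekawa_ok(assignments):
    """Maekawa: mountain count minus valley count must be +/-2."""
    return abs(assignments.count('M') - assignments.count('V')) == 2

# A classic flat-foldable degree-4 vertex:
print(kawasaki_ok([90, 45, 90, 135]))    # True  (90 - 45 + 90 - 135 == 0)
print(maekawa_ok(['M', 'M', 'M', 'V']))  # True  (|3 - 1| == 2)
# Angles that sum to 360 but fail the alternating condition:
print(kawasaki_ok([100, 60, 90, 110]))   # False (100 - 60 + 90 - 110 == 20)
```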

### Curriculum Design

| Level | Folds | Examples | Complexity |
|-------|-------|----------|-----------|
| 1 | 1 | Valley fold in half, mountain fold corner | Single fold validity |
| 2 | 2-3 | Paper airplane nose, triangle fold | Sequential dependency |
| 3 | 4-6 | Simple boat, fortune teller | Multi-step with symmetry |
| 4 | 7-12 | Paper airplane (full), jumping frog | Longer-horizon planning |
| 5 | 13-20 | Crane, lily | Complex spatial tracking |

For the hackathon, focus on Levels 1-3. Even showing reward improvement on Levels 1-2 is a strong result.
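
Level progression can be automated with a simple success-rate gate. A sketch under stated assumptions: the promotion threshold, window size, and replay probability below are untuned placeholders, and `Curriculum` is a hypothetical class, not part of OpenEnv:

```python
from collections import deque
import random

class Curriculum:
    """Sample targets from the current level; promote on a rolling success rate."""
    def __init__(self, n_levels=5, window=50, promote_at=0.8):
        self.level = 1
        self.n_levels = n_levels
        self.results = deque(maxlen=window)
        self.promote_at = promote_at

    def sample_level(self):
        # Occasionally replay an easier level to avoid forgetting
        if self.level > 1 and random.random() < 0.2:
            return random.randint(1, self.level - 1)
        return self.level

    def record(self, success: bool):
        self.results.append(success)
        full = len(self.results) == self.results.maxlen
        if full and sum(self.results) / len(self.results) >= self.promote_at:
            if self.level < self.n_levels:
                self.level += 1
                self.results.clear()

cur = Curriculum()
for _ in range(50):
    cur.record(True)  # 50 straight successes on Level 1
print(cur.level)      # 2
```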

---

## Core Implementation: Python Geometry Engine

This is the MOST IMPORTANT piece. Pure Python, no JS dependencies.

```python
import numpy as np
from shapely.geometry import Polygon, LineString
from shapely.ops import split


class PaperState:
    """Represents the current state of the origami paper."""

    def __init__(self, size: float = 1.0):
        # Start with a unit square
        self.regions = [Polygon([(0, 0), (size, 0), (size, size), (0, size)])]
        self.fold_history = []
        self.crease_lines = []
        self.crease_assignments = []  # 'M' or 'V'
        self.crease_angles = []
        self.layer_order = [0]  # Stack order of regions

    def apply_fold(self, fold_line: LineString, angle: float, assignment: str) -> dict:
        """
        Apply a fold operation. Returns a dict with validity info.
        fold_line: Shapely LineString defining the fold axis
        angle: fold angle in degrees (-180 to 180)
        assignment: 'M' (mountain) or 'V' (valley)
        """
        result = {'valid': True, 'errors': []}

        # 1. Split regions by fold line
        new_regions = []
        for region in self.regions:
            if fold_line.intersects(region):
                parts = split(region, fold_line)
                new_regions.extend(parts.geoms)
            else:
                new_regions.append(region)

        # 2. Determine which side folds (based on assignment)
        folding_side = []
        staying_side = []
        for region in new_regions:
            centroid = region.centroid
            side = self._point_side(centroid, fold_line)
            if side > 0:
                folding_side.append(region)
            else:
                staying_side.append(region)

        # 3. Reflect folding regions across fold line
        reflected = [self._reflect_polygon(r, fold_line) for r in folding_side]

        # 4. Update state
        self.regions = staying_side + reflected
        self.crease_lines.append(fold_line)
        self.crease_assignments.append(assignment)
        self.crease_angles.append(angle)
        self.fold_history.append({
            'line': list(fold_line.coords),
            'angle': angle,
            'assignment': assignment
        })

        # 5. Update layer order
        self._update_layer_order(staying_side, reflected)

        return result

    def _reflect_polygon(self, poly: Polygon, line: LineString) -> Polygon:
        """Reflect a polygon across a line."""
        coords = list(poly.exterior.coords)
        reflected_coords = [self._reflect_point(p, line) for p in coords]
        return Polygon(reflected_coords)

    def _reflect_point(self, point: tuple, line: LineString) -> tuple:
        """Reflect a point across a line."""
        p = np.array(point[:2])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        d = l2 - l1
        d = d / np.linalg.norm(d)
        # Reflection formula: p' = p - 2*((p - l1) . n) * n, where n is the unit normal
        n = np.array([-d[1], d[0]])
        v = p - l1
        return tuple(p - 2 * np.dot(v, n) * n)

    def _point_side(self, point, line: LineString) -> float:
        """Returns positive if the point is on the left side of the line, negative if right."""
        p = np.array([point.x, point.y])
        l1 = np.array(line.coords[0])
        l2 = np.array(line.coords[1])
        return float(np.cross(l2 - l1, p - l1))

    def _update_layer_order(self, staying, reflected):
        """Update the layer stacking order after a fold."""
        self.layer_order = list(range(len(staying))) + \
                           list(range(len(staying), len(staying) + len(reflected)))

    def to_fold_json(self) -> dict:
        """Export the current state as FOLD format JSON."""
        vertices = set()
        for line in self.crease_lines:
            for coord in line.coords:
                vertices.add(tuple(round(c, 10) for c in coord))
        # Add boundary vertices
        for region in self.regions:
            for coord in region.exterior.coords:
                vertices.add(tuple(round(c, 10) for c in coord[:2]))

        vertices = sorted(vertices)
        vertex_map = {v: i for i, v in enumerate(vertices)}

        edge_set = set()
        edges_list = []
        assignments_list = []
        angles_list = []

        # Add crease edges
        for i, line in enumerate(self.crease_lines):
            c = [tuple(round(x, 10) for x in coord) for coord in line.coords]
            edge = tuple(sorted([vertex_map[c[0]], vertex_map[c[1]]]))
            if edge not in edge_set:
                edge_set.add(edge)
                edges_list.append(list(edge))
                assignments_list.append(self.crease_assignments[i])
                angles_list.append(self.crease_angles[i])

        return {
            'vertices_coords': [list(v) for v in vertices],
            'edges_vertices': edges_list,
            'edges_assignment': assignments_list,
            'edges_foldAngle': angles_list,
        }


class OrigamiVerifier:
    """Verifiable reward functions based on origami theorems."""

    @staticmethod
    def check_kawasaki(state: PaperState) -> bool:
        """Kawasaki's theorem: alternating sum of sector angles at each interior vertex = 0."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']

        for v_idx in range(len(vertices)):
            v = vertices[v_idx]
            incident_edges = [e for e in edges if v_idx in e]
            if len(incident_edges) < 4:
                continue  # Need degree 4+ for Kawasaki

            # Calculate sector angles
            angles = []
            for e in incident_edges:
                other = e[1] if e[0] == v_idx else e[0]
                other_v = vertices[other]
                angle = np.arctan2(other_v[1] - v[1], other_v[0] - v[0])
                angles.append(angle)

            angles.sort()
            sector_angles = []
            for i in range(len(angles) - 1):
                sector_angles.append(angles[i + 1] - angles[i])
            sector_angles.append(2 * np.pi - (angles[-1] - angles[0]))

            # Kawasaki: alternating sum should be ~0
            if len(sector_angles) >= 4:
                alt_sum = sum(sector_angles[::2]) - sum(sector_angles[1::2])
                if abs(alt_sum) > 0.01:
                    return False
        return True

    @staticmethod
    def check_maekawa(state: PaperState) -> bool:
        """Maekawa's theorem: |M - V| = 2 at each interior vertex."""
        fold_json = state.to_fold_json()
        vertices = fold_json['vertices_coords']
        edges = fold_json['edges_vertices']
        assignments = fold_json['edges_assignment']

        for v_idx in range(len(vertices)):
            incident = [(i, e) for i, e in enumerate(edges) if v_idx in e]
            m_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'M')
            v_count = sum(1 for i, _ in incident if i < len(assignments) and assignments[i] == 'V')

            if m_count + v_count >= 4:  # Interior vertex with folds
                if abs(m_count - v_count) != 2:
                    return False
        return True

    @staticmethod
    def crease_pattern_similarity(state: PaperState, target_fold_json: dict) -> float:
        """Compare the current crease pattern to the target. Returns similarity in [0, 1]."""
        current = state.to_fold_json()

        n_current = len(current.get('edges_vertices', []))
        n_target = len(target_fold_json.get('edges_vertices', []))

        if n_target == 0:
            return 1.0 if n_current == 0 else 0.0

        edge_count_sim = 1.0 - abs(n_current - n_target) / max(n_target, 1)
        edge_count_sim = max(0.0, edge_count_sim)

        current_assignments = current.get('edges_assignment', [])
        target_assignments = target_fold_json.get('edges_assignment', [])

        c_m = current_assignments.count('M')
        c_v = current_assignments.count('V')
        t_m = target_assignments.count('M')
        t_v = target_assignments.count('V')

        total = max(t_m + t_v, 1)
        assign_sim = 1.0 - (abs(c_m - t_m) + abs(c_v - t_v)) / (2 * total)
        assign_sim = max(0.0, assign_sim)

        return 0.5 * edge_count_sim + 0.5 * assign_sim
```
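
The core geometric primitive here is reflection across the fold line. The math behind `_reflect_point` can be sanity-checked in isolation with the standard library only (no Shapely); `reflect_point` below is a standalone re-derivation, not an import from the engine:

```python
import math

def reflect_point(p, l1, l2):
    """Reflect point p across the infinite line through l1 and l2."""
    dx, dy = l2[0] - l1[0], l2[1] - l1[1]
    norm = math.hypot(dx, dy)
    dx, dy = dx / norm, dy / norm
    # Unit normal to the line
    nx, ny = -dy, dx
    # Signed distance from p to the line, measured along the normal
    dist = (p[0] - l1[0]) * nx + (p[1] - l1[1]) * ny
    return (p[0] - 2 * dist * nx, p[1] - 2 * dist * ny)

# Folding the unit square in half along y = 0.5 sends the top-left corner
# (0, 1) onto the bottom-left corner (0, 0):
print(reflect_point((0, 1), (0, 0.5), (1, 0.5)))  # (0.0, 0.0)
```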

---

## OpenEnv Environment Wrapper

```python
# origami_env/server.py
from openenv.core import Environment
from paper_engine import PaperState, OrigamiVerifier
from shapely.geometry import LineString
import json


class OrigamiEnvironment(Environment):

    def __init__(self, targets_dir="targets/", max_steps=20):
        self.targets_dir = targets_dir
        self.max_steps = max_steps
        self.paper = None
        self.target = None
        self.step_count = 0

    async def reset(self, target_id=None):
        self.paper = PaperState(size=1.0)
        self.target = self._load_target(target_id)
        self.step_count = 0
        return self._get_observation()

    async def step(self, action):
        self.step_count += 1

        # Parse action
        try:
            fold_line = LineString(action['fold_line'])
            angle = action['fold_angle']
            assignment = action['assignment']
        except (KeyError, TypeError, ValueError):
            reward = {'format': 0, 'total': -0.1}
            return self._get_observation(), reward, False, {'error': 'parse_failed'}

        # Apply fold
        result = self.paper.apply_fold(fold_line, angle, assignment)

        # Compute rewards
        reward = self._compute_reward(result)

        # Check termination
        done = (
            self.step_count >= self.max_steps or
            reward.get('completion', 0) > 0
        )

        return self._get_observation(), reward, done, {}

    async def state(self):
        return {
            'paper': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'fold_history': self.paper.fold_history
        }

    def _compute_reward(self, fold_result):
        rewards = {}
        rewards['format'] = 1.0

        kawasaki = OrigamiVerifier.check_kawasaki(self.paper)
        maekawa = OrigamiVerifier.check_maekawa(self.paper)
        rewards['validity'] = (float(kawasaki) + float(maekawa)) / 2.0

        rewards['progress'] = OrigamiVerifier.crease_pattern_similarity(
            self.paper, self.target
        )

        if rewards['progress'] > 0.95:
            rewards['completion'] = 10.0

        rewards['efficiency'] = -0.01

        rewards['total'] = (
            0.1 * rewards['format'] +
            0.2 * rewards['validity'] +
            0.6 * rewards['progress'] +
            rewards.get('completion', 0) +
            rewards['efficiency']
        )
        return rewards

    def _get_observation(self):
        return {
            'paper_state': self.paper.to_fold_json(),
            'target': self.target,
            'step': self.step_count,
            'instruction_history': [str(f['line']) for f in self.paper.fold_history]
        }

    def _load_target(self, target_id):
        if target_id:
            with open(f"{self.targets_dir}/{target_id}.fold") as f:
                return json.load(f)
        # Default: simple valley fold in half
        return {
            'vertices_coords': [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
            'edges_vertices': [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
            'edges_assignment': ['B', 'B', 'B', 'B', 'V'],
            'edges_foldAngle': [0, 0, 0, 0, -180],
        }
```
|
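
For reference, `step()` expects an action dict shaped like the example below. The `validate_action` helper is a hypothetical convenience for pre-checking model output, not part of the environment API:

```python
# Hypothetical example of the action payload that step() parses.
example_action = {
    "fold_line": [[0.0, 0.5], [1.0, 0.5]],  # two endpoints, consumed by shapely's LineString
    "fold_angle": -180,                      # degrees; the default target pairs 'V' with -180
    "assignment": "V",                       # 'V' (valley) or 'M' (mountain)
}

def validate_action(action):
    """Cheap structural check before handing the dict to env.step()."""
    required = {"fold_line", "fold_angle", "assignment"}
    if not required <= set(action):
        return False
    if action["assignment"] not in ("V", "M"):
        return False
    return len(action["fold_line"]) == 2

print(validate_action(example_action))  # True
```

Running this check client-side avoids spending an env step on actions that would only earn the `parse_failed` penalty.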

---

## Training Script (Unsloth GRPO)

```python
# train.py
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
import torch

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Add LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# Reward function (TRL passes prompts/completions and extra columns as kwargs)
def origami_reward(completions, **kwargs):
    """Compute rewards for a batch of completions."""
    rewards = []
    for completion in completions:
        try:
            action = parse_fold_action(completion)
            paper = PaperState()
            paper.apply_fold(action['fold_line'], action['angle'], action['assignment'])
            r = compute_reward(paper, target)  # `target` must be in scope here
            rewards.append(r['total'])
        except Exception:
            rewards.append(-0.1)
    return rewards

# GRPO Config
config = GRPOConfig(
    output_dir="origami-grpo",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    max_completion_length=512,
    num_generations=8,
    temperature=1.0,
    logging_steps=1,
)

dataset = load_origami_prompts()

trainer = GRPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    reward_funcs=[origami_reward],
    processing_class=tokenizer,
)

trainer.train()
```
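
`parse_fold_action` is referenced above but not defined in this excerpt. A minimal sketch, assuming the model is prompted to answer with a single JSON object carrying `fold_line`, `fold_angle`, and `assignment` keys:

```python
import json
import re

def parse_fold_action(completion: str) -> dict:
    """Pull the first {...} JSON object out of a completion and check its keys.

    Assumption: the prompt instructs the model to emit one JSON object;
    anything else (prose only, missing keys, bad JSON) raises, which the
    reward function converts into the -0.1 penalty.
    """
    match = re.search(r"\{.*\}", completion, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in completion")
    action = json.loads(match.group(0))
    for key in ("fold_line", "fold_angle", "assignment"):
        if key not in action:
            raise KeyError(key)
    return action
```

The greedy `{.*}` match spans from the first brace to the last, which tolerates nested objects but assumes one JSON payload per completion.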

---

## Visualization (Demo Only – Not in Training Loop)

### Options

1. **Origami Simulator** – https://github.com/amandaghassaei/OrigamiSimulator – Three.js, accepts FOLD files, shows folding animation with strain visualization
2. **PackCAD** – https://packcad.com/ – Web-based, SVG crease patterns, rigid folding simulation
3. **Custom Three.js** – Simpler to integrate, and gives full control over the rendering

### Demo UI Layout

```
+----------------------+----------------------+
| Instruction Stream   | 3D Fold Viewer       |
|                      |                      |
| Step 1: Valley fold  | [Three.js canvas]    |
| along center [OK]    |                      |
|                      | Paper animating      |
| Step 2: Fold top     | fold by fold         |
| corners to center    |                      |
|                      |                      |
+----------------------+----------------------+
| Reward Dashboard                            |
| Format:    ========== 1.0                   |
| Validity:  ========.. 0.8                   |
| Progress:  ======.... 0.6                   |
| Total:     =======... 0.72                  |
|                                             |
| [Reward curve over training steps]          |
+---------------------------------------------+
```
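
To hand frames to either viewer, it is enough to dump the environment's paper state to a `.fold` file. A sketch using the default half-fold target from `_load_target()`; the filename and the `frame_classes` choice are illustrative:

```python
import json

# Minimal FOLD document that a FOLD-aware viewer can load; the geometry is
# the default half-fold target defined in _load_target().
fold_doc = {
    "file_spec": 1.1,
    "file_creator": "origami-rl-demo",
    "frame_classes": ["creasePattern"],
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0.5], [1, 0.5]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
    "edges_assignment": ["B", "B", "B", "B", "V"],
    "edges_foldAngle": [0, 0, 0, 0, -180],
}

with open("demo_target.fold", "w") as f:
    json.dump(fold_doc, f, indent=2)
```

Since FOLD is plain JSON, the same dump can be produced per step from `paper.to_fold_json()` to drive the fold-by-fold animation.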

---

## Key Libraries and Resources

| Tool | Purpose | Link |
|------|---------|------|
| OpenEnv | Environment framework | https://github.com/meta-pytorch/OpenEnv |
| Unsloth | GRPO training | https://github.com/unslothai/unsloth |
| OpenPipe ART | Multi-turn RL trainer | https://github.com/OpenPipe/ART |
| FOLD format | Origami data structure | https://github.com/edemaine/fold |
| Rabbit Ear | JS origami library | https://github.com/rabbit-ear/rabbit-ear |
| Origami Simulator | 3D visualization | https://github.com/amandaghassaei/OrigamiSimulator |
| PackCAD | Folding simulation | https://packcad.com/ |
| Shapely | Python geometry | `pip install shapely` |
| rigid-origami gym | Reference gym env | https://github.com/belalugaX/rigid-origami |
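
Shapely carries most of the geometric load: the core operation, splitting the paper polygon along a fold line, is a one-liner. A small sketch (the fold line deliberately overshoots the paper edge, since `shapely.ops.split` needs the splitter to cross the polygon boundary):

```python
from shapely.geometry import LineString, Polygon
from shapely.ops import split

# Unit-square paper and a horizontal fold line; the line extends slightly
# past the paper so split() sees it crossing the boundary.
paper = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
fold_line = LineString([(-0.1, 0.5), (1.1, 0.5)])

pieces = split(paper, fold_line)
print(len(pieces.geoms))                         # 2
print([round(p.area, 3) for p in pieces.geoms])  # [0.5, 0.5]
```

Reflecting one of the resulting pieces across the fold line then gives the folded layer geometry.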

### Papers to Cite

- OrigamiSpace: https://arxiv.org/abs/2511.18450
- GamiBench: https://arxiv.org/abs/2512.22207
- SpatialThinker: https://arxiv.org/abs/2511.07403
- Automating Rigid Origami Design: https://arxiv.org/abs/2211.13219
- FOLD format spec: https://github.com/edemaine/fold/blob/main/doc/spec.md


---

## Priority Build Order

1. **Python geometry engine** – PaperState class with fold operations and FOLD export
2. **Verifier functions** – Kawasaki, Maekawa, similarity metrics
3. **OpenEnv wrapper** – step/reset/state API
4. **Simple targets** – Hand-create 5-10 Level 1-2 targets as .fold files
5. **Training script** – Wire up Unsloth GRPO with the reward function
6. **Run training** – Even on a small model, get reward curves
7. **Three.js visualizer** – For demo only, not in the training loop
8. **Before/after demo** – Show base-model vs. trained-model outputs
9. **Polish the presentation narrative**

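Step 4 can be partially scripted: a single-fold target is parameterized by the fold height, so a handful of Level-1 `.fold` files can be generated rather than hand-written. A sketch (the `targets/` layout matches the env default; filenames are illustrative):

```python
import json
import os

def make_horizontal_fold_target(y):
    """Level-1 target: a single valley fold across the unit square at height y."""
    return {
        "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1], [0, y], [1, y]],
        "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5]],
        "edges_assignment": ["B", "B", "B", "B", "V"],
        "edges_foldAngle": [0, 0, 0, 0, -180],
    }

os.makedirs("targets", exist_ok=True)
for i, y in enumerate([0.25, 0.5, 0.75]):
    with open(f"targets/level1_{i}.fold", "w") as f:
        json.dump(make_horizontal_fold_target(y), f)
```

Level-2 targets (two folds) need real fold composition and are probably still best authored by hand or in Origami Simulator.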

---

## Narrative for Judges

**The story arc:**

1. "LLMs are great at text but terrible at spatial reasoning"
2. "Origami is the perfect testbed – it's sequential, physical, and verifiable"
3. "NeurIPS 2025 showed even GPT-5 fails at origami benchmarks, but nobody built a TRAINING environment"
4. "We built OrigamiRL – the first multi-turn RL environment for origami instruction generation"
5. "Our rewards come from math theorems, not vibes – Kawasaki's theorem is our unit test"
6. "Watch the model go from generating paper-tearing nonsense to valid fold sequences"
7. "This generalizes to any domain where LLMs need to output structured physical instructions"