# OpenEnv Environment Architecture — Origami RL

> Updated plan reflecting actual OpenEnv patterns (from 2048 reference),
> proper rendering strategy (Three.js viewer, not matplotlib),
> and clear separation between training vs demo contexts.

---

## 1. Overview

Two distinct contexts use the SAME server code:

CONTEXT 1: RL TRAINING (Colab/GPU machine)

```
┌────────────────────────────────────────────────────────────┐
│ Colab Notebook                                             │
│                                                            │
│  ┌──────────────────┐      ┌────────────────────────────┐  │
│  │ GRPOTrainer      │      │ OpenEnv Server (subprocess)│  │
│  │                  │      │ port 9000                  │  │
│  │ LLM generates    │      │                            │  │
│  │ fold_strategy()  │─────▶│ reset() → step() → state   │  │
│  │                  │◀─────│ returns: paper_state +     │  │
│  │ Reward functions │      │   metrics + reward         │  │
│  │ score the result │      │                            │  │
│  └──────────────────┘      │ NO RENDERING               │  │
│                            │ Just geometry + numbers    │  │
│                            └────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
```

CONTEXT 2: DEMO / HACKATHON (HF Space + Browser)

```
┌────────────────────────────────────────────────────────────┐
│ HF Space (Docker)                                          │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ FastAPI via create_app()                             │  │
│  │                                                      │  │
│  │ /ws                   → OpenEnv WebSocket protocol   │  │
│  │ /reset, /step, /state → OpenEnv HTTP (stateless)     │  │
│  │ /health, /schema      → OpenEnv metadata             │  │
│  │ /web                  → OpenEnv built-in generic UI  │  │
│  │ /viewer               → Three.js origami viewer      │  │
│  └───────────────────────────┬──────────────────────────┘  │
│                              │                             │
│  ┌───────────────────────────▼──────────────────────────┐  │
│  │ OrigamiEnvironment                                   │  │
│  │ reset() / step() / state                             │  │
│  │                                                      │  │
│  │  ┌──────────┐   ┌──────────────────┐                 │  │
│  │  │ Engine   │   │ Task System      │                 │  │
│  │  │          │   │                  │                 │  │
│  │  │ paper    │   │ task pool        │                 │  │
│  │  │ fold     │   │ materials        │                 │  │
│  │  │ physics  │   │ curriculum       │                 │  │
│  │  │ validate │   │                  │                 │  │
│  │  │ metrics  │   │                  │                 │  │
│  │  └──────────┘   └──────────────────┘                 │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Three.js Viewer (static HTML/JS, served by same      │  │
│  │ FastAPI at /viewer)                                  │  │
│  │                                                      │  │
│  │ Connects to /ws → receives paper_state               │  │
│  │ Renders: 2D crease (Canvas) + 3D mesh (WebGL)        │  │
│  │ + strain heatmap (vertex colors) + animation         │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
```

Browser opens /viewer → sees the origami fold live in 3D


### Key Design Decisions

1. **No server-side image rendering.** No matplotlib, no Pillow, no imageio.
   The server returns FOLD JSON data (vertices, edges, faces, strain) in every
   observation. Rendering happens in the browser via Three.js (demo) or not at
   all (training).

2. **`render_urls` removed from Observation.** The old design saved PNGs to disk
   and returned file paths — nobody fetched them during training (wasted CPU/disk).
   Instead, `paper_state` IS the render data. The Three.js viewer reads it directly.

3. **Same server code for both contexts.** The only difference is who consumes it:
   - Training: Python reward functions read the observation's `metrics` → compute reward
   - Demo: Three.js reads `paper_state.vertices_coords` → renders the 3D mesh

4. **`openenv push`** deploys to HF Spaces. Sets `ENABLE_WEB_INTERFACE=true`
   which gives us the built-in generic UI at `/web`. Our custom Three.js viewer
   is served as static files at `/viewer`.

---

## 2. Repository Structure

```
origami_env/                      # THE deliverable — one package
│
├── server/                       # Python backend (OpenEnv server)
│   │
│   ├── engine/                   # Origami simulation core
│   │   ├── __init__.py
│   │   ├── paper.py              # PaperState dataclass, FOLD I/O, create_flat_sheet()
│   │   ├── fold.py               # apply_fold() — quaternion rotation, face splitting
│   │   ├── physics.py            # Bar-and-hinge Verlet solver, strain computation
│   │   ├── validation.py         # Kawasaki, Maekawa, self-intersection detection
│   │   ├── metrics.py            # ALL metrics — compactness, strain, shape, deployability
│   │   └── materials.py          # Material presets + stiffness parameter derivation
│   │
│   ├── renderer/                 # Minimal — FOLD JSON export only (no images)
│   │   ├── __init__.py
│   │   └── exporter.py           # FOLD JSON export, OBJ export
│   │
│   ├── models.py                 # OpenEnv types: OrigamiAction, OrigamiObservation, OrigamiState
│   ├── origami_environment.py    # Environment class (subclasses openenv Environment)
│   ├── tasks.py                  # Task pool, curriculum levels, difficulty sampling
│   ├── app.py                    # create_app() + mount static viewer
│   ├── __init__.py
│   ├── requirements.txt          # openenv-core, numpy, scipy, pydantic (NO matplotlib)
│   └── Dockerfile
│
├── viewer/                       # Three.js origami viewer (static files)
│   ├── index.html                # Single page: 2D + 3D + metrics + controls
│   ├── origami-viewer.js         # Three.js rendering from FOLD data
│   └── style.css                 # Layout styles
│
├── client/                       # OpenEnv client (for RL training)
│   ├── __init__.py
│   ├── client.py                 # OrigamiEnvClient (EnvClient subclass, WebSocket)
│   └── reward_functions.py       # code_valid, no_cheating, fold_quality (GRPO rewards)
│
├── training/                     # GRPO training script (runs on Colab)
│   └── train_grpo.py             # Launches server, runs GRPOTrainer (2048 pattern)
│
├── openenv.yaml                  # Manifest for openenv push
├── pyproject.toml
└── README.md
```


### What Changed from Previous Plan

| Before | After | Why |
|--------|-------|-----|
| `renderer/render_2d.py` (matplotlib) | REMOVED | Browser renders via Three.js |
| `renderer/render_3d.py` (matplotlib) | REMOVED | Browser renders via Three.js |
| `renderer/screenshots.py` (PNG capture) | REMOVED | No server-side images |
| `renderer/recorder.py` (GIF via imageio) | REMOVED | Browser can record (MediaRecorder) |
| `web/` (React + R3F) | `viewer/` (plain HTML + Three.js) | Simpler, no build step, same quality |
| `render_urls` in Observation | REMOVED | paper_state IS the render data |
| matplotlib, Pillow, imageio deps | REMOVED | Lighter Docker image |

---

## 3. Pydantic Models (`server/models.py`)

### OrigamiAction

```python
class OrigamiAction(Action):
    """One fold operation. Sent by the client each step."""
    # metadata: Dict[str, Any]  (inherited from Action)

    fold_type: str = "valley"             # "valley" | "mountain" | "pleat" | "crimp" | "stop"
    fold_line: Dict[str, List[float]]     # {"start": [x,y], "end": [x,y]} (normalized 0-1)
    fold_angle: float = 180.0             # degrees, 0-180
    layer_select: str = "all"             # "all" | "top" | "bottom"

```

### OrigamiObservation

```python
class OrigamiObservation(Observation):
    """Everything the LLM and Three.js viewer need."""
    # done: bool           (inherited from Observation)
    # reward: float|None   (inherited from Observation)
    # metadata: Dict       (inherited from Observation)

    task: Dict[str, Any]                  # Task definition
    paper_state: Dict[str, Any]           # FOLD-compatible geometry + physics
    # {
    #   "vertices_coords": [[x,y,z], ...],
    #   "edges_vertices": [[v1,v2], ...],
    #   "faces_vertices": [[v0,v1,v2,...], ...],
    #   "edges_assignment": ["M","V","B","F",...],
    #   "edges_foldAngle": [180, -180, 0, ...],
    #   "num_vertices": 36, "num_edges": 85, "num_faces": 50,
    #   "bounding_box": [0.5, 1.0, 0.02],
    #   "num_layers": 2,
    #   "width": 1.0, "height": 1.0,
    #   "material": {"name": "paper", ...},
    #   "strain_per_vertex": [0.001, 0.005, ...],
    #   "energy": {"total": 0.12, "bar": 0.05, "facet": 0.03, "fold": 0.04},
    #   "fold_count": 2,
    # }
    metrics: Dict[str, Any]               # All computed metrics
    fold_history: List[Dict[str, Any]]    # History of folds applied
    error: Optional[str] = None           # Error message if fold failed
```

Note: no `render_urls`. The `paper_state` dict contains all the geometry data Three.js needs to render. During training, reward functions read `metrics`; during the demo, the viewer reads `paper_state.vertices_coords` etc.

### OrigamiState

```python
class OrigamiState(State):
    # episode_id: Optional[str]  (inherited from State)
    # step_count: int            (inherited from State)

    task_name: str = ""
    num_folds_applied: int = 0
    is_valid: bool = True
    total_reward: float = 0.0
```

### Wire Format (what goes over WebSocket)

OpenEnv's `serialize_observation()` lifts `done`, `reward`, and `metadata` to the top level:

```json
{
  "observation": {
    "task": {"name": "half_fold", "width": 1.0, ...},
    "paper_state": {
      "vertices_coords": [[0,0,0], [1,0,0], ...],
      "edges_vertices": [[0,1], [1,2], ...],
      "edges_assignment": ["B", "B", "V", ...],
      "strain_per_vertex": [0.001, 0.005, ...],
      ...
    },
    "metrics": {"compactness": 0.45, "is_valid": true, ...},
    "fold_history": [{"type": "valley", "step": 1, ...}],
    "error": null
  },
  "reward": null,
  "done": false
}
```

The Three.js viewer reads `observation.paper_state` → builds the mesh. The reward functions read `observation.metrics` → compute the score. Same data, different consumers.
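To make the "same data, different consumers" point concrete, here is a hedged sketch of the training-side consumer: a scorer that only ever touches `metrics`. The helper name is illustrative; the thresholds mirror the `fold_quality` tiers described in the RL training section.

```python
def reward_from_metrics(metrics: dict) -> float:
    """Score an episode purely from observation.metrics (illustrative helper,
    mirroring the fold_quality tiers: 20.0 / 5.0 / 2.0)."""
    compactness = metrics.get("compactness", 0.0)
    if compactness > 0.8 and metrics.get("is_valid", False):
        return 20.0   # optimal folding
    if compactness > 0.5:
        return 5.0    # decent result
    return 2.0        # ran, but poor folding

reward_from_metrics({"compactness": 0.85, "is_valid": True})  # → 20.0
```

The demo-side consumer is the symmetric case: the viewer ignores `metrics` entirely and reads only the geometry arrays in `paper_state`.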


## 4. Engine (`server/engine/`)

Unchanged from previous plan. All engine code is already implemented and working.

### 4.1 Paper State (`paper.py`)

FOLD-format compatible dataclass. Key fields: vertices_coords (N,3), edges_vertices (E,2), faces_vertices (ragged), edges_assignment, edges_foldAngle, rest_lengths, rest_positions, strain_per_vertex, energy, material, face_orders, num_layers.

Key methods: create_flat_sheet(), to_fold_json(), from_fold_json(), to_observation_dict(), bounding_box, triangulated_faces.
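The `triangulated_faces` helper is not spelled out above, but fan triangulation (the same scheme the viewer uses in Section 6) is the likely shape of it. A sketch, assuming the FOLD-style ragged `faces_vertices` list:

```python
def fan_triangulate(faces_vertices):
    """Fan-triangulate ragged polygon faces into triangles.

    Sketch of what PaperState.triangulated_faces could return;
    triangle faces pass through unchanged."""
    tris = []
    for face in faces_vertices:
        for i in range(1, len(face) - 1):
            tris.append([face[0], face[i], face[i + 1]])
    return tris

fan_triangulate([[0, 1, 2, 3]])  # → [[0, 1, 2], [0, 2, 3]]
```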

### 4.2 Fold Operations (`fold.py`)

10-step pipeline: validate → split faces → classify vertices → quaternion rotation → update assignments → update angles → update topology → compute rest lengths → update layers → increment fold_count.

Pleat = valley + mountain. Crimp = mountain + valley.
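The quaternion-rotation step amounts to rotating the moving vertices about the fold line. A minimal sketch with `scipy.spatial.transform.Rotation` (the function name is illustrative, not the actual `fold.py` API):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotate_about_line(points, start, end, angle_deg):
    """Rotate 3D points about the line through start -> end by angle_deg.

    Illustrative helper; fold.py's real pipeline also splits faces and
    selects which vertices move."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    axis = end - start
    axis /= np.linalg.norm(axis)
    rot = Rotation.from_rotvec(np.radians(angle_deg) * axis)
    return rot.apply(points - start) + start

# Valley-fold the top half of a unit sheet 180° about the crease y = 0.5:
pts = np.array([[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])  # vertices above the crease
folded = rotate_about_line(pts, [0, 0.5, 0], [1, 0.5, 0], 180.0)
# each (x, 1, 0) lands on (x, 0, 0): the top edge meets the bottom edge
```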

### 4.3 Physics Solver (`physics.py`)

Bar-and-hinge Verlet integration. Three energy components:

- `E_bar` (axial springs, prevents stretching)
- `E_facet` (panel bending, keeps faces flat)
- `E_fold` (crease rotation, drives folding)

Numerical stability: force clamping, NaN detection, stiffness caps, reduced `dt=0.005`, `damping=0.15`.
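A minimal sketch of the damped Verlet update for the bar (axial) term, including the force clamping mentioned above. Names and the clamp bound are illustrative; the real `physics.py` also adds the facet and fold forces.

```python
import numpy as np

def verlet_step(pos, prev_pos, bars, rest_len, k=1.0, dt=0.005, damping=0.15):
    """One damped Verlet step driven by axial bar springs (the E_bar term)."""
    force = np.zeros_like(pos)
    for (i, j), L0 in zip(bars, rest_len):
        d = pos[j] - pos[i]
        L = np.linalg.norm(d)
        f = k * (L - L0) * d / max(L, 1e-12)  # pull/push along the bar
        force[i] += f
        force[j] -= f
    force = np.clip(force, -100.0, 100.0)     # force clamping for stability
    new_pos = pos + (1.0 - damping) * (pos - prev_pos) + force * dt ** 2
    return new_pos, pos
```

Iterating `pos, prev_pos = verlet_step(pos, prev_pos, ...)` relaxes a stretched bar back toward its rest length.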

### 4.4 Validation (`validation.py`)

- Kawasaki-Justin: alternating angle sums at interior vertices
- Maekawa-Justin: |M - V| = 2 at interior vertices
- Self-intersection: Z-separation + normal alignment check (not simple overlap)
- Strain limits: per-vertex strain vs `material.max_strain`
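The two flat-foldability checks are simple to state in code. A sketch (function names are illustrative; `validation.py` also has to find the interior vertices and their sector angles):

```python
def maekawa_ok(assignments):
    """Maekawa-Justin: around an interior vertex, |#mountain - #valley| == 2."""
    return abs(assignments.count("M") - assignments.count("V")) == 2

def kawasaki_ok(sector_angles_deg, tol=1e-6):
    """Kawasaki-Justin: the alternating sums of the sector angles around an
    interior vertex are equal (each totals 180°)."""
    return abs(sum(sector_angles_deg[0::2]) - sum(sector_angles_deg[1::2])) < tol

maekawa_ok(["M", "M", "M", "V"])       # → True
kawasaki_ok([90.0, 90.0, 90.0, 90.0])  # → True
```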

### 4.5 Metrics (`metrics.py`)

20+ metrics: compactness, deployment_ratio, volume_compaction, packing_efficiency, fits_target_box, max/mean strain, energy breakdown, fold_count, folding_efficiency, crease_complexity, is_deployable, deployment_force_estimate, chamfer/hausdorff distance.
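The exact formulas in `metrics.py` are not reproduced here; as one plausible definition, compactness can be read off the folded footprint versus the flat sheet (this is an assumption for illustration, not the actual implementation):

```python
import numpy as np

def compactness(vertices_coords, flat_width, flat_height):
    """Hypothetical compactness metric: 1 - folded footprint / flat footprint."""
    v = np.asarray(vertices_coords)
    extent = v.max(axis=0) - v.min(axis=0)   # folded bounding box (x, y, z)
    footprint = extent[0] * extent[1]        # top-down area
    return 1.0 - footprint / (flat_width * flat_height)

# A unit sheet half-folded onto itself covers half the original footprint:
compactness([[0, 0, 0], [1, 0, 0], [0, 0.5, 0.01], [1, 0.5, 0.01]], 1.0, 1.0)
```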

### 4.6 Materials (`materials.py`)

Four presets: paper, mylar, aluminum, nitinol. Each has thickness, Young's modulus, max strain, Poisson ratio, density. Derived stiffness properties for physics.


## 5. Renderer (`server/renderer/`)

### What's LEFT (kept)

```python
# exporter.py — FOLD JSON + OBJ export (lightweight, no image deps)

def save_fold_json(paper: PaperState, path: str, fold_history: list):
    """Export FOLD-format JSON with metadata."""

def export_obj(paper: PaperState) -> str:
    """Wavefront OBJ for external renderers / 3D printing."""
```

### What's REMOVED

| File | Was | Why Removed |
|------|-----|-------------|
| `render_2d.py` | matplotlib crease pattern PNG | Three.js viewer renders 2D in browser |
| `render_3d.py` | matplotlib 3D wireframe PNG | Three.js viewer renders 3D in browser |
| `screenshots.py` | Per-step PNG capture to disk | No server-side images needed |
| `recorder.py` | GIF assembly via imageio | Browser can use MediaRecorder API |

### Dependencies REMOVED from `requirements.txt`

```
matplotlib>=3.7    ← REMOVED
imageio>=2.31      ← REMOVED
Pillow>=10.0       ← REMOVED
```
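`export_obj` returning a string is only a few lines: OBJ is plain text with 1-based face indices. A sketch of the idea, written against raw lists rather than `PaperState`:

```python
def export_obj(vertices_coords, faces_vertices) -> str:
    """Minimal Wavefront OBJ writer (sketch of exporter.export_obj;
    the real one takes a PaperState)."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices_coords]
    # OBJ face indices are 1-based
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces_vertices]
    return "\n".join(lines) + "\n"

export_obj([[0, 0, 0], [1, 0, 0], [0, 1, 0]], [[0, 1, 2]])
# → "v 0 0 0\nv 1 0 0\nv 0 1 0\nf 1 2 3\n"
```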

## 6. Three.js Viewer (`viewer/`)

Static HTML + JS served by FastAPI at /viewer. No build step. No React. Just one HTML file with embedded Three.js from CDN.

### Architecture

```
Browser opens /viewer
    │
    ├── Loads index.html (contains Three.js via CDN)
    │
    ├── Connects to /ws (OpenEnv WebSocket)
    │
    ├── Sends: {"type": "reset", "task_name": "solar_panel"}
    │
    ├── Receives: observation with paper_state
    │
    └── Three.js renders:
        ├── LEFT PANEL: 2D Crease Pattern (Canvas2D)
        │   - edges colored by assignment (M=red, V=blue, B=black, F=gray)
        │   - vertices as dots
        │   - uses rest_positions[:, :2]
        │
        ├── RIGHT PANEL: 3D Folded State (WebGL)
        │   - BufferGeometry from vertices_coords + triangulated faces
        │   - Vertex colors from strain_per_vertex (blue→red gradient)
        │   - OrbitControls (rotate, zoom, pan)
        │   - DoubleSide material
        │   - Edge overlay (M=red, V=blue, B=black lines)
        │
        ├── BOTTOM: Metrics Dashboard
        │   - Compactness, strain, fold count, validity, energy
        │
        └── CONTROLS
            - Task selector dropdown
            - Fold input (type, line start/end, angle)
            - Reset / Step / Stop buttons
            - Animation slider (fold_percent 0→1)
```

### Data Flow (Same FOLD Data, Browser Renders)

```
Server: env.step(action)
    → paper_state = {
        vertices_coords: [[x,y,z], ...],      ← Three.js positions
        faces_vertices: [[0,1,2], ...],       ← Three.js index buffer
        edges_assignment: ["M","V","B",...],  ← Edge colors
        strain_per_vertex: [0.01, ...],       ← Vertex colors
        edges_foldAngle: [180, -180, ...],    ← Fold visualization
      }
    → sent via WebSocket as JSON

Browser: viewer receives paper_state
    → positions = new Float32Array(vertices_coords.flat())
    → indices = new Uint16Array(triangulated_faces.flat())
    → colors = strainToColor(strain_per_vertex)  // blue→red
    → geometry.setAttribute('position', ...)
    → geometry.setIndex(...)
    → geometry.setAttribute('color', ...)
    → renderer.render(scene, camera)
```

### Key Three.js Pattern (from origami simulator reference)

```javascript
// Build mesh from FOLD data
function updateMesh(paperState) {
    const vertices = paperState.vertices_coords;
    const faces = paperState.faces_vertices;
    const strain = paperState.strain_per_vertex;

    const positions = new Float32Array(vertices.flat());

    // Fan-triangulate polygon faces (a no-op when the server
    // already provides triangles)
    const indices = [];
    for (const face of faces) {
        for (let i = 1; i < face.length - 1; i++) {
            indices.push(face[0], face[i], face[i + 1]);
        }
    }

    // Strain → vertex colors (blue=0, red=max)
    const colors = new Float32Array(vertices.length * 3);
    const maxStrain = Math.max(...strain, 0.001);
    for (let i = 0; i < vertices.length; i++) {
        const t = Math.min(strain[i] / maxStrain, 1.0);
        colors[i * 3]     = t;           // R
        colors[i * 3 + 1] = 0.2;         // G
        colors[i * 3 + 2] = 1.0 - t;     // B
    }

    geometry.setAttribute('position', new THREE.BufferAttribute(positions, 3));
    geometry.setIndex(indices);
    geometry.setAttribute('color', new THREE.Float32BufferAttribute(colors, 3));
    geometry.computeVertexNormals();
}

// Draw crease edges
function drawCreaseEdges(paperState) {
    const edgeColors = { M: 0xe74c3c, V: 0x3498db, B: 0x2c3e50 };
    for (let i = 0; i < paperState.edges_vertices.length; i++) {
        const [v1, v2] = paperState.edges_vertices[i];
        const assignment = paperState.edges_assignment[i];
        if (assignment in edgeColors) {
            // Draw line from vertices[v1] to vertices[v2]
        }
    }
}
```

## 7. Environment (`server/origami_environment.py`)

```python
class OrigamiEnvironment(Environment[OrigamiAction, OrigamiObservation, OrigamiState]):
    SUPPORTS_CONCURRENT_SESSIONS = False

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._paper = None
        self._task = None
        self._fold_history = []
        self._metrics = {}
        self._error = None
        self._episode_id = None
        self._step_count = 0
        self._total_reward = 0.0

    def reset(self, seed=None, episode_id=None, **kwargs) -> OrigamiObservation:
        self._episode_id = episode_id or str(uuid.uuid4())
        self._step_count = 0
        self._fold_history = []
        self._error = None
        self._total_reward = 0.0

        # Sample task
        task_name = kwargs.get("task_name")
        self._task = get_task_by_name(task_name) or sample_task(seed=seed)

        # Create flat sheet
        self._paper = create_flat_sheet(
            self._task["width"], self._task["height"],
            MATERIALS[self._task["material"]]
        )

        # Initial validation + metrics
        self._validation = validate_state(self._paper)
        self._metrics = compute_all_metrics(self._paper, self._task, self._validation)

        return self._make_observation(done=False, reward=None)

    def step(self, action: OrigamiAction, timeout_s=None, **kwargs) -> OrigamiObservation:
        self._step_count += 1
        self._error = None

        if action.fold_type == "stop":
            return self._finalize_episode()

        # Build the fold dict apply_fold expects from the action fields
        fold_dict = {
            "type": action.fold_type,
            "line": action.fold_line,
            "angle": action.fold_angle,
            "layers": action.layer_select,
        }

        # Apply fold → physics → validate → metrics
        try:
            self._paper = apply_fold(self._paper, fold_dict)
            self._fold_history.append({**fold_dict, "step": self._step_count})
        except FoldError as e:
            self._error = str(e)
            return self._make_observation(done=True, reward=-5.0)

        self._paper = simulate(self._paper, fold_percent=1.0)
        self._validation = validate_state(self._paper)
        self._metrics = compute_all_metrics(self._paper, self._task, self._validation)

        done = self._step_count >= self._task.get("max_folds", 50)
        if done:
            return self._finalize_episode()

        return self._make_observation(done=False, reward=None)

    @property
    def state(self) -> OrigamiState:
        return OrigamiState(
            episode_id=self._episode_id,
            step_count=self._step_count,
            task_name=self._task.get("name", "") if self._task else "",
            num_folds_applied=len(self._fold_history),
            is_valid=self._metrics.get("is_valid", True),
            total_reward=self._total_reward,
        )

    def _make_observation(self, done, reward) -> OrigamiObservation:
        return OrigamiObservation(
            done=done,
            reward=reward,
            task=self._task or {},
            paper_state=self._paper.to_observation_dict() if self._paper else {},
            metrics=self._metrics,
            fold_history=self._fold_history,
            error=self._error,
        )
```

Key change: no `render_urls`, no `capture_step()`, no `capture_episode_summary()`, and the old `renders_dir` constructor argument is gone. The observation contains `paper_state` (all geometry) and `metrics` (all numbers). That's all anyone needs.


## 8. App + Docker (`server/app.py` + `server/Dockerfile`)

### `app.py`

```python
"""FastAPI entry point — OpenEnv create_app() + static viewer."""
import os

from fastapi.staticfiles import StaticFiles
from openenv.core.env_server.http_server import create_app

from .origami_environment import OrigamiEnvironment
from .models import OrigamiAction, OrigamiObservation

app = create_app(
    env=lambda: OrigamiEnvironment(),
    action_cls=OrigamiAction,
    observation_cls=OrigamiObservation,
    env_name="origami_env",
    max_concurrent_envs=1,
)

# Serve the Three.js viewer as static files
viewer_dir = os.path.join(os.path.dirname(__file__), "..", "viewer")
if os.path.isdir(viewer_dir):
    app.mount("/viewer", StaticFiles(directory=viewer_dir, html=True), name="viewer")
```

### `Dockerfile`

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Python dependencies (lightweight — no matplotlib/Pillow/imageio)
COPY server/requirements.txt ./server/
RUN pip install --no-cache-dir -r server/requirements.txt

# Copy server code
COPY server/ ./server/

# Copy Three.js viewer (static HTML/JS)
COPY viewer/ ./viewer/

EXPOSE 8000
CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
```

When `openenv push` deploys this, it:

1. Moves the Dockerfile to the repo root
2. Injects `ENV ENABLE_WEB_INTERFACE=true` → enables `/web` (generic OpenEnv UI)
3. Our `/viewer` remains the custom Three.js origami viewer

### `requirements.txt`

```
openenv-core>=0.2.1
numpy>=1.24
scipy>=1.10
pydantic>=2.0
fastapi>=0.100
uvicorn>=0.22
websockets>=11.0
```

No matplotlib. No Pillow. No imageio. The Docker image drops from ~500 MB to ~200 MB.

### `openenv.yaml`

```yaml
spec_version: 1
name: origami_env
type: space
runtime: fastapi
app: server.app:app
port: 8000
```

## 9. RL Training (`training/train_grpo.py`)

Follows the exact 2048 Unsloth pattern. Runs on Colab with GPU.

### Flow

1. Launch OpenEnv server as local subprocess (port 9000)
2. Load LLM model (e.g., gpt-oss-20b with LoRA)
3. Define prompt: "Write a fold_strategy(paper_state) function..."
4. Define 3 reward functions:
   - code_valid: Does the code parse? (+1 / -2)
   - no_cheating: Only stdlib imports? (+1 / -20)
   - fold_quality: Does strategy produce good folds? (scored from metrics)
5. GRPOTrainer trains with these rewards
6. Each reward eval: reset env → run strategy → check metrics
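The `extract_function` helper used by the reward functions can be as simple as pulling the fenced code block out of the completion. A sketch under that assumption, not the exact 2048 helper:

```python
import re

def extract_function(completion: str):
    """Pull the fold_strategy source out of an LLM completion.

    Sketch: take the first ```python fence if present, else the raw text,
    and require that it defines fold_strategy."""
    m = re.search(r"```python\n(.*?)```", completion, re.DOTALL)
    code = m.group(1) if m else completion
    return code if "def fold_strategy" in code else None
```

`create_locked_down_function` would then compile this source in a restricted namespace before the reward loop runs it.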

### The Prompt

````python
prompt = """
Write a Python function that folds origami to maximize compactness.
You are given a paper_state dict with vertices, edges, and faces.
Return a fold dict or None to stop:

```python
def fold_strategy(paper_state):
    # paper_state has: vertices_coords, edges_vertices, edges_assignment,
    #   bounding_box, num_layers, material, strain_per_vertex, fold_count
    return {
        "type": "valley",  # or "mountain"
        "line": {"start": [x1, y1], "end": [x2, y2]},
        "angle": 180,
    }
    # Return None when done folding
```

Only output the short function fold_strategy.
""".strip()
````


### Reward Functions

```python
def fold_quality(completions, **kwargs):
    """
    Execute strategy against live environment, score from metrics.
    +20.0  if compactness > 0.8 AND valid (optimal folding!)
    +5.0   if compactness > 0.5
    +2.0   if function runs but poor result
    -1.0   timeout
    -3.0   exception
     0     broken function
    """
    scores = []
    for completion in completions:
        function = extract_function(completion)
        if function is None:
            scores.append(0)
            continue
        try:
            strategy = create_locked_down_function(function)
            # Reset OpenEnv
            port, process = launch_openenv(port, process)
            result = process.reset()
            obs = result.observation

            # Run strategy loop (same as 2048)
            while not obs.done:
                fold = execute_with_time_limit(5)(strategy)(obs.paper_state)
                if fold is None:
                    action = OrigamiAction(fold_type="stop")
                else:
                    action = OrigamiAction(**fold)
                result = process.step(action)
                obs = result.observation

            # Score from final metrics
            m = obs.metrics
            if m.get("compactness", 0) > 0.8 and m.get("is_valid", False):
                scores.append(20.0)
            elif m.get("compactness", 0) > 0.5:
                scores.append(5.0)
            else:
                scores.append(2.0)
        except TimeoutError:
            scores.append(-1.0)
        except Exception:
            scores.append(-3.0)
    return scores

```

### Key Difference from 2048

- 2048: the LLM generates `strategy(board)` → `action_id` (one action per call, the game loops externally)
- Origami: the LLM generates `fold_strategy(paper_state)` → `fold_dict | None` (one fold per call, the episode loops externally)

Same pattern. The reward function resets the env, loops the strategy, and scores the outcome.


## 10. Client (`client/`)

### `client.py`

```python
class OrigamiEnvClient(EnvClient[OrigamiAction, OrigamiObservation, OrigamiState]):

    def _step_payload(self, action: OrigamiAction) -> Dict[str, Any]:
        return action.model_dump()

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[OrigamiObservation]:
        return StepResult(
            observation=OrigamiObservation(**payload.get("observation", {})),
            reward=payload.get("reward"),
            done=payload.get("done", False),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> OrigamiState:
        return OrigamiState(**payload)
```

### `reward_functions.py`

Three reward functions for GRPO: `code_valid`, `no_cheating`, `fold_quality`. They run client-side on Colab, not on the server.
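A sketch of the `no_cheating` idea: reject completions that import anything outside an allowlist. The allowlist and the boolean return are illustrative; the real reward maps this decision to +1 / -20.

```python
import ast

ALLOWED = {"math", "itertools", "random"}  # illustrative allowlist

def no_cheating(code: str) -> bool:
    """Return False if the code imports anything outside ALLOWED
    (or does not parse at all)."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(a.name.split(".")[0] not in ALLOWED for a in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED:
                return False
    return True
```

Static `ast` inspection catches the import statements before any generated code is executed, which is why this check can run cheaply on every completion.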


## 11. Task System (`server/tasks.py`)

### Curriculum (4 difficulty levels)

| Level | Task | Material | Target Ratio | Max Folds | Key Challenge |
|-------|------|----------|--------------|-----------|---------------|
| 1 | half_fold | paper | 0.50 | 3 | Learn the format |
| 1 | quarter_fold | paper | 0.25 | 5 | Two perpendicular folds |
| 2 | letter_fold | paper | 0.33 | 5 | Tri-fold, parallel lines |
| 2 | map_fold | paper | 0.125 | 8 | Grid fold, must deploy |
| 3 | solar_panel | mylar | 0.05 | 20 | Miura-ori discovery, deployability |
| 3 | shelter_wall | aluminum | 0.10 | 15 | Rigid material, strain limits |
| 4 | stent | nitinol | 0.09 | 25 | Cylindrical target shape, superelastic |
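`tasks.py`'s difficulty sampling is not spelled out; a level-weighted sampler might look like this (the task list is abbreviated and the weights are illustrative assumptions):

```python
import random

TASKS = [
    {"name": "half_fold", "level": 1},
    {"name": "letter_fold", "level": 2},
    {"name": "solar_panel", "level": 3},
    {"name": "stent", "level": 4},
]

LEVEL_WEIGHTS = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}  # bias toward easier levels

def sample_task(seed=None):
    """Sample a task, weighted by curriculum level (hypothetical sketch)."""
    rng = random.Random(seed)
    weights = [LEVEL_WEIGHTS[t["level"]] for t in TASKS]
    return rng.choices(TASKS, weights=weights, k=1)[0]
```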

## 12. API Reference

### Endpoints (auto-generated by `create_app()`)

| Endpoint | Method | Source | Purpose |
|----------|--------|--------|---------|
| `/health` | GET | OpenEnv | `{"status": "healthy"}` |
| `/ws` | WebSocket | OpenEnv | Persistent session (reset/step/state) |
| `/reset` | POST | OpenEnv | Stateless reset (creates a new env per call) |
| `/step` | POST | OpenEnv | Stateless step (creates a new env per call) |
| `/state` | GET | OpenEnv | Get current state |
| `/schema` | GET | OpenEnv | Action + Observation JSON schemas |
| `/metadata` | GET | OpenEnv | Environment name, description, version |
| `/web` | GET | OpenEnv | Built-in generic web UI (when `ENABLE_WEB_INTERFACE=true`) |
| `/viewer` | GET | Custom | Three.js origami viewer (static files) |

Important: HTTP `/reset` and `/step` are stateless — each call creates a fresh env. For multi-step episodes, use the WebSocket `/ws`. This is OpenEnv's design.

### WebSocket Message Format

```js
// Client → Server (reset)
{"type": "reset", "task_name": "solar_panel"}

// Client → Server (step)
{
  "type": "step",
  "action": {
    "fold_type": "valley",
    "fold_line": {"start": [0, 0.5], "end": [1, 0.5]},
    "fold_angle": 180,
    "layer_select": "all"
  }
}

// Server → Client (observation)
{
  "type": "observation",
  "data": {
    "observation": {
      "task": {"name": "solar_panel", ...},
      "paper_state": {
        "vertices_coords": [[0,0,0], [1,0,0], ...],
        "edges_vertices": [[0,1], ...],
        "edges_assignment": ["B", "V", ...],
        "strain_per_vertex": [0.001, ...],
        ...
      },
      "metrics": {"compactness": 0.45, ...},
      "fold_history": [...],
      "error": null
    },
    "reward": null,
    "done": false
  }
}
```

## 13. Deployment

### Push to HF Spaces

```bash
cd origami_env/
huggingface-cli login
openenv push --repo-id <username>/origami-env
```

`openenv push` does:

1. Validates `openenv.yaml`
2. Moves `server/Dockerfile` → root `Dockerfile`
3. Injects `ENV ENABLE_WEB_INTERFACE=true`
4. Adds HF Space frontmatter to the README
5. Uploads to HF Spaces (Docker SDK)

### Or manually via Docker

```bash
docker build -t origami-env -f server/Dockerfile .
docker run -p 8000:8000 origami-env
curl http://localhost:8000/health
# Open http://localhost:8000/viewer in a browser → Three.js origami viewer
```

### HF Space README header

```yaml
---
title: Origami RL Environment
sdk: docker
app_port: 8000
base_path: /web
tags:
  - openenv
---
```

## 14. What's Already Implemented vs TODO

### DONE (engine + server + client)

- `engine/paper.py` — PaperState, create_flat_sheet, FOLD I/O
- `engine/fold.py` — apply_fold with the full 10-step pipeline
- `engine/physics.py` — bar-and-hinge Verlet solver (stabilized)
- `engine/validation.py` — Kawasaki, Maekawa, self-intersection
- `engine/metrics.py` — 20+ metrics computation
- `engine/materials.py` — 4 material presets
- `models.py` — OpenEnv Action/Observation/State subclasses
- `origami_environment.py` — Environment subclass with reset/step/state
- `tasks.py` — 7 tasks across 4 difficulty levels
- `app.py` — create_app() integration
- `client/client.py` — EnvClient subclass
- `client/reward_functions.py` — GRPO reward functions
- `renderer/exporter.py` — FOLD JSON + OBJ export
- `openenv.yaml`, `pyproject.toml`, `Dockerfile`

### TODO

- Remove matplotlib rendering (`render_2d`, `render_3d`, `screenshots`, `recorder`)
- Remove `render_urls` from OrigamiObservation
- Remove matplotlib/Pillow/imageio from `requirements.txt`
- Update `origami_environment.py` to not call `capture_step()`
- Build the Three.js viewer (`viewer/index.html`)
- Update the Dockerfile to copy `viewer/`
- Write `training/train_grpo.py` (2048 pattern)
- Test `openenv validate` (passes: "Ready for multi-mode deployment")

## 15. Training Grid Viewer — Live Spectator for RL Training

### 15.1 Concept

During GRPO training, the trainer generates G completions (strategies) per prompt. Each strategy runs a full episode (reset → fold → fold → ... → stop). The Training Grid Viewer shows ALL G episodes simultaneously as a live grid:

```
┌──────────┬──────────┬──────────┬──────────┐
│  EP 1    │  EP 2    │  EP 3    │  EP 4    │  G=4 strategies
│  [3D]    │  [3D]    │  [3D]    │  [3D]    │  each cell = mini
│  📄 fold │  📄 fold │  ✅ done │  📄 fold │  Three.js renderer
│  step 2  │  step 5  │  r=20.0  │  step 3  │  + status badge
│  c=0.31  │  c=0.52  │  c=0.85  │  c=0.28  │  + key metrics
└──────────┴──────────┴──────────┴──────────┘
              ↓ click EP 2
┌────────────────────────────────────────────────┐
│            EP 2 — FULLSCREEN                   │
│   [2D crease]          [3D folded mesh]        │
│   Full metrics dashboard                       │
│   Fold history list                            │
│   [Back to Grid]                               │
└────────────────────────────────────────────────┘
```

Training ends → grid clears → switches to the regular /viewer for the demo

15.2 Why This Works (Not Computationally Complex)

| Concern | Reality |
| --- | --- |
| Server CPU | Each episode = pure math (36 vertices, 50 faces). G=8 episodes ~0.8ms total |
| Network | Each observation ~3KB JSON. G=8 × 20 steps = ~480KB per prompt. Negligible |
| Browser GPU | G=8 mini renderers × 36 vertices = 288 vertices total. A game does 100K+ |
| Browser memory | 8 Three.js scenes ≈ 10MB. Tab uses 200MB baseline. Trivial |
| WebGL contexts | Browsers support 8-16 active contexts. G=8 fits. G>8 → use single canvas with viewports |
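The network row is plain arithmetic and can be sanity-checked (sizes are the plan's own estimates):

```python
OBS_BYTES = 3 * 1024   # ~3 KB per observation (estimate from the table)
G, STEPS = 8, 20       # completions per prompt x steps per episode

total_bytes = G * STEPS * OBS_BYTES
print(f"{total_bytes / 1024:.0f} KB per prompt")  # -> 480 KB
```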

15.3 Architecture

COLAB (Training Process)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                                      β”‚
β”‚  GRPOTrainer generates G completions per prompt      β”‚
β”‚       ↓                                              β”‚
β”‚  TrainingRunner (new)                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  For each completion (can run parallel):      β”‚   β”‚
β”‚  β”‚    env = OrigamiEnvironment()  ← in-process   β”‚   β”‚
β”‚  β”‚    obs = env.reset()                          β”‚   β”‚
β”‚  β”‚    broadcast(ep_id, obs) ───────────────────┐ β”‚   β”‚
β”‚  β”‚    while not done:                          β”‚ β”‚   β”‚
β”‚  β”‚      fold = strategy(paper_state)           β”‚ β”‚   β”‚
β”‚  β”‚      obs = env.step(action)                 β”‚ β”‚   β”‚
β”‚  β”‚      broadcast(ep_id, obs) ────────────────── β”‚   β”‚
β”‚  β”‚    score = compute_reward(obs.metrics)       β”‚ β”‚   β”‚
β”‚  β”‚    broadcast(ep_id, {done, score}) ────────── β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚   β”‚
β”‚       β”‚                                           β”‚   β”‚
β”‚       β–Ό                                           β”‚   β”‚
β”‚  TrainingBroadcastServer (new, same FastAPI)      β”‚   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚   β”‚
β”‚  β”‚  /ws/training  ← viewers connect here      β”‚   β”‚   β”‚
β”‚  β”‚                                            β”‚   β”‚   β”‚
β”‚  β”‚  episode_registry: {                       β”‚   β”‚   β”‚
β”‚  β”‚    ep_id β†’ {status, obs, score, task}      β”‚   β”‚   β”‚
β”‚  β”‚  }                                         β”‚   β”‚   β”‚
β”‚  β”‚                                            β”‚β—€β”€β”€β”˜   β”‚
β”‚  β”‚  On update: broadcast to all viewers       β”‚       β”‚
β”‚  β”‚  On viewer connect: send full registry     β”‚       β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β”‚
β”‚                                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚ WebSocket
         β–Ό
BROWSER (Training Grid Viewer)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  /viewer/training.html                               β”‚
β”‚                                                      β”‚
β”‚  Connects to /ws/training                            β”‚
β”‚  Receives: {type, episode_id, observation, status}   β”‚
β”‚                                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  CSS Grid: auto-fit, minmax(250px, 1fr)        β”‚  β”‚
β”‚  β”‚  Each cell:                                    β”‚  β”‚
β”‚  β”‚    - Mini Three.js scene (3D mesh only)        β”‚  β”‚
β”‚  β”‚    - Status badge: πŸ”„ running / βœ… done / ❌    β”‚  β”‚
β”‚  β”‚    - Key metrics: compactness, folds, reward   β”‚  β”‚
β”‚  β”‚    - Click β†’ fullscreen (same as /viewer)      β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                      β”‚
β”‚  Header: training progress, batch #, avg reward      β”‚
β”‚  Auto-resize: G=4 β†’ 2Γ—2, G=8 β†’ 4Γ—2, G=16 β†’ 4Γ—4    β”‚
β”‚  Episode lifecycle:                                  β”‚
β”‚    new β†’ animate in (fade) β†’ running (blue border)   β”‚
β”‚    β†’ done+good (green) / done+bad (red)              β”‚
β”‚    β†’ next batch β†’ clear + new episodes               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

15.4 Key Design Decision: In-Process Envs, Not HTTP

During training, the reward function does NOT call the HTTP server. It instantiates OrigamiEnvironment() directly in Python β€” zero network overhead. The broadcast is one-way: training process β†’ viewers. Viewers are read-only spectators.

# Training runs G episodes in-process (fast)
env = OrigamiEnvironment()
obs = env.reset(task_name="half_fold")

# After each step, push observation to broadcast queue (async, non-blocking)
broadcast_queue.put_nowait({"episode_id": ep_id, "observation": obs.model_dump()})

# Separate asyncio task drains queue β†’ pushes to viewer WebSockets

This means:

  • Training speed is NOT affected by viewers (broadcast is fire-and-forget)
  • Viewers can connect/disconnect freely without impacting training
  • If no viewers are connected, observations are simply dropped (no queue buildup)
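The drain task behind these properties can be sketched as follows (a minimal sketch; `broadcast_queue`, `spectators`, and the drop-when-empty policy are illustrative names, and `send_json` assumes FastAPI's WebSocket interface):

```python
import asyncio

# Shared with the /ws/training endpoint (names are illustrative)
broadcast_queue: asyncio.Queue = asyncio.Queue()
spectators: list = []   # currently connected viewer sockets


async def drain_broadcast_queue() -> None:
    """Background task: forward queued observations to viewers.

    Runs alongside the FastAPI app; training threads call
    broadcast_queue.put_nowait(...) and never block on the network.
    """
    while True:
        msg = await broadcast_queue.get()
        if not spectators:
            continue                      # nobody watching -> drop the message
        for ws in list(spectators):
            try:
                await ws.send_json(msg)   # FastAPI WebSocket method
            except Exception:
                spectators.remove(ws)     # viewer went away mid-send
```

Because `put_nowait` never awaits, the training loop's broadcast cost is a queue append regardless of how many (or few) viewers are attached.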

15.5 WebSocket Protocol: /ws/training

Viewer β†’ Server:
  (no messages β€” viewer is read-only spectator)

Server β†’ Viewer (on connect):
  {
    "type": "registry",
    "batch_id": 12,
    "episodes": {
      "ep_abc": {"status": "running", "task": "half_fold", "step": 3,
                 "observation": {...}, "metrics": {...}},
      "ep_def": {"status": "done", "task": "half_fold", "step": 5,
                 "observation": {...}, "metrics": {...}, "score": 20.0},
      ...
    }
  }

Server β†’ Viewer (on episode update):
  {
    "type": "episode_update",
    "episode_id": "ep_abc",
    "step": 4,
    "status": "running",
    "observation": {
      "paper_state": {...},   ← Three.js renders this
      "metrics": {...},
      "fold_history": [...]
    }
  }

Server β†’ Viewer (on episode complete):
  {
    "type": "episode_done",
    "episode_id": "ep_abc",
    "status": "success",       ← or "timeout", "error"
    "score": 20.0,
    "final_metrics": {...}
  }

Server β†’ Viewer (on new batch):
  {
    "type": "batch_start",
    "batch_id": 13,
    "num_episodes": 4,
    "prompt_index": 42
  }

Server β†’ Viewer (on batch complete):
  {
    "type": "batch_done",
    "batch_id": 13,
    "scores": [20.0, 5.0, 2.0, -1.0],
    "best_episode_id": "ep_xyz",
    "avg_score": 6.5
  }

Server β†’ Viewer (on training end):
  {
    "type": "training_done",
    "total_batches": 100,
    "best_score": 20.0
  }
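On the viewer side, these messages drive a simple registry mirror. The same dispatch logic, sketched in Python for reference (the real viewer does this in JavaScript; field names are exactly those in the messages above):

```python
def apply_message(registry: dict, msg: dict) -> dict:
    """Mirror the server's episode registry from /ws/training messages."""
    t = msg["type"]
    if t == "registry":                       # on connect: full snapshot
        registry.clear()
        registry.update(msg["episodes"])
    elif t == "batch_start":                  # new batch: clear the grid
        registry.clear()
    elif t == "episode_update":               # step: update one cell
        ep = registry.setdefault(msg["episode_id"], {})
        ep.update(status=msg["status"], step=msg["step"],
                  observation=msg["observation"])
    elif t == "episode_done":                 # episode finished: record score
        ep = registry.setdefault(msg["episode_id"], {})
        ep.update(status=msg["status"], score=msg["score"],
                  metrics=msg["final_metrics"])
    # "batch_done" / "training_done" only update header stats, not cells
    return registry
```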

15.6 New Files Needed

origami_env/
β”œβ”€β”€ server/
β”‚   β”œβ”€β”€ app.py                      # UPDATE: mount training.html, add /ws/training
β”‚   └── training_broadcast.py       # NEW: TrainingBroadcastServer class
β”‚       - episode_registry: Dict[str, EpisodeInfo]
β”‚       - spectator_clients: List[WebSocket]
β”‚       - publish(episode_id, data) β†’ broadcast to all spectators
β”‚       - connect_spectator(ws) β†’ send registry snapshot
β”‚       - disconnect_spectator(ws)
β”‚       - clear_batch() β†’ reset registry for next batch
β”‚
β”œβ”€β”€ training/
β”‚   β”œβ”€β”€ train_grpo.py               # UPDATE: integrate TrainingRunner
β”‚   └── runner.py                   # NEW: parallel episode executor with broadcast
β”‚       - run_episode(strategy_fn, task, broadcast_fn) β†’ score
β”‚       - run_batch(strategies: List, broadcast_fn) β†’ scores
β”‚       - Uses ThreadPoolExecutor for G parallel episodes
β”‚       - Each step calls broadcast_fn(ep_id, obs)
β”‚
└── viewer/
    β”œβ”€β”€ index.html                  # UNCHANGED β€” single session demo viewer
    └── training.html               # NEW β€” training grid viewer
        - CSS grid layout (auto-fit columns)
        - Mini Three.js renderer per episode cell
        - Status badges + key metrics per cell
        - Click-to-fullscreen with [Back to Grid] button
        - Training progress header bar
        - Auto-clear on batch_start, auto-populate on episode_update

15.7 Changes to Existing Files

server/app.py β€” Add training broadcast endpoint:

from fastapi import WebSocket, WebSocketDisconnect
from fastapi.staticfiles import StaticFiles

from .training_broadcast import TrainingBroadcastServer

broadcast = TrainingBroadcastServer()

@app.websocket("/ws/training")
async def training_ws(websocket: WebSocket):
    await websocket.accept()
    await broadcast.connect_spectator(websocket)
    try:
        while True:
            # Spectators send nothing; this loop keeps the socket open,
            # since FastAPI closes the connection when the handler returns
            await websocket.receive_text()
    except WebSocketDisconnect:
        broadcast.disconnect_spectator(websocket)

# Mount training viewer
app.mount("/viewer", StaticFiles(directory=viewer_dir, html=True), name="viewer")
# training.html served at /viewer/training.html automatically

training/train_grpo.py β€” Add broadcast hook to fold_quality:

from origami_env.server.training_broadcast import TrainingBroadcastServer

# In fold_quality reward function:
# After each env.step(), call:
#   broadcast.publish(episode_id, obs)
# This is fire-and-forget; if no viewers, it's a no-op
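The runner.py episode loop with that hook could be sketched as follows (assumptions: the strategy returns None to stop, the reward function is passed in rather than being an env method, and the actual OrigamiEnvironment signatures may differ):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_episode(env, strategy_fn: Callable, reward_fn: Callable,
                broadcast_fn: Callable, ep_id: str,
                max_steps: int = 20) -> float:
    """Run one episode in-process, broadcasting after every step."""
    obs = env.reset()
    broadcast_fn(ep_id, obs)
    for _ in range(max_steps):
        action = strategy_fn(obs)
        if action is None:                  # strategy decided to stop
            break
        obs = env.step(action)
        broadcast_fn(ep_id, obs)            # fire-and-forget spectator update
    score = reward_fn(obs)
    broadcast_fn(ep_id, {"done": True, "score": score})
    return score


def run_batch(envs: List, strategies: List[Callable],
              reward_fn: Callable, broadcast_fn: Callable) -> List[float]:
    """Run G episodes in parallel threads; each gets its own env instance."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = [
            pool.submit(run_episode, env, strat, reward_fn,
                        broadcast_fn, f"ep_{i}")
            for i, (env, strat) in enumerate(zip(envs, strategies))
        ]
        return [f.result() for f in futures]
```

Since each fold is pure math in the microsecond range, threads are mainly there to overlap episodes and broadcast I/O, not to chase CPU parallelism.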

15.8 Grid Viewer Rendering Strategy

For G ≀ 8: One WebGL context per cell

  • Each cell gets a small Three.js renderer (250Γ—200px)
  • 8 contexts Γ— 36 vertices = trivial
  • Orbit controls disabled in grid (too small), enabled in fullscreen

For G > 8 (unlikely but handled): Single canvas with viewport splitting

  • One large Three.js renderer
  • Use renderer.setScissor() + renderer.setViewport() per cell
  • Render each scene into its own region
  • More efficient, one WebGL context

Fullscreen transition:

Click cell β†’ CSS class "fullscreen" on that cell
  β†’ cell expands to 100vw Γ— 100vh (CSS transition)
  β†’ renderer.setSize(window.innerWidth, window.innerHeight)
  β†’ show 2D crease panel + full metrics (hidden in grid mode)
  β†’ show [Back to Grid] button
Click [Back to Grid] β†’ remove "fullscreen" class β†’ shrink back

15.9 Episode Cell Layout (Grid Mode)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ EP abc  πŸ”„ running      β”‚  ← header: ID + status badge
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚                     β”‚ β”‚
β”‚ β”‚   Mini 3D Mesh      β”‚ β”‚  ← Three.js renderer (~200px tall)
β”‚ β”‚   (strain colors)   β”‚ β”‚
β”‚ β”‚                     β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ step 3 β”‚ c=0.52 β”‚ βœ“     β”‚  ← footer: step count, compactness, valid
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

15.10 TODO

  • Create server/training_broadcast.py (broadcast server + episode registry)
  • Create training/runner.py (parallel episode executor with broadcast hooks)
  • Create viewer/training.html (grid viewer with mini Three.js renderers)
  • Update server/app.py (add /ws/training endpoint, mount training viewer)
  • Update training/train_grpo.py (integrate runner + broadcast)
  • Test: G=4 parallel episodes broadcasting to grid viewer
  • Test: fullscreen toggle on cell click
  • Test: batch transitions (clear β†’ new episodes β†’ populate)