# OpenEnv Environment Architecture — Origami RL
> Updated plan reflecting actual OpenEnv patterns (from 2048 reference),
> proper rendering strategy (Three.js viewer, not matplotlib),
> and clear separation between training vs demo contexts.
---
## 1. Overview
Two distinct contexts use the SAME server code:
```
CONTEXT 1: RL TRAINING (Colab/GPU machine)

  Colab Notebook
  ┌──────────────────┐        ┌─────────────────────────────┐
  │ GRPOTrainer      │        │ OpenEnv Server (subprocess) │
  │                  │        │ port 9000                   │
  │ LLM generates    │ ──────▶│ reset() / step() / state    │
  │ fold_strategy()  │ ◀──────│ returns: paper_state +      │
  │                  │        │ metrics + reward            │
  │ Reward functions │        │                             │
  │ score the result │        │ NO RENDERING —              │
  └──────────────────┘        │ just geometry + numbers     │
                              └─────────────────────────────┘
```
```
CONTEXT 2: DEMO / HACKATHON (HF Space + Browser)

HF Space (Docker)
└── FastAPI via create_app()
    ├── /ws                   → OpenEnv WebSocket protocol
    ├── /reset, /step, /state → OpenEnv HTTP (stateless)
    ├── /health, /schema      → OpenEnv metadata
    ├── /web                  → OpenEnv built-in generic UI
    ├── /viewer               → Three.js origami viewer
    │
    ├── OrigamiEnvironment — reset() / step() / state
    │   ├── Engine: paper, fold, physics, validate, metrics
    │   └── Task System: task pool, materials, curriculum
    │
    └── Three.js Viewer (static HTML/JS, served by the same FastAPI at /viewer)
        ├── connects to /ws → receives paper_state
        └── renders: 2D crease (Canvas) + 3D mesh (WebGL)
            + strain heatmap (vertex colors) + animation
```
Browser opens `/viewer` → sees the origami fold live in 3D.
### Key Design Decisions
1. **No server-side image rendering.** No matplotlib, no Pillow, no imageio.
The server returns FOLD JSON data (vertices, edges, faces, strain) in every
observation. Rendering happens in the browser via Three.js (demo) or not at
all (training).
2. **`render_urls` removed from Observation.** The old design saved PNGs to disk
   and returned file paths; nobody fetched them during training (wasted CPU/disk).
Instead, `paper_state` IS the render data. The Three.js viewer reads it directly.
3. **Same server code for both contexts.** The only difference is who consumes it:
   - Training: Python reward functions read `paper_state.metrics` → compute reward
   - Demo: Three.js reads `paper_state.vertices_coords` → renders the 3D mesh
4. **`openenv push`** deploys to HF Spaces. Sets `ENABLE_WEB_INTERFACE=true`
which gives us the built-in generic UI at `/web`. Our custom Three.js viewer
is served as static files at `/viewer`.
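The "same data, two consumers" split can be sketched in a few lines. This is illustrative only; `training_reward` and `viewer_payload` are hypothetical names, not part of the actual client API:

```python
# Sketch: the two consumers of one observation dict (names are illustrative).
def training_reward(observation: dict) -> float:
    """Training context: read metrics only, never geometry."""
    m = observation["metrics"]
    return 20.0 if m.get("compactness", 0.0) > 0.8 and m.get("is_valid") else 0.0

def viewer_payload(observation: dict) -> dict:
    """Demo context: forward only the geometry the Three.js viewer needs."""
    ps = observation["paper_state"]
    return {"positions": ps["vertices_coords"], "faces": ps["faces_vertices"]}
```

The point of the split: the server never has to know which consumer is attached.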
---
## 2. Repository Structure
```
origami_env/                       # THE deliverable — one package
├── server/                        # Python backend (OpenEnv server)
│   ├── engine/                    # Origami simulation core
│   │   ├── __init__.py
│   │   ├── paper.py               # PaperState dataclass, FOLD I/O, create_flat_sheet()
│   │   ├── fold.py                # apply_fold() — quaternion rotation, face splitting
│   │   ├── physics.py             # Bar-and-hinge Verlet solver, strain computation
│   │   ├── validation.py          # Kawasaki, Maekawa, self-intersection detection
│   │   ├── metrics.py             # ALL metrics — compactness, strain, shape, deployability
│   │   └── materials.py           # Material presets + stiffness parameter derivation
│   ├── renderer/                  # Minimal — FOLD JSON export only (no images)
│   │   ├── __init__.py
│   │   └── exporter.py            # FOLD JSON export, OBJ export
│   ├── models.py                  # OpenEnv types: OrigamiAction, OrigamiObservation, OrigamiState
│   ├── origami_environment.py     # Environment class (subclasses openenv Environment)
│   ├── tasks.py                   # Task pool, curriculum levels, difficulty sampling
│   ├── app.py                     # create_app() + mount static viewer
│   ├── __init__.py
│   ├── requirements.txt           # openenv-core, numpy, scipy, pydantic (NO matplotlib)
│   └── Dockerfile
├── viewer/                        # Three.js origami viewer (static files)
│   ├── index.html                 # Single page: 2D + 3D + metrics + controls
│   ├── origami-viewer.js          # Three.js rendering from FOLD data
│   └── style.css                  # Layout styles
├── client/                        # OpenEnv client (for RL training)
│   ├── __init__.py
│   ├── client.py                  # OrigamiEnvClient (EnvClient subclass, WebSocket)
│   └── reward_functions.py        # code_valid, no_cheating, fold_quality (GRPO rewards)
├── training/                      # GRPO training script (runs on Colab)
│   └── train_grpo.py              # Launches server, runs GRPOTrainer (2048 pattern)
├── openenv.yaml                   # Manifest for openenv push
├── pyproject.toml
└── README.md
```
### What Changed from Previous Plan
| Before | After | Why |
|--------|-------|-----|
| `renderer/render_2d.py` (matplotlib) | REMOVED | Browser renders via Three.js |
| `renderer/render_3d.py` (matplotlib) | REMOVED | Browser renders via Three.js |
| `renderer/screenshots.py` (PNG capture) | REMOVED | No server-side images |
| `renderer/recorder.py` (GIF via imageio) | REMOVED | Browser can record (MediaRecorder) |
| `web/` (React + R3F) | `viewer/` (plain HTML + Three.js) | Simpler, no build step, same quality |
| `render_urls` in Observation | REMOVED | paper_state IS the render data |
| matplotlib, Pillow, imageio deps | REMOVED | Lighter Docker image |
---
## 3. Pydantic Models (`server/models.py`)
### OrigamiAction

```python
class OrigamiAction(Action):
    """One fold operation. Sent by the client each step."""
    # metadata: Dict[str, Any] (inherited from Action)
    fold_type: str = "valley"          # "valley" | "mountain" | "pleat" | "crimp" | "stop"
    fold_line: Dict[str, List[float]]  # {"start": [x,y], "end": [x,y]} (normalized 0-1)
    fold_angle: float = 180.0          # degrees, 0-180
    layer_select: str = "all"          # "all" | "top" | "bottom"
```
### OrigamiObservation

```python
class OrigamiObservation(Observation):
    """Everything the LLM and Three.js viewer need."""
    # done: bool (inherited from Observation)
    # reward: float|None (inherited from Observation)
    # metadata: Dict (inherited from Observation)
    task: Dict[str, Any]                 # Task definition
    paper_state: Dict[str, Any]          # FOLD-compatible geometry + physics
    # {
    #   "vertices_coords": [[x,y,z], ...],
    #   "edges_vertices": [[v1,v2], ...],
    #   "faces_vertices": [[v0,v1,v2,...], ...],
    #   "edges_assignment": ["M","V","B","F",...],
    #   "edges_foldAngle": [180, -180, 0, ...],
    #   "num_vertices": 36, "num_edges": 85, "num_faces": 50,
    #   "bounding_box": [0.5, 1.0, 0.02],
    #   "num_layers": 2,
    #   "width": 1.0, "height": 1.0,
    #   "material": {"name": "paper", ...},
    #   "strain_per_vertex": [0.001, 0.005, ...],
    #   "energy": {"total": 0.12, "bar": 0.05, "facet": 0.03, "fold": 0.04},
    #   "fold_count": 2,
    # }
    metrics: Dict[str, Any]              # All computed metrics
    fold_history: List[Dict[str, Any]]   # History of folds applied
    error: Optional[str] = None          # Error message if fold failed
```

Note: no `render_urls`. The `paper_state` dict contains all the geometry data
Three.js needs to render. During training, reward functions read `metrics`;
during the demo, the viewer reads `paper_state.vertices_coords` etc.
### OrigamiState

```python
class OrigamiState(State):
    # episode_id: Optional[str] (inherited from State)
    # step_count: int (inherited from State)
    task_name: str = ""
    num_folds_applied: int = 0
    is_valid: bool = True
    total_reward: float = 0.0
```
### Wire Format (what goes over WebSocket)

OpenEnv's `serialize_observation()` extracts `done`, `reward`, `metadata` to the top level:

```jsonc
{
  "observation": {
    "task": {"name": "half_fold", "width": 1.0, ...},
    "paper_state": {
      "vertices_coords": [[0,0,0], [1,0,0], ...],
      "edges_vertices": [[0,1], [1,2], ...],
      "edges_assignment": ["B", "B", "V", ...],
      "strain_per_vertex": [0.001, 0.005, ...],
      ...
    },
    "metrics": {"compactness": 0.45, "is_valid": true, ...},
    "fold_history": [{"type": "valley", "step": 1, ...}],
    "error": null
  },
  "reward": null,
  "done": false
}
```
The Three.js viewer reads `observation.paper_state` → builds the mesh.
The reward functions read `observation.metrics` → compute the score.
Same data, different consumers.
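A minimal sketch of splitting the wire format into its three top-level parts (`parse_step_payload` is a hypothetical helper, not the OpenEnv client code):

```python
import json

def parse_step_payload(raw: str):
    """Split a serialized step result into (observation, reward, done),
    mirroring the wire format above."""
    payload = json.loads(raw)
    return (payload["observation"],
            payload.get("reward"),
            payload.get("done", False))
```

A reward function would then look only at the first element's `metrics` key, while the viewer consumes its `paper_state` key.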
## 4. Engine (`server/engine/`)
Unchanged from previous plan. All engine code is already implemented and working.
### 4.1 Paper State (`paper.py`)
FOLD-format compatible dataclass. Key fields: vertices_coords (N,3),
edges_vertices (E,2), faces_vertices (ragged), edges_assignment,
edges_foldAngle, rest_lengths, rest_positions, strain_per_vertex,
energy, material, face_orders, num_layers.
Key methods: create_flat_sheet(), to_fold_json(), from_fold_json(),
to_observation_dict(), bounding_box, triangulated_faces.
### 4.2 Fold Operations (`fold.py`)
10-step pipeline: validate → split faces → classify vertices → quaternion rotation → update assignments → update angles → update topology → compute rest lengths → update layers → increment fold_count.
Pleat = valley + mountain. Crimp = mountain + valley.
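The rotation step can be sketched with a plain Rodrigues rotation about the crease axis. This is a simplified, hypothetical illustration of one sub-step: the real `apply_fold()` also splits faces, classifies which vertices move, and updates assignments/layers:

```python
import math

def rotate_about_crease(p, start, end, angle_deg):
    """Rotate point p = (x, y, z) about the crease axis through
    start -> end (both in the z = 0 sheet plane) by angle_deg.
    Sketch of the rotation inside apply_fold(); the caller decides
    which side of the crease actually rotates."""
    ux, uy = end[0] - start[0], end[1] - start[1]
    n = math.hypot(ux, uy)
    ux, uy = ux / n, uy / n                  # unit axis along the crease (uz = 0)
    # translate so the axis passes through the origin
    px, py, pz = p[0] - start[0], p[1] - start[1], p[2]
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    dot = ux * px + uy * py                  # u . p
    # u x p written out component-wise (uz = 0)
    cx, cy, cz = uy * pz, -ux * pz, ux * py - uy * px
    # Rodrigues: p' = p*c + (u x p)*s + u*(u . p)*(1 - c)
    rx = px * c + cx * s + ux * dot * (1 - c)
    ry = py * c + cy * s + uy * dot * (1 - c)
    rz = pz * c + cz * s                     # uz = 0, so no axis term here
    return (rx + start[0], ry + start[1], rz)
```

For a 180° fold of the unit square's right half over the vertical crease at x = 0.5, the corner (1, 0, 0) lands back on (0, 0, 0), exactly as a flat half-fold should.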
### 4.3 Physics Solver (`physics.py`)
Bar-and-hinge Verlet integration. Three energy components:
- E_bar (axial springs, prevents stretching)
- E_facet (panel bending, keeps faces flat)
- E_fold (crease rotation, drives folding)
Numerical stability: force clamping, NaN detection, stiffness caps, reduced dt=0.005, damping=0.15.
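One damped Verlet update with the stability guards above can be sketched as follows (a single-vertex illustration with the document's constants; the real solver operates on full vertex arrays and derives forces from the three energy terms):

```python
import math

# Constants from the solver description above.
DT = 0.005
DAMPING = 0.15
FORCE_CAP = 100.0      # illustrative clamp value (assumption)

def verlet_step(pos, prev, force, mass=1.0):
    """One position-Verlet update for a single vertex, with force
    clamping, a NaN guard, and velocity damping."""
    f = [max(-FORCE_CAP, min(FORCE_CAP, c)) for c in force]   # clamp forces
    if any(math.isnan(c) for c in f):                          # NaN detection
        f = [0.0, 0.0, 0.0]
    new = []
    for x, xp, fc in zip(pos, prev, f):
        v = (x - xp) * (1.0 - DAMPING)          # damped velocity term
        new.append(x + v + (fc / mass) * DT * DT)
    return new
```

With a huge input force the displacement is capped at `FORCE_CAP * DT²` per step, which is what keeps the integration from exploding.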
### 4.4 Validation (`validation.py`)
- Kawasaki-Justin: alternating angle sum at interior vertices
- Maekawa-Justin: |M - V| = 2 at interior vertices
- Self-intersection: Z-separation + normal alignment check (not simple overlap)
- Strain limits: per-vertex strain vs material.max_strain
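The two flat-foldability theorems reduce to very small checks per interior vertex. A sketch (the actual `validation.py` also finds interior vertices and orders the sector angles):

```python
def maekawa_ok(assignments):
    """Maekawa-Justin: at a flat-foldable interior vertex,
    |#mountain - #valley| == 2."""
    return abs(assignments.count("M") - assignments.count("V")) == 2

def kawasaki_ok(angles_deg, tol=1e-6):
    """Kawasaki-Justin: the alternating sums of sector angles around
    an interior vertex are equal (each 180 degrees)."""
    return abs(sum(angles_deg[0::2]) - sum(angles_deg[1::2])) <= tol
```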
### 4.5 Metrics (`metrics.py`)
20+ metrics: compactness, deployment_ratio, volume_compaction, packing_efficiency, fits_target_box, max/mean strain, energy breakdown, fold_count, folding_efficiency, crease_complexity, is_deployable, deployment_force_estimate, chamfer/hausdorff distance.
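As one example, compactness can be sketched as a footprint ratio. This is an illustrative definition; `metrics.py`'s actual normalization may differ:

```python
def compactness(bounding_box, sheet_width, sheet_height):
    """Fraction of the flat sheet's XY footprint eliminated by folding.
    bounding_box = [bx, by, bz] of the folded state."""
    bx, by = bounding_box[0], bounding_box[1]
    return 1.0 - (bx * by) / (sheet_width * sheet_height)
```

A single half fold of the unit sheet (bounding box `[0.5, 1.0, 0.02]`) scores 0.5; the flat sheet scores 0.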
### 4.6 Materials (`materials.py`)
Four presets: paper, mylar, aluminum, nitinol. Each has thickness, Young's modulus, max strain, Poisson ratio, density. Derived stiffness properties for physics.
## 5. Renderer (`server/renderer/`)

### What's LEFT (kept)

```python
# exporter.py — FOLD JSON + OBJ export (lightweight, no image deps)
def save_fold_json(paper: PaperState, path: str, fold_history: list):
    """Export FOLD-format JSON with metadata."""

def export_obj(paper: PaperState) -> str:
    """Wavefront OBJ for external renderers / 3D printing."""
```
### What's REMOVED

| File | Was | Why Removed |
|---|---|---|
| `render_2d.py` | matplotlib crease pattern PNG | Three.js viewer renders 2D in browser |
| `render_3d.py` | matplotlib 3D wireframe PNG | Three.js viewer renders 3D in browser |
| `screenshots.py` | Per-step PNG capture to disk | No server-side images needed |
| `recorder.py` | GIF assembly via imageio | Browser can use MediaRecorder API |
### Dependencies REMOVED from requirements.txt

- `matplotlib>=3.7` → REMOVED
- `imageio>=2.31` → REMOVED
- `Pillow>=10.0` → REMOVED
## 6. Three.js Viewer (`viewer/`)
Static HTML + JS served by FastAPI at /viewer. No build step. No React.
Just one HTML file with embedded Three.js from CDN.
### Architecture

```
Browser opens /viewer
  │
  ├── Loads index.html (contains Three.js via CDN)
  │
  ├── Connects to /ws (OpenEnv WebSocket)
  │
  ├── Sends:    {"type": "reset", "task_name": "solar_panel"}
  │
  ├── Receives: observation with paper_state
  │
  └── Three.js renders:
      ├── LEFT PANEL: 2D Crease Pattern (Canvas2D)
      │     - edges colored by assignment (M=red, V=blue, B=black, F=gray)
      │     - vertices as dots
      │     - uses rest_positions[:, :2]
      │
      ├── RIGHT PANEL: 3D Folded State (WebGL)
      │     - BufferGeometry from vertices_coords + triangulated faces
      │     - vertex colors from strain_per_vertex (blue→red gradient)
      │     - OrbitControls (rotate, zoom, pan)
      │     - DoubleSide material
      │     - edge overlay (M=red, V=blue, B=black lines)
      │
      ├── BOTTOM: Metrics Dashboard
      │     - compactness, strain, fold count, validity, energy
      │
      └── CONTROLS
            - task selector dropdown
            - fold input (type, line start/end, angle)
            - Reset / Step / Stop buttons
            - animation slider (fold_percent 0→1)
```
### Data Flow (Same FOLD Data, Browser Renders)

```
Server: env.step(action)
  → paper_state = {
        vertices_coords:   [[x,y,z], ...],    → Three.js positions
        faces_vertices:    [[0,1,2], ...],    → Three.js index buffer
        edges_assignment:  ["M","V","B",...], → edge colors
        strain_per_vertex: [0.01, ...],       → vertex colors
        edges_foldAngle:   [180, -180, ...],  → fold visualization
    }
  → sent via WebSocket as JSON

Browser: viewer receives paper_state
  → positions = new Float32Array(vertices_coords.flat())
  → indices   = new Uint16Array(triangulated_faces.flat())
  → colors    = strainToColor(strain_per_vertex)   // blue→red
  → geometry.setAttribute('position', ...)
  → geometry.setIndex(...)
  → geometry.setAttribute('color', ...)
  → renderer.render(scene, camera)
```
### Key Three.js Pattern (from origami simulator reference)

```javascript
// Build mesh from FOLD data
function updateMesh(paperState) {
  const vertices = paperState.vertices_coords;
  const faces = paperState.faces_vertices;
  const strain = paperState.strain_per_vertex;

  const positions = new Float32Array(vertices.flat());

  // Fan triangulation for polygon faces with > 3 vertices
  const indices = [];
  for (const face of faces) {
    for (let i = 1; i < face.length - 1; i++) {
      indices.push(face[0], face[i], face[i + 1]);
    }
  }

  // Strain → vertex colors (blue = 0, red = max)
  const colors = new Float32Array(vertices.length * 3);
  const maxStrain = Math.max(...strain, 0.001);
  for (let i = 0; i < vertices.length; i++) {
    const t = Math.min(strain[i] / maxStrain, 1.0);
    colors[i * 3] = t;           // R
    colors[i * 3 + 1] = 0.2;     // G
    colors[i * 3 + 2] = 1.0 - t; // B
  }

  geometry.setAttribute('position', new THREE.BufferAttribute(positions, 3));
  geometry.setIndex(indices);
  geometry.setAttribute('color', new THREE.Float32BufferAttribute(colors, 3));
  geometry.computeVertexNormals();
}

// Draw crease edges
function drawCreaseEdges(paperState) {
  const edgeColors = { M: 0xe74c3c, V: 0x3498db, B: 0x2c3e50 };
  for (let i = 0; i < paperState.edges_vertices.length; i++) {
    const [v1, v2] = paperState.edges_vertices[i];
    const assignment = paperState.edges_assignment[i];
    if (assignment in edgeColors) {
      // Draw line from vertices[v1] to vertices[v2]
    }
  }
}
```
## 7. Environment (`server/origami_environment.py`)

```python
class OrigamiEnvironment(Environment[OrigamiAction, OrigamiObservation, OrigamiState]):
    SUPPORTS_CONCURRENT_SESSIONS = False

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._paper = None
        self._task = None
        self._fold_history = []
        self._metrics = {}
        self._error = None
        self._episode_id = None
        self._step_count = 0
        self._total_reward = 0.0

    def reset(self, seed=None, episode_id=None, **kwargs) -> OrigamiObservation:
        self._episode_id = episode_id or str(uuid.uuid4())
        self._step_count = 0
        self._fold_history = []
        self._error = None
        self._total_reward = 0.0
        # Sample task
        task_name = kwargs.get("task_name")
        self._task = get_task_by_name(task_name) or sample_task(seed=seed)
        # Create flat sheet
        self._paper = create_flat_sheet(
            self._task["width"], self._task["height"],
            MATERIALS[self._task["material"]],
        )
        # Initial validation + metrics
        self._validation = validate_state(self._paper)
        self._metrics = compute_all_metrics(self._paper, self._task, self._validation)
        return self._make_observation(done=False, reward=None)

    def step(self, action: OrigamiAction, timeout_s=None, **kwargs) -> OrigamiObservation:
        self._step_count += 1
        self._error = None
        if action.fold_type == "stop":
            return self._finalize_episode()
        # Apply fold → physics → validate → metrics
        fold_dict = {
            "type": action.fold_type,
            "line": action.fold_line,
            "angle": action.fold_angle,
            "layers": action.layer_select,
        }
        try:
            self._paper = apply_fold(self._paper, fold_dict)
            self._fold_history.append({**fold_dict, "step": self._step_count})
        except FoldError as e:
            self._error = str(e)
            return self._make_observation(done=True, reward=-5.0)
        self._paper = simulate(self._paper, fold_percent=1.0)
        self._validation = validate_state(self._paper)
        self._metrics = compute_all_metrics(self._paper, self._task, self._validation)
        done = self._step_count >= self._task.get("max_folds", 50)
        if done:
            return self._finalize_episode()
        return self._make_observation(done=False, reward=None)

    @property
    def state(self) -> OrigamiState:
        return OrigamiState(
            episode_id=self._episode_id,
            step_count=self._step_count,
            task_name=self._task.get("name", "") if self._task else "",
            num_folds_applied=len(self._fold_history),
            is_valid=self._metrics.get("is_valid", True),
            total_reward=self._total_reward,
        )

    def _make_observation(self, done, reward) -> OrigamiObservation:
        return OrigamiObservation(
            done=done,
            reward=reward,
            task=self._task or {},
            paper_state=self._paper.to_observation_dict() if self._paper else {},
            metrics=self._metrics,
            fold_history=self._fold_history,
            error=self._error,
        )
```
Key change: No render_urls, no capture_step(), no capture_episode_summary().
The observation contains paper_state (all geometry) and metrics (all numbers).
That's all anyone needs.
## 8. App + Docker (`server/app.py` + `server/Dockerfile`)

### app.py

```python
"""FastAPI entry point — OpenEnv create_app() + static viewer."""
import os

from fastapi.staticfiles import StaticFiles
from openenv.core.env_server.http_server import create_app

from .models import OrigamiAction, OrigamiObservation
from .origami_environment import OrigamiEnvironment

app = create_app(
    env=lambda: OrigamiEnvironment(),
    action_cls=OrigamiAction,
    observation_cls=OrigamiObservation,
    env_name="origami_env",
    max_concurrent_envs=1,
)

# Serve the Three.js viewer as static files
viewer_dir = os.path.join(os.path.dirname(__file__), "..", "viewer")
if os.path.isdir(viewer_dir):
    app.mount("/viewer", StaticFiles(directory=viewer_dir, html=True), name="viewer")
```
### Dockerfile

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Python dependencies (lightweight — no matplotlib/Pillow/imageio)
COPY server/requirements.txt ./server/
RUN pip install --no-cache-dir -r server/requirements.txt

# Copy server code
COPY server/ ./server/

# Copy Three.js viewer (static HTML/JS)
COPY viewer/ ./viewer/

EXPOSE 8000
CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
```
When `openenv push` deploys this, it:
- Moves the Dockerfile to the repo root
- Injects `ENV ENABLE_WEB_INTERFACE=true` → enables `/web` (the generic OpenEnv UI)
- Leaves `/viewer` as our custom Three.js origami viewer
### requirements.txt

```text
openenv-core>=0.2.1
numpy>=1.24
scipy>=1.10
pydantic>=2.0
fastapi>=0.100
uvicorn>=0.22
websockets>=11.0
```
No matplotlib. No Pillow. No imageio. Docker image drops from ~500MB to ~200MB.
### openenv.yaml

```yaml
spec_version: 1
name: origami_env
type: space
runtime: fastapi
app: server.app:app
port: 8000
```
## 9. RL Training (`training/train_grpo.py`)
Follows the exact 2048 Unsloth pattern. Runs on Colab with GPU.
### Flow

1. Launch the OpenEnv server as a local subprocess (port 9000)
2. Load the LLM (e.g., gpt-oss-20b with LoRA)
3. Define the prompt: "Write a fold_strategy(paper_state) function..."
4. Define 3 reward functions:
   - `code_valid`: does the code parse? (+1 / -2)
   - `no_cheating`: only stdlib imports? (+1 / -20)
   - `fold_quality`: does the strategy produce good folds? (scored from metrics)
5. GRPOTrainer trains with these rewards
6. Each reward eval: reset env → run strategy → check metrics
### The Prompt

````python
prompt = """
Write a Python function that folds origami to maximize compactness.
You are given a paper_state dict with vertices, edges, and faces.
Return a fold dict, or None to stop:

```python
def fold_strategy(paper_state):
    # paper_state has: vertices_coords, edges_vertices, edges_assignment,
    # bounding_box, num_layers, material, strain_per_vertex, fold_count
    return {
        "type": "valley",  # or "mountain"
        "line": {"start": [x1, y1], "end": [x2, y2]},
        "angle": 180,
    }
    # Return None when done folding
```

Only output the short function fold_strategy.
""".strip()
````
### Reward Functions
```python
def fold_quality(completions, **kwargs):
    """
    Execute the strategy against a live environment, score from metrics.
      +20.0 if compactness > 0.8 AND valid (optimal folding!)
       +5.0 if compactness > 0.5
       +2.0 if the function runs but the result is poor
       -1.0 timeout
       -3.0 exception
        0.0 broken function
    """
    scores = []
    for completion in completions:
        function = extract_function(completion)
        if function is None:
            scores.append(0)
            continue
        try:
            strategy = create_locked_down_function(function)
            # Reset OpenEnv (relaunching the local server if needed)
            port, process = launch_openenv(port, process)
            result = process.reset()
            obs = result.observation
            # Run the strategy loop (same as 2048)
            while not obs.done:
                fold = execute_with_time_limit(5)(strategy)(obs.paper_state)
                if fold is None:
                    action = OrigamiAction(fold_type="stop")
                else:
                    action = OrigamiAction(**fold)
                result = process.step(action)
                obs = result.observation
            # Score from the final metrics
            m = obs.metrics
            if m.get("compactness", 0) > 0.8 and m.get("is_valid", False):
                scores.append(20.0)
            elif m.get("compactness", 0) > 0.5:
                scores.append(5.0)
            else:
                scores.append(2.0)
        except TimeoutError:
            scores.append(-1.0)
        except Exception:
            scores.append(-3.0)
    return scores
```

### Key Difference from 2048

- 2048: LLM generates `strategy(board)` → `action_id` (one action per call, the game loops externally)
- Origami: LLM generates `fold_strategy(paper_state)` → `fold_dict | None` (one fold per call, the loop runs externally)
Same pattern. The reward function resets the env, loops strategy, scores the outcome.
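The `extract_function` helper used in `fold_quality` can be sketched with `ast`. This is a minimal, hypothetical version; the real training code also sandboxes execution via `create_locked_down_function`:

```python
import ast

def extract_function(completion: str, name: str = "fold_strategy"):
    """Pull the target function's source out of an LLM completion.
    Handles an optional markdown code fence around the code."""
    if "```" in completion:
        # keep the fence segment that actually contains the function
        parts = completion.split("```")
        completion = max(parts, key=lambda p: name in p)
        completion = completion.removeprefix("python").lstrip("\n")
    try:
        tree = ast.parse(completion)
    except SyntaxError:
        return None
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.get_source_segment(completion, node)
    return None
```

Returning `None` for unparseable completions maps directly onto the "0 broken function" branch of the reward table above.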
## 10. Client (`client/`)

### client.py

```python
class OrigamiEnvClient(EnvClient[OrigamiAction, OrigamiObservation, OrigamiState]):
    def _step_payload(self, action: OrigamiAction) -> Dict[str, Any]:
        return action.model_dump()

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[OrigamiObservation]:
        return StepResult(
            observation=OrigamiObservation(**payload.get("observation", {})),
            reward=payload.get("reward"),
            done=payload.get("done", False),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> OrigamiState:
        return OrigamiState(**payload)
```
### reward_functions.py

Three reward functions for GRPO: `code_valid`, `no_cheating`, `fold_quality`.
They run on the Colab client side, NOT on the server.
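The simplest of the three, `code_valid`, can be sketched in full (a minimal version of the +1 / -2 contract from section 9; the real one may apply extra checks):

```python
import ast

def code_valid(completions, **kwargs):
    """GRPO reward: +1 if the completion parses as Python AND defines
    fold_strategy at top level, -2 otherwise."""
    scores = []
    for completion in completions:
        try:
            tree = ast.parse(completion)
        except SyntaxError:
            scores.append(-2.0)
            continue
        names = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
        scores.append(1.0 if "fold_strategy" in names else -2.0)
    return scores
```

Like all GRPO reward functions, it takes the batch of completions and returns one score per completion.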
## 11. Task System (`server/tasks.py`)

### Curriculum (4 difficulty levels)
| Level | Task | Material | Target Ratio | Max Folds | Key Challenge |
|---|---|---|---|---|---|
| 1 | half_fold | paper | 0.50 | 3 | Learn the format |
| 1 | quarter_fold | paper | 0.25 | 5 | Two perpendicular folds |
| 2 | letter_fold | paper | 0.33 | 5 | Tri-fold, parallel lines |
| 2 | map_fold | paper | 0.125 | 8 | Grid fold, must deploy |
| 3 | solar_panel | mylar | 0.05 | 20 | Miura-ori discovery, deployability |
| 3 | shelter_wall | aluminum | 0.10 | 15 | Rigid material, strain limits |
| 4 | stent | nitinol | 0.09 | 25 | Cylindrical target shape, superelastic |
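Curriculum-limited sampling can be sketched as follows. The task-to-level mapping mirrors the table above, but the function signature and uniform weighting are assumptions; `tasks.py` defines the actual pool and sampling policy:

```python
import random

# Task -> difficulty level, per the curriculum table above.
TASK_POOL = {
    "half_fold": 1, "quarter_fold": 1,
    "letter_fold": 2, "map_fold": 2,
    "solar_panel": 3, "shelter_wall": 3,
    "stent": 4,
}

def sample_task(max_level: int = 4, seed=None) -> str:
    """Uniformly sample a task name at or below max_level."""
    rng = random.Random(seed)
    eligible = [name for name, lvl in TASK_POOL.items() if lvl <= max_level]
    return rng.choice(eligible)
```

Raising `max_level` as training progresses is what turns the table into a curriculum.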
## 12. API Reference

### Endpoints (auto-generated by `create_app()`)
| Endpoint | Method | Source | Purpose |
|---|---|---|---|
| `/health` | GET | OpenEnv | `{"status": "healthy"}` |
| `/ws` | WebSocket | OpenEnv | Persistent session (reset/step/state) |
| `/reset` | POST | OpenEnv | Stateless reset (creates new env per call) |
| `/step` | POST | OpenEnv | Stateless step (creates new env per call) |
| `/state` | GET | OpenEnv | Get current state |
| `/schema` | GET | OpenEnv | Action + Observation JSON schemas |
| `/metadata` | GET | OpenEnv | Environment name, description, version |
| `/web` | GET | OpenEnv | Built-in generic web UI (when ENABLE_WEB_INTERFACE=true) |
| `/viewer` | GET | Custom | Three.js origami viewer (static files) |
Important: HTTP `/reset` and `/step` are stateless; each call creates a fresh env.
For multi-step episodes, use the WebSocket `/ws`. This is OpenEnv's design.
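Building the WebSocket messages from section 12's format can be sketched as plain dict construction (hypothetical helper names; the actual client wraps this in `OrigamiEnvClient`):

```python
import json

def reset_message(task_name: str) -> str:
    """Serialize a reset message in the /ws format below."""
    return json.dumps({"type": "reset", "task_name": task_name})

def step_message(fold_type, start, end, angle=180.0, layers="all") -> str:
    """Serialize a step message carrying one OrigamiAction."""
    return json.dumps({
        "type": "step",
        "action": {
            "fold_type": fold_type,
            "fold_line": {"start": start, "end": end},
            "fold_angle": angle,
            "layer_select": layers,
        },
    })
```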
### WebSocket Message Format

```jsonc
// Client → Server (reset)
{"type": "reset", "task_name": "solar_panel"}

// Client → Server (step)
{
  "type": "step",
  "action": {
    "fold_type": "valley",
    "fold_line": {"start": [0, 0.5], "end": [1, 0.5]},
    "fold_angle": 180,
    "layer_select": "all"
  }
}

// Server → Client (observation)
{
  "type": "observation",
  "data": {
    "observation": {
      "task": {"name": "solar_panel", ...},
      "paper_state": {
        "vertices_coords": [[0,0,0], [1,0,0], ...],
        "edges_vertices": [[0,1], ...],
        "edges_assignment": ["B", "V", ...],
        "strain_per_vertex": [0.001, ...],
        ...
      },
      "metrics": {"compactness": 0.45, ...},
      "fold_history": [...],
      "error": null
    },
    "reward": null,
    "done": false
  }
}
```
## 13. Deployment

### Push to HF Spaces

```bash
cd origami_env/
huggingface-cli login
openenv push --repo-id <username>/origami-env
```
`openenv push` does:
- Validates `openenv.yaml`
- Moves `server/Dockerfile` → root `Dockerfile`
- Injects `ENV ENABLE_WEB_INTERFACE=true`
- Adds the HF Space frontmatter to the README
- Uploads to HF Spaces (Docker SDK)
### Or manually via Docker

```bash
docker build -t origami-env -f server/Dockerfile .
docker run -p 8000:8000 origami-env
curl http://localhost:8000/health
# Open http://localhost:8000/viewer in a browser → Three.js origami viewer
```
### HF Space README header

```yaml
---
title: Origami RL Environment
sdk: docker
app_port: 8000
base_path: /web
tags:
  - openenv
---
```
## 14. What's Already Implemented vs TODO

### DONE (engine + server + client)

- `engine/paper.py` — PaperState, create_flat_sheet, FOLD I/O
- `engine/fold.py` — apply_fold with the full 10-step pipeline
- `engine/physics.py` — bar-and-hinge Verlet solver (stabilized)
- `engine/validation.py` — Kawasaki, Maekawa, self-intersection
- `engine/metrics.py` — 20+ metrics computation
- `engine/materials.py` — 4 material presets
- `models.py` — OpenEnv Action/Observation/State subclasses
- `origami_environment.py` — Environment subclass with reset/step/state
- `tasks.py` — 7 tasks across 4 difficulty levels
- `app.py` — create_app() integration
- `client/client.py` — EnvClient subclass
- `client/reward_functions.py` — GRPO reward functions
- `renderer/exporter.py` — FOLD JSON + OBJ export
- `openenv.yaml`, `pyproject.toml`, `Dockerfile`
### TODO

- Remove matplotlib rendering (render_2d, render_3d, screenshots, recorder)
- Remove `render_urls` from OrigamiObservation
- Remove matplotlib/Pillow/imageio from requirements.txt
- Update origami_environment.py to not call capture_step()
- Build the Three.js viewer (`viewer/index.html`)
- Update the Dockerfile to copy `viewer/`
- Write `training/train_grpo.py` (2048 pattern)
- Test `openenv validate` (passes: "Ready for multi-mode deployment")
## 15. Training Grid Viewer — Live Spectator for RL Training

### 15.1 Concept

During GRPO training, the trainer generates G completions (strategies) per prompt. Each strategy runs a full episode (reset → fold → fold → ... → stop). The Training Grid Viewer shows ALL G episodes simultaneously as a live grid:
```
┌──────────┬──────────┬──────────┬──────────┐
│  EP 1    │  EP 2    │  EP 3    │  EP 4    │  G=4 strategies
│  [3D]    │  [3D]    │  [3D]    │  [3D]    │  each cell = mini
│ ▶ fold   │ ▶ fold   │ ✔ done   │ ▶ fold   │  Three.js renderer
│  step 2  │  step 5  │  r=20.0  │  step 3  │  + status badge
│  c=0.31  │  c=0.52  │  c=0.85  │  c=0.28  │  + key metrics
└──────────┴──────────┴──────────┴──────────┘
                 │ click EP 2
                 ▼
┌─────────────────────────────────────────────┐
│ EP 2 — FULLSCREEN                           │
│ [2D crease]        [3D folded mesh]         │
│ Full metrics dashboard                      │
│ Fold history list                           │
│ [Back to Grid]                              │
└─────────────────────────────────────────────┘
```

Training ends → grid clears → switches to the regular `/viewer` for the demo.
### 15.2 Why This Works (Not Computationally Complex)

| Concern | Reality |
|---|---|
| Server CPU | Each episode is pure math (36 vertices, 50 faces). G=8 episodes ≈ 0.8 ms total |
| Network | Each observation ≈ 3 KB JSON. G=8 × 20 steps ≈ 480 KB per prompt. Negligible |
| Browser GPU | G=8 mini renderers × 36 vertices = 288 vertices total. A game draws 100K+ |
| Browser memory | 8 Three.js scenes ≈ 10 MB. A tab uses ~200 MB baseline. Trivial |
| WebGL contexts | Browsers support 8-16 active contexts. G=8 fits; for G>8, use a single canvas with viewports |
### 15.3 Architecture

```
COLAB (training process)
│
├─ GRPOTrainer generates G completions per prompt
│
├─ TrainingRunner (new)
│    for each completion (can run in parallel):
│      env = OrigamiEnvironment()          # in-process
│      obs = env.reset()
│      broadcast(ep_id, obs)
│      while not done:
│          fold = strategy(paper_state)
│          obs = env.step(action)
│          broadcast(ep_id, obs)
│      score = compute_reward(obs.metrics)
│      broadcast(ep_id, {done, score})
│
└─ TrainingBroadcastServer (new, same FastAPI)
     /ws/training ← viewers connect here
     episode_registry: { ep_id → {status, obs, score, task} }
     on update:         broadcast to all viewers
     on viewer connect: send the full registry

            │ WebSocket
            ▼

BROWSER (Training Grid Viewer) — /viewer/training.html
│
├─ Connects to /ws/training
├─ Receives: {type, episode_id, observation, status}
│
├─ CSS grid: auto-fit, minmax(250px, 1fr); each cell has
│    - a mini Three.js scene (3D mesh only)
│    - a status badge: running / done / error
│    - key metrics: compactness, folds, reward
│    - click → fullscreen (same layout as /viewer)
│
├─ Header: training progress, batch #, avg reward
├─ Auto-resize: G=4 → 2×2, G=8 → 4×2, G=16 → 4×4
└─ Episode lifecycle:
     new → fade in → running (blue border)
     → done+good (green) / done+bad (red)
     → next batch → clear + new episodes
```
### 15.4 Key Design Decision: In-Process Envs, Not HTTP

During training, the reward function does NOT call the HTTP server.
It instantiates `OrigamiEnvironment()` directly in Python — zero network overhead.
The broadcast is one-way (training process → viewers); viewers are read-only spectators.
```python
# Training runs G episodes in-process (fast)
env = OrigamiEnvironment()
obs = env.reset(task_name="half_fold")

# After each step, push the observation to the broadcast queue (async, non-blocking)
broadcast_queue.put_nowait({"episode_id": ep_id, "observation": obs.model_dump()})

# A separate asyncio task drains the queue → pushes to viewer WebSockets
```
This means:
- Training speed is NOT affected by viewers (broadcast is fire-and-forget)
- Viewers can connect/disconnect freely without impacting training
- If no viewers are connected, observations are simply dropped (no queue buildup)
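The fire-and-forget queue can be sketched as a thin wrapper around `asyncio.Queue` (a hypothetical `BroadcastQueue`; the real `training_broadcast.py` also fans items out to WebSockets). The key behavior is in `publish`: a full queue drops the item instead of blocking the training loop:

```python
import asyncio

class BroadcastQueue:
    """Fire-and-forget hand-off from the synchronous training loop to
    the async WebSocket fan-out task."""

    def __init__(self, maxsize: int = 64):
        self._q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0

    def publish(self, item) -> None:
        """Never blocks: drop the observation if the queue is full."""
        try:
            self._q.put_nowait(item)
        except asyncio.QueueFull:
            self.dropped += 1

    async def drain(self):
        """Async generator the fan-out task iterates to forward items."""
        while not self._q.empty():
            yield self._q.get_nowait()
```

Dropping under backpressure is deliberate: a spectator view can tolerate missing frames, while the trainer cannot tolerate stalls.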
### 15.5 WebSocket Protocol: `/ws/training`

Viewer → Server: no messages — the viewer is a read-only spectator.

Server → Viewer (on connect):

```jsonc
{
  "type": "registry",
  "batch_id": 12,
  "episodes": {
    "ep_abc": {"status": "running", "task": "half_fold", "step": 3,
               "observation": {...}, "metrics": {...}},
    "ep_def": {"status": "done", "task": "half_fold", "step": 5,
               "observation": {...}, "metrics": {...}, "score": 20.0},
    ...
  }
}
```

Server → Viewer (on episode update):

```jsonc
{
  "type": "episode_update",
  "episode_id": "ep_abc",
  "step": 4,
  "status": "running",
  "observation": {
    "paper_state": {...},     // Three.js renders this
    "metrics": {...},
    "fold_history": [...]
  }
}
```

Server → Viewer (on episode complete):

```jsonc
{
  "type": "episode_done",
  "episode_id": "ep_abc",
  "status": "success",        // or "timeout", "error"
  "score": 20.0,
  "final_metrics": {...}
}
```

Server → Viewer (on new batch):

```jsonc
{
  "type": "batch_start",
  "batch_id": 13,
  "num_episodes": 4,
  "prompt_index": 42
}
```

Server → Viewer (on batch complete):

```jsonc
{
  "type": "batch_done",
  "batch_id": 13,
  "scores": [20.0, 5.0, 2.0, -1.0],
  "best_episode_id": "ep_xyz",
  "avg_score": 6.5
}
```

Server → Viewer (on training end):

```jsonc
{
  "type": "training_done",
  "total_batches": 100,
  "best_score": 20.0
}
```
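The server-side bookkeeping behind these messages can be sketched as a small registry class (a sketch of `TrainingBroadcastServer`'s state, minus the WebSocket fan-out; method names follow the protocol above but are assumptions):

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class EpisodeRegistry:
    """In-memory registry backing the /ws/training protocol.
    Each method mutates state and returns the message to broadcast."""
    batch_id: int = 0
    episodes: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def start_batch(self, batch_id: int) -> dict:
        self.batch_id = batch_id
        self.episodes.clear()          # grid clears on every new batch
        return {"type": "batch_start", "batch_id": batch_id}

    def update(self, ep_id: str, step: int, observation: dict) -> dict:
        self.episodes[ep_id] = {"status": "running", "step": step,
                                "observation": observation}
        return {"type": "episode_update", "episode_id": ep_id,
                "step": step, "status": "running",
                "observation": observation}

    def finish(self, ep_id: str, status: str, score: float) -> dict:
        self.episodes[ep_id].update(status=status, score=score)
        return {"type": "episode_done", "episode_id": ep_id,
                "status": status, "score": score}
```

A newly connected spectator just receives `{"type": "registry", "batch_id": ..., "episodes": ...}` built from the same dict.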
### 15.6 New Files Needed

```
origami_env/
├── server/
│   ├── app.py                  # UPDATE: mount training.html, add /ws/training
│   └── training_broadcast.py   # NEW: TrainingBroadcastServer class
│       # - episode_registry: Dict[str, EpisodeInfo]
│       # - spectator_clients: List[WebSocket]
│       # - publish(episode_id, data) → broadcast to all spectators
│       # - connect_spectator(ws) → send registry snapshot
│       # - disconnect_spectator(ws)
│       # - clear_batch() → reset registry for the next batch
├── training/
│   ├── train_grpo.py           # UPDATE: integrate TrainingRunner
│   └── runner.py               # NEW: parallel episode executor with broadcast
│       # - run_episode(strategy_fn, task, broadcast_fn) → score
│       # - run_batch(strategies: List, broadcast_fn) → scores
│       # - uses ThreadPoolExecutor for G parallel episodes
│       # - each step calls broadcast_fn(ep_id, obs)
└── viewer/
    ├── index.html              # UNCHANGED — single-session demo viewer
    └── training.html           # NEW — training grid viewer
        # - CSS grid layout (auto-fit columns)
        # - mini Three.js renderer per episode cell
        # - status badges + key metrics per cell
        # - click-to-fullscreen with [Back to Grid] button
        # - training progress header bar
        # - auto-clear on batch_start, auto-populate on episode_update
```
### 15.7 Changes to Existing Files

`server/app.py` — add the training broadcast endpoint:

```python
from fastapi import WebSocket
from fastapi.staticfiles import StaticFiles

from .training_broadcast import TrainingBroadcastServer

broadcast = TrainingBroadcastServer()

@app.websocket("/ws/training")
async def training_ws(websocket: WebSocket):
    await broadcast.connect_spectator(websocket)

# Mount the training viewer; training.html is then served at
# /viewer/training.html automatically
app.mount("/viewer", StaticFiles(directory=viewer_dir, html=True), name="viewer")
```
`training/train_grpo.py` — add the broadcast hook to `fold_quality`:

```python
from origami_env.server.training_broadcast import TrainingBroadcastServer

# In the fold_quality reward function, after each env.step(), call:
#     broadcast.publish(episode_id, obs)
# This is fire-and-forget; with no viewers connected it is a no-op.
```
### 15.8 Grid Viewer Rendering Strategy

For G ≤ 8: one WebGL context per cell
- Each cell gets a small Three.js renderer (250×200 px)
- 8 contexts × 36 vertices is trivial
- Orbit controls disabled in the grid (cells are too small), enabled in fullscreen

For G > 8 (unlikely, but handled): a single canvas with viewport splitting
- One large Three.js renderer
- Use `renderer.setScissor()` + `renderer.setViewport()` per cell
- Render each scene into its own region
- More efficient: one WebGL context
Fullscreen transition:

```
Click cell → CSS class "fullscreen" on that cell
  → cell expands to 100vw × 100vh (CSS transition)
  → renderer.setSize(window.innerWidth, window.innerHeight)
  → show the 2D crease panel + full metrics (hidden in grid mode)
  → show the [Back to Grid] button

Click [Back to Grid] → remove the "fullscreen" class → shrink back
```
### 15.9 Episode Cell Layout (Grid Mode)

```
┌─────────────────────────┐
│ EP abc      ▶ running   │  ← header: ID + status badge
│ ┌─────────────────────┐ │
│ │                     │ │
│ │    Mini 3D Mesh     │ │  ← Three.js renderer (~200 px tall)
│ │   (strain colors)   │ │
│ │                     │ │
│ └─────────────────────┘ │
│ step 3 │ c=0.52 │ ✔     │  ← footer: step count, compactness, valid
└─────────────────────────┘
```
### 15.10 TODO

- Create `server/training_broadcast.py` (broadcast server + episode registry)
- Create `training/runner.py` (parallel episode executor with broadcast hooks)
- Create `viewer/training.html` (grid viewer with mini Three.js renderers)
- Update `server/app.py` (add the `/ws/training` endpoint, mount the training viewer)
- Update `training/train_grpo.py` (integrate runner + broadcast)
- Test: G=4 parallel episodes broadcasting to the grid viewer
- Test: fullscreen toggle on cell click
- Test: batch transitions (clear → new episodes → populate)