Commit 65b799e
0 Parent(s)
chore: scaffold fusion design lab repo
- .gitignore +17 -0
- README.md +48 -0
- baselines/README.md +8 -0
- demo/README.md +8 -0
- docs/FUSION_DELIVERABLES_MAP.md +80 -0
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +488 -0
- docs/FUSION_NEXT_12_HOURS_CHECKLIST.md +161 -0
- fusion_lab/__init__.py +2 -0
- fusion_lab/client.py +27 -0
- fusion_lab/models.py +52 -0
- hackathan_raw_guidance.md +239 -0
- openenv.yaml +7 -0
- pyproject.toml +38 -0
- server/__init__.py +2 -0
- server/app.py +18 -0
- server/data/README.md +4 -0
- server/environment.py +20 -0
- server/physics.py +21 -0
- tests/test_repo_scaffold.py +9 -0
- training/README.md +4 -0
.gitignore
ADDED
@@ -0,0 +1,17 @@
.DS_Store
.venv/
__pycache__/
*.pyc
.pytest_cache/
.ruff_cache/
.mypy_cache/
.ipynb_checkpoints/
dist/
build/
*.sqlite
*.db
reports/
artifacts/
checkpoints/
server/data/generated/
README.md
ADDED
@@ -0,0 +1,48 @@
# Fusion Design Lab

Fusion Design Lab is an environment-first OpenEnv hackathon project for budget-constrained stellarator design.

The repo is organized around one clear submission thesis:

- a narrow, reproducible stellarator design task
- a small discrete action space
- real simulator feedback
- explicit constraints
- a reward function that is iteratively improved through observed behavior

Training is supporting evidence. The environment is the product.

## Current Status

This repository is the clean hackathon workspace. The detailed planning docs live in [docs/FUSION_DESIGN_LAB_PLAN_V2.md](docs/FUSION_DESIGN_LAB_PLAN_V2.md), [docs/FUSION_DELIVERABLES_MAP.md](docs/FUSION_DELIVERABLES_MAP.md), and [docs/FUSION_NEXT_12_HOURS_CHECKLIST.md](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md).

Implementation status:

- repo scaffolded
- shared models defined
- server and client entry points stubbed
- environment contract ready to be implemented next

## Planned Repository Layout

```text
fusion-design-lab/
├── baselines/
├── demo/
├── docs/
├── fusion_lab/
├── server/
├── tests/
├── training/
├── openenv.yaml
├── pyproject.toml
└── README.md
```

## Immediate Next Steps

1. Implement the environment contract in `server/environment.py`.
2. Implement the VMEC-backed physics loop in `server/physics.py`.
3. Add one stable local episode test.
4. Run manual-playtest episodes before heavy training work.
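The "one stable local episode test" in step 3 can start as a shape check against a stub. A minimal sketch, assuming only a reset/step contract and the 6-run budget from the planning docs; `StubEnv` and the dict observation are placeholders, not the eventual `server/environment.py` API:

```python
# Hypothetical smoke test for the episode contract. StubEnv stands in for the
# future server/environment.py implementation; only the loop shape is real.
class StubEnv:
    def reset(self):
        self.steps_left = 6  # assumed per-episode VMEC budget from the plan
        return {"budget_remaining": self.steps_left}

    def step(self, action):
        self.steps_left -= 1
        done = action == "submit" or self.steps_left == 0
        return {"budget_remaining": self.steps_left}, 0.0, done


def test_episode_terminates_within_budget():
    env = StubEnv()
    obs = env.reset()
    done = False
    for _ in range(6):
        obs, reward, done = env.step("run")
        if done:
            break
    assert done
    assert obs["budget_remaining"] >= 0
```

Once the real environment exists, the stub is swapped for it and the same assertions should still hold.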
baselines/README.md
ADDED
@@ -0,0 +1,8 @@
Random and heuristic baselines will live here.

The first baseline milestone is:

- one random agent
- one simple heuristic agent
- one short comparison run on the frozen task
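That milestone can be rehearsed end to end before the real environment exists. A sketch of a random-vs-heuristic comparison on a toy stand-in objective (minimize a residual with six moves); all names and numbers here are illustrative, and nothing touches VMEC:

```python
import random

# Toy stand-in for the frozen task: each agent gets 6 moves to shrink a residual.
MOVES = (-0.1, -0.01, 0.01, 0.1)


def run_agent(choose_move, seed, start=1.0, budget=6):
    """Run one seeded toy episode and return the final residual."""
    rng = random.Random(seed)
    residual = start
    for _ in range(budget):
        residual = max(0.0, residual + choose_move(rng, residual))
    return residual


def random_move(rng, residual):
    return rng.choice(MOVES)


def greedy_move(rng, residual):
    # Simple heuristic: always take the largest available decrease.
    return min(MOVES)


def mean_final(choose_move, episodes=50):
    return sum(run_agent(choose_move, s) for s in range(episodes)) / episodes
```

The comparison run then reduces to comparing `mean_final(random_move)` against `mean_final(greedy_move)` over the same seeds.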
demo/README.md
ADDED
@@ -0,0 +1,8 @@
Demo assets belong here.

Expected contents:

- one stable episode capture
- short demo script
- any exported figures used in the 1-minute video
docs/FUSION_DELIVERABLES_MAP.md
ADDED
@@ -0,0 +1,80 @@
# Fusion Design Lab Deliverables Map

This is the output-first map for the hackathon. It is aligned to Plan V2: environment-first, reward-iteration-driven, and conservative about training claims. Everything branches from the four final artifacts the judges and submission flow will actually see.

## Deliverables Tree

```mermaid
flowchart TD
    A["Fusion Design Lab Submission"] --> B["HF Space Environment"]
    A --> C["Colab Eval / Training Notebook"]
    A --> D["1-Minute Demo"]
    A --> E["Public Repo + README"]

    B --> B0["Environment contract frozen"]
    B --> B1["Remote reset/step works"]
    B --> B2["Reward V0 -> V1 documented"]
    B --> B3["One stable task runs end-to-end"]
    B --> B4["Clear rules + reproducible episodes"]

    C --> C1["Connects to HF Space"]
    C --> C2["Runs multi-turn episodes"]
    C --> C3["Logs behavior + reward traces"]

    D --> D1["Clear problem statement"]
    D --> D2["Manual playtest + agent trajectory"]
    D --> D3["Reward shaping story"]

    E --> E1["Readable project summary"]
    E --> E2["Setup + run instructions"]
    E --> E3["Submission links and artifacts"]

    B0 --> F["Observation + action schema frozen"]
    B3 --> G["Standalone physics loop proven"]
    B2 --> H["Exploit observed -> penalty added"]
    B4 --> I0["Deterministic action schema"]
    D2 --> I["Human can act coherently in env"]
    C3 --> J["Random baseline"]
    C3 --> K["Heuristic baseline"]
```

## Reverse Timeline

```mermaid
flowchart LR
    S["Submit by Sun 1:00 PM"] --> V["Video finalized"]
    S --> R["Repo public and readable"]
    S --> T["Training / eval evidence exported"]
    S --> H["HF Space live"]

    V --> V1["Recorded clean demo trajectory"]
    V --> V2["Scripted 60-second story"]

    T --> T1["Behavior trace image"]
    T --> T2["Baseline comparison numbers"]
    T --> T3["Colab notebook runs end-to-end"]

    H --> H1["OpenEnv environment packaged"]
    H --> H2["Remote client can reset and step"]
    H --> H3["Verifier and reward stable"]
    H --> H4["Rules are clear and reproducible"]

    H4 --> P["Environment contract locked first"]
    P --> Q["Manual playtest completed first"]
    H3 --> M["Local physics loop proven first"]
    T2 --> B["Random + heuristic baselines done"]
    T3 --> X["Training included only if persuasive"]
    V1 --> Y["One stable task only"]
    V2 --> Z["Explain reward fix, not just reward gain"]
```

## Priority Order

1. Prove the local physics loop.
2. Freeze the environment contract and mark the initial reward as `V0`.
3. Manual-playtest the environment and fix obvious reward/pathology issues.
4. Make one stable OpenEnv task work remotely with clear, reproducible rules.
5. Get random and heuristic baselines.
6. Use the notebook to show traces and comparisons; include training only if it adds signal.
7. Record the demo around environment clarity, reward shaping, and one stable trajectory.
8. Polish the repo only after the artifacts are real.
docs/FUSION_DESIGN_LAB_PLAN_V2.md
ADDED
@@ -0,0 +1,488 @@
# Fusion Design Lab — Plan V2

**Hackathon:** OpenEnv Hackathon, March 7-8, 2026
**Track:** Statement 3.1 (World Modeling — Professional Tasks)
**Status:** Judge-aligned rewrite of the main plan

## 1. Submission Thesis

We are not primarily submitting "a trained model for fusion."

We are submitting a clear, reproducible training environment for a constrained scientific design task:

- a junior plasma-scientist-style agent
- a small VMEC budget
- a narrow action space
- real simulator feedback
- explicit constraints
- a reward function that is understandable and iteratively improved

Training is supporting evidence. The environment is the product.

## 2. What Changed From V1

This version changes the center of gravity:

- `environment quality > training effort`
- `reward shaping story > polished final reward formula`
- `manual playtesting > training-first iteration`
- `clarity and reproducibility > broad unsupported transfer claims`

This version also separates:

- what is already decided
- what is a working hypothesis
- what must be validated before it becomes part of the final pitch

## 3. Judge-Aligned Priorities

The judging signal now implies four priorities:

1. The environment itself must be strong.
2. The reward function must be explainable and visibly iterated.
3. A human should be able to act in the environment coherently before we invest heavily in training.
4. The final story should emphasize a clear, reproducible environment, not just a reward curve.

## 4. Final Artifacts

The four visible artifacts remain:

1. HF Space environment
2. Colab notebook for evaluation or training
3. 1-minute demo video
4. Public repo and README

But the evidence order is:

1. environment contract
2. manual playtest log
3. reward iteration note
4. stable local and remote episodes
5. random and heuristic baselines
6. training or eval notebook evidence
7. demo and repo polish

## 5. Non-Negotiables

- One stable task only.
- No broad cross-science claims unless evidence exists.
- No training-first drift.
- No dependence on reward curves alone.
- No repo/video polish before environment and baselines are real.

## 6. Single Stable Task

We intentionally narrow the scope to one environment family:

- fixed-boundary, low-resolution, 2-period quasi-helical stellarator
- one baseline input
- small seed perturbation for episode variety
- budget of 6 VMEC runs per episode

The task is:

> improve quasi-symmetry under explicit constraints with limited simulation budget

### Constraints

- aspect ratio in `[4.5, 7.0]`
- edge iota in `[0.3, 0.6]`
- volume `> 0.5 m^3`

### Objective

- minimize quasi-symmetry residual
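The constraint window can be enforced mechanically. A minimal sketch using the bounds above; the function and argument names are assumptions, not the repo's API:

```python
# Hypothetical constraint check for the frozen task. The numeric bounds come
# from the spec above; everything else is illustrative.
ASPECT_RATIO_RANGE = (4.5, 7.0)
EDGE_IOTA_RANGE = (0.3, 0.6)
MIN_VOLUME_M3 = 0.5


def constraints_satisfied(aspect_ratio: float, edge_iota: float, volume_m3: float) -> bool:
    """Return True when a candidate design sits inside the task's constraint window."""
    return (
        ASPECT_RATIO_RANGE[0] <= aspect_ratio <= ASPECT_RATIO_RANGE[1]
        and EDGE_IOTA_RANGE[0] <= edge_iota <= EDGE_IOTA_RANGE[1]
        and volume_m3 > MIN_VOLUME_M3
    )
```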

## 7. Environment Contract

The environment contract must be frozen before meaningful evaluation.

### Observation

The observation should expose:

- current quasi-symmetry residual
- best residual so far
- improvement from initial
- aspect ratio
- axis and edge iota
- volume
- magnetic well
- VMEC convergence status
- step number
- budget remaining
- target description
- concise textual summary of the last action outcome

The observation must be interpretable by a human without additional hidden state.
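The bullet list above maps naturally onto a flat record. One possible shape, assuming Python dataclasses; the field names are illustrative, not the frozen schema:

```python
from dataclasses import dataclass


# Illustrative observation record mirroring the list above; every field name
# here is an assumption to be replaced by the frozen schema.
@dataclass(frozen=True)
class Observation:
    qs_residual: float          # current quasi-symmetry residual
    best_residual: float        # best residual so far
    improvement: float          # improvement from the initial residual
    aspect_ratio: float
    axis_iota: float
    edge_iota: float
    volume_m3: float
    magnetic_well: float
    vmec_converged: bool        # VMEC convergence status
    step: int
    budget_remaining: int
    target: str                 # target description
    last_action_summary: str    # concise text summary of the last outcome
```

Freezing the dataclass makes the observation immutable, which matches the "no additional hidden state" requirement.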

### Action Space

The action space stays intentionally small and discrete:

- `run`
- `submit`
- `restore_best`

For `run`, the controllable fields are:

- operator: one of a small fixed set of coefficients
- direction: increase or decrease
- magnitude: small, medium, large
- restart mode: hot or cold

This is not trying to expose the full plasma design space. The goal is a legible environment, not maximal realism.
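The discrete schema above can be pinned down with enums so invalid combinations fail early. A sketch under the same assumptions; names are placeholders, not the frozen contract:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(Enum):
    RUN = "run"
    SUBMIT = "submit"
    RESTORE_BEST = "restore_best"


class Direction(Enum):
    INCREASE = "increase"
    DECREASE = "decrease"


class Magnitude(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


class RestartMode(Enum):
    HOT = "hot"
    COLD = "cold"


@dataclass(frozen=True)
class Action:
    kind: ActionType
    operator: Optional[str] = None          # one of a small fixed coefficient set
    direction: Optional[Direction] = None
    magnitude: Optional[Magnitude] = None
    restart: Optional[RestartMode] = None

    def __post_init__(self):
        # Only `run` carries perturbation fields; reject an underspecified run.
        if self.kind is ActionType.RUN and self.operator is None:
            raise ValueError("run actions must name an operator")
```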

### Episode Flow

1. Reset from baseline plus optional small seed perturbation.
2. Agent chooses one action.
3. Simulator or verifier runs.
4. Environment returns diagnostics and reward.
5. Episode ends on:
   - `submit`
   - exhausted budget

### Terminal Contract

The episode should end cleanly and deterministically.

At termination, the environment should provide:

- final best design metrics
- whether constraints were satisfied
- total reward
- short human-readable summary of the trajectory
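The flow above is a plain loop. A toy walk-through with a stand-in environment that mimics only the budget and termination rules, no physics; every name here is an assumption:

```python
BUDGET = 6  # per-episode VMEC run budget from the task spec


def run_episode(policy, env):
    """Drive one episode; `env` must expose reset() -> obs and step(action) -> (obs, reward, done)."""
    obs = env.reset()
    total_reward = 0.0
    while True:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            return obs, total_reward


class ToyEnv:
    """Stand-in with the same loop shape: each `run` consumes budget; `submit` ends."""

    def reset(self):
        self.budget = BUDGET
        return {"budget_remaining": self.budget}

    def step(self, action):
        if action == "submit":
            return {"budget_remaining": self.budget}, 1.0, True
        self.budget -= 1
        done = self.budget <= 0
        return {"budget_remaining": self.budget}, 0.0, done
```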

## 8. Reward V0

The reward in this document is not the final reward. It is `Reward V0`.

The initial scoring idea remains:

- improvement in quasi-symmetry should help
- constraint violations should hurt
- VMEC non-convergence should hurt
- wasting budget should have some cost
- successful early submission may deserve a small bonus
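Those terms compose into one scalar. A Reward V0 sketch with placeholder weights that playtesting is expected to revise; nothing here is tuned:

```python
# Reward V0 sketch composing the terms listed above. All weights are
# placeholder values, not the final reward.
W_IMPROVE = 10.0        # reward per unit drop in quasi-symmetry residual
P_CONSTRAINT = -1.0     # penalty when a constraint is violated
P_NO_CONVERGE = -0.5    # penalty when VMEC fails to converge
P_STEP = -0.05          # small per-run cost against wasted budget
B_EARLY_SUBMIT = 0.2    # bonus for a successful early submission


def reward_v0(residual_drop, constraints_ok, converged, submitted_early_ok):
    r = W_IMPROVE * residual_drop + P_STEP
    if not constraints_ok:
        r += P_CONSTRAINT
    if not converged:
        r += P_NO_CONVERGE
    if submitted_early_ok:
        r += B_EARLY_SUBMIT
    return r
```

Keeping the terms additive makes each failure mode below traceable to one named constant.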

### Reward V0 Design Goals

- easy to explain
- sensitive to genuine progress
- hostile to obvious degenerate behavior
- simple enough to debug from trajectories

### Reward V0 Failure Modes To Test

We should expect at least some of these:

- the agent spams large perturbations
- the agent oscillates between equivalent moves
- the agent overuses `restore_best`
- the agent never submits
- the agent submits too early
- the agent learns to preserve safety but not improve the objective

The reward is only acceptable after we test for those behaviors.

## 9. What Is Hypothesis vs Validated

These are still hypotheses until manually or empirically checked:

- `large` perturbations are risky enough to make restart choice meaningful
- six runs are enough to create non-trivial decision pressure
- the chosen coefficients create a task that is neither trivial nor impossible
- `restore_best` is useful without becoming an exploit
- the heuristic baseline should beat random on mean episode reward

These should not be narrated as facts in the final demo until validated.

## 10. Manual Playtest Plan

Before heavy training, we should act as the agent ourselves.

### Protocol

Run 5 to 10 episodes manually and log for each step:

- observation seen
- action chosen
- reason for the action
- simulator outcome
- reward returned
- whether the reward matched intuitive quality
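One JSON line per step is enough for this protocol. A sketch of the record shape; the keys simply mirror the bullets above and are otherwise arbitrary:

```python
import json


# Hypothetical JSON-lines record for the manual playtest log; key names are
# illustrative, chosen to mirror the protocol bullets.
def playtest_record(step, observation, action, reason, outcome, reward, reward_matched):
    return json.dumps({
        "step": step,
        "observation": observation,
        "action": action,
        "reason": reason,
        "outcome": outcome,
        "reward": reward,
        "reward_matched_intuition": reward_matched,
    })
```

Appending one such line per step to a `.jsonl` file keeps the log greppable and trivially loadable later.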

### Questions The Playtest Must Answer

- can a human understand what to do from the observation?
- do action labels map to meaningful decisions?
- is six-run budgeting interesting or arbitrary?
- which actions are high leverage?
- do obvious bad actions get punished?
- do obviously good actions get rewarded?
- does `restore_best` help recovery or encourage stalling?

### Expected Output

- short manual playtest log
- one paragraph on what a good episode looks like
- one paragraph on what broke or felt ambiguous

## 11. Reward Iteration Story

The reward iteration story is not a side note. It is likely part of the pitch.

We should aim to document at least one concrete sequence:

1. initial reward version
2. observed bad behavior
3. reward or penalty change
4. changed behavior afterward

Examples of acceptable story structure:

- "The agent kept making risky large moves, so we increased the non-convergence penalty."
- "The agent kept deferring commitment, so we adjusted terminal incentives."
- "The agent overused restore-best, so we changed the reward/step logic to make stalling unprofitable."

This is stronger than saying only "reward improved after training."

## 12. Evidence Plan

### HF Space

Must prove:

- remote `reset` works
- remote `step` works
- one stable episode runs end-to-end
- the remote behavior matches the local contract

### Colab Notebook

Primary job:

- connect to the live environment
- run multi-turn episodes
- export traces and baseline comparisons

Secondary job:

- show training or policy improvement if the signal is credible

If training is weak but the environment and eval traces are strong, the notebook still ships.

### Demo Video

The video should show:

1. the task
2. the environment observation and action space
3. one manual or agent trajectory
4. one reward pathology and fix
5. one baseline comparison

Reward curves are optional supporting visuals, not the center of the story.

### Public Repo

The repo should make the environment easy to understand:

- what the task is
- what the agent sees
- what the agent can do
- how reward works
- how to run one episode
- where the demo evidence lives

## 13. Success Gates

### Gate 1: Environment Contract Locked

- task frozen
- observation schema frozen
- action schema frozen
- terminal conditions frozen

### Gate 2: Manual Playtest Pass

- human can act coherently
- at least one trajectory feels sensible
- at least one pathology identified or ruled out

### Gate 3: Stable Local Episode

- local modify -> solve -> observe loop works
- at least one end-to-end episode is stable

### Gate 4: Reward V1

- at least one reward revision completed
- story is documented with before/after behavior

### Gate 5: Baselines

- random baseline complete
- heuristic baseline complete
- heuristic is at least competitive and preferably better than random

### Gate 6: Remote Environment

- HF Space live
- remote client runs one clean episode

### Gate 7: Notebook Evidence

- notebook runs end-to-end
- traces exported
- training evidence included only if it adds signal

## 14. Timeline

### Phase 0

Lock the environment contract and validate the minimal toolchain needed to play the game.

Deliverables:

- frozen task definition
- frozen action and observation schema
- proof that one VMEC modify -> run -> diagnose loop works

### Phase 1

Manual-playtest the environment.

Deliverables:

- 5 to 10 episode logs
- notes on leverage, ambiguity, and pathologies

### Phase 2

Implement or refine Reward V0 into Reward V1 based on real behavior.

Deliverables:

- documented exploit
- documented fix
- updated reward logic

### Phase 3

Stabilize one local task and run baselines.

Deliverables:

- stable local trajectory
- random baseline
- heuristic baseline

### Phase 4

Deploy HF Space and validate remote parity.

Deliverables:

- live environment
- one stable remote episode

### Phase 5

Produce notebook evidence.

Deliverables:

- Colab notebook
- traces
- baseline comparison
- training outputs only if persuasive

### Phase 6

Record the demo and make the repo readable.

Deliverables:

- 1-minute video
- public README
- linked artifacts

## 15. Fallback Rules

If something goes wrong, the fallback should preserve the environment story.

### If training signal is weak

Do not force a training-centric pitch.

Ship:

- strong environment
- manual playtest evidence
- reward iteration story
- baseline traces
- one stable remote demo

### If reward is unstable

Reduce ambition:

- keep only the terms we can explain
- remove fragile shaping
- prefer legible trajectories over complex reward composition

### If the task is too hard

Do not broaden scope.

Instead:

- simplify the starting configuration
- tighten the action set
- make the task more learnable within six runs

### If the task is too easy

Do not add more domains.

Instead:

- adjust budget
- adjust magnitudes
- adjust reward to discourage trivial submission

## 16. Demo Story

The recommended demo structure is:

### Part 1: Problem

"The agent gets a small VMEC budget to improve a stellarator design while staying within constraints."

### Part 2: Environment

"Here is what the agent sees, what it can change, and what counts as success."

### Part 3: Reward Iteration

"Our first reward version produced a bad behavior. We changed the penalty or incentive, and the behavior improved."

### Part 4: Evidence

"Here is one stable trajectory, plus random and heuristic baselines."

### Part 5: Why It Matters

"This is a clear, reproducible simulation environment for budget-constrained scientific decision-making."

That last line is intentionally conservative. It is strong enough without claiming universal scientific transfer.

## 17. Immediate Next Actions

1. Freeze the environment contract in code and docs.
2. Run manual playtests before heavy training work.
3. Mark the current reward as `V0`.
4. Log the first real pathology and reward revision.
5. Do not let notebook or video work outrun the environment evidence.
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md
ADDED
|
@@ -0,0 +1,161 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Fusion Design Lab: Next 12 Hours Checklist
|
| 2 |
+
|
| 3 |
+
This checklist turns the updated deliverables map and Plan V2 into concrete execution order. The goal is to produce real evidence for the four submission artifacts, with environment clarity and reproducibility driving the sequence.
|
| 4 |
+
|
| 5 |
+
## Core Rule
|
| 6 |
+
|
| 7 |
+
Do not expand scope beyond one stable task. Training is supporting evidence, not the main story.
|
| 8 |
+
|
| 9 |
+
## Plan V2 Inheritance
|
| 10 |
+
|
| 11 |
+
Carry these rules through the whole checklist:
|
| 12 |
+
|
| 13 |
+
- Freeze the environment contract before heavy iteration.
|
| 14 |
+
- Treat the current reward as `Reward V0`, not final reward.
|
| 15 |
+
- Distinguish validated facts from working hypotheses.
|
| 16 |
+
- Prefer behavior traces and baseline comparisons over generic reward-curve storytelling.
|
| 17 |
+
- If training is weak, ship the environment story anyway.
|
| 18 |
+
|
| 19 |
+
## Hour 0-2: Lock the Environment Contract
|
| 20 |
+
|
| 21 |
+
1. Write the exact environment spec.
|
| 22 |
+
2. Freeze one task only.
|
| 23 |
+
3. Define:
|
| 24 |
+
- observation schema
|
| 25 |
+
- action schema
|
| 26 |
+
- episode loop
|
| 27 |
+
- terminal conditions
|
| 28 |
+
- reward V0 terms
|
| 29 |
+
- initial penalties
|
| 30 |
+
4. Update the main diagram so it emphasizes:
|
| 31 |
+
- environment
|
| 32 |
+
- verifier
|
| 33 |
+
- reward shaping
|
| 34 |
+
- manual playtesting
|
| 35 |
+
5. Mark open assumptions explicitly:
|
| 36 |
+
- risky action magnitudes
|
| 37 |
+
- whether 6 runs is enough
|
| 38 |
+
- whether `restore_best` is useful without becoming an exploit
|
| 39 |
+
|
| 40 |
+
Exit condition: a human can read the spec and understand how to act in the environment.
|
| 41 |
+
|
| 42 |
+
Artifacts:
|
| 43 |
+
- short environment spec
|
| 44 |
+
- revised mermaid diagram
|
| 45 |
+
- short hypothesis list
|
| 46 |
+
|
| 47 |
+
## Hour 2-4: Manual Playtest and Fix Reward Pathologies

1. Manually play 5 to 10 episodes.
2. Log for each step:
   - observation
   - chosen action
   - expected effect
   - returned reward
   - confusion or exploit if observed
3. Identify at least one bad incentive or exploit.
4. Patch reward or penalty logic immediately.
5. Write the reward shaping story:
   - initial reward V0
   - bad behavior
   - refinement to reward V1
   - improved behavior

Exit condition: you can explain why the environment now rewards the intended behavior.

Artifacts:

- manual playtest log
- reward shaping note
- reward V1 delta note
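A minimal JSONL logger is enough for the per-step log above. This is a sketch with assumed field names, not code that exists in the repo yet:

```python
import dataclasses
import json


@dataclasses.dataclass
class PlaytestRecord:
    """One step of a manual playtest episode (field names are illustrative)."""
    step: int
    action: dict
    expected_effect: str
    reward: float
    note: str = ""  # confusion or suspected exploit, if observed


def to_jsonl_line(record: PlaytestRecord) -> str:
    """Serialize a record as one JSONL line for the playtest log."""
    return json.dumps(dataclasses.asdict(record))
```

Appending one such line per step gives a grep-able trail for spotting reward pathologies later.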
## Hour 4-6: Stabilize the Local Task

1. Prove the local physics or verifier loop.
2. Run one stable end-to-end task repeatedly.
3. Confirm the action schema is deterministic enough for reproducible episodes.
4. Save one clean local trajectory.
5. Do not proceed to remote deployment until this gate is real.

Exit condition: the same setup yields the same type of behavior reliably enough for a demo.

Artifacts:

- stable local run
- saved trajectory
## Hour 6-8: Make the HF Space Real

1. Package the OpenEnv environment for remote use.
2. Verify remote `reset` and `step`.
3. Run one clean remote episode end-to-end.
4. Confirm the remote environment preserves the same task contract as local.

Exit condition: the environment is runnable on the actual submission surface, not only locally.

Artifacts:

- live HF Space environment
- remote episode proof
## Hour 8-10: Add Baselines

1. Implement the random baseline.
2. Implement the heuristic baseline.
3. Run short comparisons on the same stable task.
4. Save:
   - comparison numbers
   - behavior traces
   - one example where the heuristic beats random

Exit condition: there is a credible baseline anchor for the judges.

Artifacts:

- random baseline
- heuristic baseline
- comparison table or figure
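The random baseline is the cheapest of the two. A sketch over the discrete action space from `fusion_lab/models.py` (the `step_fn` callback and episode shape here are assumptions, not the final harness):

```python
import random

# Discrete action vocabulary, mirroring the Literal types in fusion_lab/models.py
OPERATORS = ["tune_rc10", "tune_rc11", "tune_zs11", "tune_zs12"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]


def random_action(rng: random.Random) -> dict:
    """Sample one uniformly random tuning action."""
    return {
        "intent": "run",
        "operator": rng.choice(OPERATORS),
        "direction": rng.choice(DIRECTIONS),
        "magnitude": rng.choice(MAGNITUDES),
    }


def run_random_episode(step_fn, budget: int = 6, seed: int = 0) -> float:
    """Roll one seeded episode of random actions; return the total reward."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(budget):
        total += step_fn(random_action(rng))
    return total
```

Seeding the episode keeps the baseline reproducible, which matters when the comparison table is the judges' anchor.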
## Hour 10-12: Produce the Submission Evidence

1. Wire the Colab training or eval script to the live environment.
2. Ensure it produces:
   - multi-turn episodes
   - behavior traces
   - reward or behavior comparison outputs
3. Draft the 60-second demo script.
4. Record the demo around:
   - what the environment is
   - how reward was refined
   - what manual playtesting revealed
   - one stable trajectory
   - baseline comparison
5. If training evidence is weak, keep the notebook eval-first and do not force a training-centric claim.
6. Make the repo public-facing and readable only after the artifacts are real.

Exit condition: all four visible artifacts exist in usable form.

Artifacts:

- Colab training or eval script
- demo script
- draft or final video
- updated repo README
- explicit fallback note if training is not persuasive
## Artifact Order

1. Environment spec
2. Manual playtest log
3. Reward revision note
4. Stable task run
5. Random baseline
6. Heuristic baseline
7. Colab training or eval evidence
8. Demo recording
9. Repo polish

## Non-Negotiables

- Do not widen scope beyond one stable task.
- Do not optimize training before manual playtesting.
- Do not rely on reward curves alone; keep trajectory evidence.
- Do not narrate hypotheses as facts before they are checked.
- Do not polish the repo or video before the environment and baselines are real.
- Treat judge comments as pressure toward clarity and reproducibility, not broader unsupported claims.
- Do not force a training-centric story if the strongest evidence is environment quality plus baselines.
fusion_lab/__init__.py
ADDED
@@ -0,0 +1,2 @@

"""Shared client-side package for Fusion Design Lab."""
fusion_lab/client.py
ADDED
@@ -0,0 +1,27 @@

from __future__ import annotations

from openenv.core.client_types import StepResult
from openenv.core.env_client import EnvClient

from fusion_lab.models import StellaratorAction, StellaratorObservation, StellaratorState


class FusionLabClient(
    EnvClient[StellaratorAction, StellaratorObservation, StellaratorState]
):
    """Thin typed client wrapper for the remote OpenEnv environment."""

    def _step_payload(self, action: StellaratorAction) -> dict[str, object]:
        return action.model_dump(exclude_none=True)

    def _parse_result(self, payload: dict[str, object]) -> StepResult[StellaratorObservation]:
        observation = StellaratorObservation(**payload)
        return StepResult(
            observation=observation,
            reward=observation.reward,
            done=observation.done,
        )

    def _parse_state(self, payload: dict[str, object]) -> StellaratorState:
        return StellaratorState(**payload)
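As a sanity check on the wire contract, the exclude-none behavior of `_step_payload` can be mimicked with a plain dict. This is a sketch only; the real client relies on pydantic's `model_dump(exclude_none=True)`:

```python
def step_payload(action: dict) -> dict:
    """Drop None-valued fields before POSTing a step, mirroring the typed client.

    Sketch of the contract: a 'submit' action should not carry null tuning
    fields over the wire.
    """
    return {k: v for k, v in action.items() if v is not None}
```

Keeping the payload free of nulls means the server never has to distinguish "field absent" from "field explicitly null".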
fusion_lab/models.py
ADDED
@@ -0,0 +1,52 @@

from __future__ import annotations

from typing import Literal

from pydantic import BaseModel, Field


ActionIntent = Literal["run", "submit", "restore_best"]
OperatorName = Literal["tune_rc10", "tune_rc11", "tune_zs11", "tune_zs12"]
DirectionName = Literal["increase", "decrease"]
MagnitudeName = Literal["small", "medium", "large"]
RestartMode = Literal["hot", "cold"]


class StellaratorAction(BaseModel):
    intent: ActionIntent
    operator: OperatorName | None = None
    direction: DirectionName | None = None
    magnitude: MagnitudeName | None = None
    restart: RestartMode | None = None
    reasoning: str = ""


class StellaratorObservation(BaseModel):
    diagnostics_text: str
    quasi_symmetry_residual: float
    aspect_ratio: float
    rotational_transform_axis: float
    rotational_transform_edge: float
    magnetic_well_depth: float
    volume: float
    vmec_converged: bool
    step_number: int
    budget_remaining: int
    best_qs_residual: float
    constraints_satisfied: bool
    target_spec: str
    reward: float | None = None
    done: bool = False


class StellaratorState(BaseModel):
    step_count: int = 0
    initial_qs: float = 0.0
    current_qs: float = 0.0
    prev_qs: float = 0.0
    best_qs: float = Field(default=float("inf"))
    budget_total: int = 6
    budget_remaining: int = 6
    constraints_satisfied: bool = True
    history: list[str] = Field(default_factory=list)
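The `StellaratorAction` schema implies cross-field invariants that the `Literal` types alone cannot enforce. A stdlib-only sketch of those invariants (illustrative; the real validation would live in the pydantic model, e.g. as a model validator):

```python
def action_is_complete(action: dict) -> bool:
    """Check the assumed invariant: a 'run' action needs operator, direction,
    and magnitude, while 'submit' and 'restore_best' carry no tuning fields."""
    if action.get("intent") == "run":
        return all(action.get(k) for k in ("operator", "direction", "magnitude"))
    return action.get("intent") in ("submit", "restore_best")
```

Making this an explicit validator would turn a silent no-op action into an immediate schema error during playtesting.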
hackathan_raw_guidance.md
ADDED
@@ -0,0 +1,239 @@

## **OpenEnv Hackathon Participant Guide**

Welcome to the [OpenEnv Hackathon](https://cerebralvalley.ai/e/open-env-hackathon), hacker! 👋 We’re thrilled to have you on board.

This guide is your all-in-one resource for the event, including the schedule, rules, technical resources, problem statements, judging information, and more. Please read it carefully; most answers can be found here.

## **1. Join the [PyTorch Discord Server](https://discord.gg/VBcf6VtfY6)**

- You’ll be given a Hackathon Participant role by an admin, which will give you access to the hackathon-specific channels.

- Here, you’ll be able to interact with hackers and sponsors, introduce yourselves, and form teams (maximum team size of **3**).

- If you don't receive your role within **24 hours of joining,** please ping @CV.

- Please submit your Discord username below so we can grant you the role.

[linkEmbed]

## **2. Location**

**|** Shack15 (1 Ferry Building, Suite 201, San Francisco CA. 94111)

- **Venue Access:** Shack15 is on the 2nd floor of the Ferry Building. Take the Ferry Building elevator to the second floor and turn left. There you will see the main entrance to Shack15.

- **Parking:** Parking near the Ferry Building is extremely limited. Consider parking farther out and taking Uber, Lyft, or public transportation.

[youtube]

## **3. WiFi Information**

- **Username:** SHACK15_Members

- **Password:** M3mb3r$4L!f3

## **4. Hackathon Schedule**

**Saturday, March 7 (Outline)**

- **9:00 AM:** Doors Open • Breakfast Served • Team Formation

- **10:00 AM – 11:30 AM:** Kick-off presentations with Meta, Hugging Face, UC Berkeley, CoreWeave, OpenPipe, Unsloth AI, Fleet AI, Mercor, Scaler AI Labs, Snorkel AI, Patronus AI, Halluminate, and Scale AI

- **11:30 AM:** Hacking Begins

- **1:00 PM:** Lunch Served

- **6:00 PM:** Dinner Served

- **10:00 PM:** Doors Close • Re-entry not permitted

**Sunday, March 8 (Outline)**

- **9:00 AM:** Doors Open • Breakfast Served

- **1:00 PM:** Hacking Stops • Submissions Due

- **1:15 PM:** First Round Judging Begins

- **2:00 PM:** Lunch Served

- **3:00 PM:** Final Round Judging Begins

- **4:00 PM:** Winners Announced and Closing

- **5:00 PM:** Doors Close

All presentation slides can be found here:

[linkEmbed]

## **5. Hackathon and Submission Rules**

To keep things fair and aligned with our goals, all teams must follow these rules:

- **Open Source:** Please ensure your repository is public.

- **New Work Only:** All projects must be started from scratch during the hackathon, with no previous work.

- **Team Size:** Teams may have up to **3** members.

- **Banned Projects:** Projects will be disqualified if they violate legal, ethical, or platform policies, or use code, data, or assets you do not have the rights to.

- Your project **must** use OpenEnv (stable release 0.2.1) deployed on HF Spaces.

- You must show a minimal training script for your environment using Unsloth or HF TRL in Colab.

- You must upload a **one-minute** demo video to YouTube talking about your submission.

## **6. Hackathon Problem Statements**

Your project must address at least **one of the five** required problem statements.

- Some problem statements include **optional partner-sponsored sub-problem statements**, which are additional focus areas related to the main theme.

- Your project may align with **multiple partner sub-problem statements**, but you can only be **judged for a maximum of two**. Please **select up to two** when submitting.

- Projects that match these partner sub-problem statements are eligible for **extra partner prizes**, judged separately from the main track winners.

- Each partner sub-problem statement carries a prize of **$10,000 USD**.

**Statement 1: Multi-Agent Interactions**

Environments for this theme involve cooperation, competition, negotiation, and coalition formation. Learning from these environments will enable agents to model the beliefs and incentives of others in partially observable settings. This drives theory-of-mind reasoning and emergent strategic behavior.

- **Expected Outcome:** an environment that can be used to train multi-agent task handling in an LLM

- **Example Environments:** Market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.

- **Partner Sub-Themes:**
  - **Fleet AI:** Scalable Oversight: Environments that train oversight agents to monitor, analyze, and explain the behavior of other AI agents operating in complex, multi-agent settings.
  - **Halluminate:** Multi-Actor Environments: Build a realistic environment where an agent interacts with and manages multiple actors (agents) to discover and achieve the task.

**Statement 2: (Super) Long-Horizon Planning & Instruction Following**

You will build environments that require deep, multi-step reasoning with sparse or delayed rewards. After using these environments, the goal is for agents to be able to decompose goals, track state over extended trajectories, and recover from early mistakes. The aim is to push beyond shallow next-token reasoning toward structured planning and durable internal representations.

- **Expected Outcome:** an environment that can capture and improve LLM behaviour on challenging long-horizon tasks that need long-running sessions beyond context-memory limits.

- **Example Environments:** Research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered around).

- **Partner Sub-Themes:**
  - **Mercor:** Make an environment with capped/uncapped rewards where frontier model rewards scale with token output.

  - **Scale AI:** Environments for long-horizon workflows for non-code use cases within a business setting, focusing on Sales, Project Management, or HR & IT.

**Statement 3: World Modeling**

- **Statement 3.1: Professional Tasks:** Here you will develop environments that require real interaction with tools, APIs, or dynamic systems, where the model is expected to do real hard work instead of exploiting shortcuts to arrive at the desired outcome. Learning from these environments will enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows. The goal is to strengthen causal reasoning and persistent world models.
  - **Expected Outcome:** an environment capturing the nuances of a defined partially observable world and improving LLM interaction with it

- **Example Environments:** Dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers → code → experiments), economic simulations with feedback, tool-discovery benchmarks.

- **Partner Sub-Theme:**
  - **Scaler AI Labs:** Multi-App RL Environment for Enterprise Workflows: Create RL environments to demonstrate complex workflows, business-rule nuances, etc. in a large enterprise.

- **Statement 3.2: Personalized Tasks:** Here you will develop an environment that offers real personalized task handling: imagine replying to personal messages, handling dinner conflicts caused by work conflicts, or replying to tough emails. Think any personal-assistant task.
  - **Expected Outcome:** An environment that gives the model a realistic simulation of handling personal tasks and conflicts, and managing them as delegations

- **Example Environments:** Executive-assistant meeting planning, dinner and drive planning, email and message replying, etc.

- **Partner Sub-Theme:**
  - **Patronus AI:** Consumer Workflows with Schema Drift: Multi-step consumer workflow environments where the underlying data schemas, API contracts, and t&cs/policies/rules change.

**Statement 4: Self-Improvement**

The focus here is to create environments where agents can learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula. Rather than optimizing fixed tasks, the goal is for agents to learn to drive their own capability growth. The objective is recursive skill amplification.

- **Expected Outcome:** an environment for improving self-play of an LLM over a defined set of tasks

- **Example Environments:** Self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.

- **Partner Sub-Theme:**
  - **Snorkel AI:** Simulated Experts-in-the-Loop: An environment that simulates interactions with real subject-matter experts, with changing requirements/preferences.

**Statement 5: Wild Card - Impress Us!**

We do not want to limit your focus if your idea doesn’t fit the boxes above; we want and WILL reward out-of-the-box tasks. Please be creative, but remember to submit work that meaningfully adds value to LLM training on a certain task.

More details about each theme can be found here:

[linkEmbed]

## **7. CV Hackathon Winners**

[linkEmbed]

## **8. OpenEnv Provided Resources**

**Please read through the entire slideshow here. This includes:**

- OpenEnv Fundamentals, Architecture
- Local Dev, Docker, and HF Spaces Deployment
- OpenEnv in Practice
- Training (TRL & Unsloth)
- How to Access Infrastructure (including the GPU Request Form)

[linkEmbed]

## **9. Partner Provided Resources**

- **Unsloth AI Resources**
  - <https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl-notebooks>
- **Mercor Resources**
  - Dataset: <https://huggingface.co/datasets/mercor/apex-agents>
  - Archipelago repo to run the eval: <https://github.com/Mercor-Intelligence/archipelago>
  - APEX-Agents paper: <https://arxiv.org/abs/2601.14242>
- **Hugging Face Resources**
  - **$30** in Compute and Inference Credits
  - To claim your credits, set up an HF account here: <https://huggingface.co/join>
  - Then, follow this link: <https://huggingface.co/openenv-community>
  - You will be granted **$30** of compute and inference credits!
- **Northflank Resources**
  - Each team gets an H100
  - Northflank instructions:

[linkEmbed]

- Join the Northflank Discord channel for any questions
- Please fill out this form:

[linkEmbed]

- **Cursor Resources**
  - **$50** in Cursor Credits, **apply below**

[linkEmbed]

## **10. Judging & Submissions**

Judging will take place on **Sunday, March 8**. The judges are evaluating your **technical demos** in the following categories. _Show us what you have built_ to solve our problem statements. Please **do not** show us a presentation. We'll be checking to ensure your project was built **entirely during the event**; no previous work is allowed.

**|** **Teams should submit [here](https://cerebralvalley.ai/e/openenv-hackathon-sf/hackathon/submit) when they have completed hacking.** In the submission form, you will have to upload a **one-minute** demo video on YouTube talking about your submission. You must also show a minimal training script for your environment using Unsloth or HF TRL in Colab.

**Please ensure your project uses** OpenEnv (stable release 0.2.1) deployed on HF Spaces.

[linkEmbed]

**Judging Criteria**

- **Environment Innovation (40%) -** Is the environment novel, creative, or challenging? Does it meaningfully test the agent’s behavior?
- **Storytelling (30%) -** Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
- **Training Script Showing Improvement in Rewards (20%) -** Does the demo provide observable evidence of training progress (reward curves, metrics, or before/after behavior)?
- **Reward and Training Pipeline Setup (10%) -** Is the reward logic coherent, and does the pipeline produce meaningful improvement in the agent’s inference (how it acts in the environment)?

**Judging Process**

**|** Judging proceeds in two rounds:

- Hackers will be assigned groups of judges; ~3 minutes to pitch followed by 1-2 minutes of Q&A.

- The top **six** teams in the ranking will get to demo on stage to a panel of judges; ~3 minutes to pitch followed by 2-3 minutes of Q&A.

## **11. Prizes**

- **1st Place:** $15,000 USD Cash

- **2nd Place:** $9,000 USD Cash

- **3rd Place:** $6,000 USD Cash
openenv.yaml
ADDED
@@ -0,0 +1,7 @@

spec_version: 1
name: fusion_design_lab
type: space
runtime: fastapi
app: server.app:app
port: 8000
pyproject.toml
ADDED
@@ -0,0 +1,38 @@

[project]
name = "fusion-design-lab"
version = "0.1.0"
description = "OpenEnv environment for budget-constrained stellarator design"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "fastapi>=0.115.0",
    "numpy>=2.0.0",
    "openenv-core[core]>=0.2.1",
    "pydantic>=2.10.0",
    "uvicorn>=0.34.0",
]

[project.optional-dependencies]
physics = [
    "simsopt",
    "vmecpp",
]
dev = [
    "pytest>=8.3.0",
    "ruff>=0.11.0",
]

[build-system]
requires = ["setuptools>=69.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools]
packages = ["fusion_lab", "server"]

[tool.ruff]
line-length = 100
target-version = "py311"

[tool.pytest.ini_options]
testpaths = ["tests"]
server/__init__.py
ADDED
@@ -0,0 +1,2 @@

"""Server-side package for Fusion Design Lab."""
server/app.py
ADDED
@@ -0,0 +1,18 @@

from __future__ import annotations

from fastapi import FastAPI

from server.environment import TASK, environment_status

app = FastAPI(title="Fusion Design Lab")


@app.get("/healthz")
def healthcheck() -> dict[str, str]:
    return {"status": "ok", "environment": environment_status()}


@app.get("/task")
def task_summary() -> dict[str, object]:
    return TASK
server/data/README.md
ADDED
@@ -0,0 +1,4 @@

Baseline VMEC inputs and related static assets belong here.

Do not commit generated solver outputs or large transient artifacts.
server/environment.py
ADDED
@@ -0,0 +1,20 @@

from __future__ import annotations

from typing import Final

TASK: Final[dict[str, object]] = {
    "description": "Minimize quasi-symmetry error for a 2-period quasi-helical stellarator.",
    "constraints": {
        "aspect_ratio": [4.5, 7.0],
        "rotational_transform_edge": [0.3, 0.6],
        "volume_min": 0.5,
    },
    "budget": 6,
    "baseline_input": "server/data/input.QH_baseline",
}


def environment_status() -> str:
    """Return a simple status string until the full environment is implemented."""
    return "scaffolded"
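The `constraints` block in `TASK` implies a verifier check that does not exist yet. A hedged sketch of how it could work, assuming inclusive range bounds and a volume floor (the helper name and semantics are assumptions, not the implemented environment):

```python
# Bounds copied from TASK["constraints"] in server/environment.py
CONSTRAINTS = {
    "aspect_ratio": (4.5, 7.0),
    "rotational_transform_edge": (0.3, 0.6),
}
VOLUME_MIN = 0.5


def constraints_satisfied(diag: dict) -> bool:
    """Inclusive range checks on the diagnostics, plus a minimum-volume floor."""
    for key, (lo, hi) in CONSTRAINTS.items():
        if not (lo <= diag[key] <= hi):
            return False
    return diag["volume"] >= VOLUME_MIN
```

Whatever form the real check takes, keeping it in one function makes the `constraints_satisfied` observation field easy to audit during playtesting.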
server/physics.py
ADDED
@@ -0,0 +1,21 @@

from __future__ import annotations


class PhysicsEngine:
    """Placeholder for the VMEC-backed physics loop.

    The next implementation step should make this the single place that:
    - loads the baseline input
    - applies discrete coefficient updates
    - runs the solver
    - computes diagnostics
    - tracks best-known designs
    """

    def __init__(self) -> None:
        self._status = "unimplemented"

    @property
    def status(self) -> str:
        return self._status
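Until the VMEC-backed loop exists, a deterministic toy surrogate can keep the episode loop and tests runnable. All dynamics below are made up purely for wiring purposes; nothing here approximates real stellarator physics:

```python
# Fixed residual step per magnitude level (arbitrary illustrative values)
STEP_SIZES = {"small": 0.01, "medium": 0.03, "large": 0.08}


def surrogate_qs_update(qs: float, magnitude: str, helpful: bool) -> float:
    """Move the quasi-symmetry residual by a fixed step, clamped at zero.

    `helpful` stands in for whether the chosen operator/direction actually
    improves the design; the real engine would decide this via the solver.
    """
    delta = STEP_SIZES[magnitude]
    return max(qs - delta, 0.0) if helpful else qs + delta
```

A surrogate like this also gives the "stable local task" gate something to run against before the simsopt/vmecpp extras are installed.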
tests/test_repo_scaffold.py
ADDED
@@ -0,0 +1,9 @@

from server.environment import TASK, environment_status


def test_environment_scaffold_status() -> None:
    assert environment_status() == "scaffolded"


def test_task_budget_is_fixed() -> None:
    assert TASK["budget"] == 6
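One more contract test is cheap to add alongside the scaffold checks: verifying that every constraint range in `TASK` is well-ordered. This sketch inlines the bounds from `server/environment.py` so it stands alone:

```python
def test_constraint_bounds_are_ordered() -> None:
    """Each [lo, hi] constraint range from TASK should satisfy lo < hi."""
    constraints = {
        "aspect_ratio": [4.5, 7.0],
        "rotational_transform_edge": [0.3, 0.6],
    }
    for lo, hi in constraints.values():
        assert lo < hi
```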
training/README.md
ADDED
@@ -0,0 +1,4 @@

Training and evaluation notebooks belong here.

This repository treats notebooks as supporting evidence for the environment, not the primary product.