aishani-s20 committed on
Commit
2d7c393
·
1 Parent(s): ea65582

improvements

README.md CHANGED
@@ -10,182 +10,196 @@ pinned: false
10
 
11
  # 🌌 Quantum Circuit Optimization Environment
12
 
13
- > **An advanced, physics-grounded Reinforcement Learning environment for the Meta OpenEnv Hackathon.** > Challenge agents to act as quantum compilers — optimizing multi-qubit circuits through mathematical identities and commutativity rules.
 
14
 
15
  ---
16
 
17
- ## 🏆 Key Features
18
- * **NP-Hard Problem Space:** Moves beyond static text puzzles into multi-dimensional spatial reasoning.
19
- * **Deterministic Reproducibility (Seed Logic):** Fully supports the OpenEnv framework's episode seed. The engine guarantees the **exact same complex circuit** is generated for a given episode across different model runs, ensuring flawless grader reproducibility.
20
- * **Relative Compression Grader:** Dynamically scores agents based on mathematical compression ratios, adapting perfectly to any circuit depth.
 
21
 
22
  ---
23
 
24
- ## 🚀 Motivation: The Quantum Compiler Challenge
 
 
25
 
26
- In the real world, quantum computers suffer from rapid **decoherence**. Every quantum gate introduces noise, so shorter circuits yield higher-fidelity results. However, optimal quantum circuit compression is an NP-Hard problem.
27
 
28
- Current LLM benchmarks often rely on static toy puzzles. This environment bridges the gap by requiring agents to apply real-world quantum physics rules—such as swapping spatially separated, commuting gates to bring distant self-inverse identities together. This precludes simple memorization; agents must dynamically reason about multi-dimensional spatial gate layouts and plan over long horizons.
29
 
30
  ---
31
 
32
- ## 🛠️ Environment Specifications
33
 
34
- ### 👁️ Observation Space
35
 
36
  The environment provides the agent with a complete topological view of the quantum state at every step.
37
 
38
  | Field | Type | Description |
39
  |---|---|---|
40
- | `circuit` | `List[Gate]` | Current sequence of gates. Each gate includes a `name` and `target_qubits`. |
41
  | `gate_count` | `int` | Current number of gates in the circuit. |
42
  | `num_qubits` | `int` | Total number of qubits in the system. |
43
- | `done` | `bool` | `True` if the circuit is fully optimized, dead-ended, or the step limit is reached. |
44
  | `reward` | `float` | Reward received from the previous action. |
45
- | `metadata` | `dict` | Instance-specific tracking data: `task`, `initial_count`, `seed`. |
 
 
 
 
 
 
 
 
 
 
46
 
47
  ---
48
 
49
- ### 🎮 Action Space
50
 
51
  The agent submits a JSON payload specifying where and how to modify the circuit.
52
 
53
  | Field | Type | Description |
54
  |---|---|---|
55
- | `target_index` | `int` | The index of the primary gate in the circuit array to target. |
56
- | `action_type` | `int` | The specific quantum physics rule to apply (1–4). See below. |
57
 
58
  #### Available Action Types
59
 
60
  | ID | Name | Description | Reward |
61
  |---|---|---|---|
62
- | `1` | **Cancel Identical Gates** | Removes self-inverse gate pairs (e.g., X·X = I) targeting the same qubits, provided they are not blocked by overlapping intermediate gates. | `+1.0` |
63
- | `2` | **Swap Commuting Gates** | Swaps the target gate with the next adjacent gate **only if** their target qubits do not intersect. This enables agents to bring distant cancellable gates together. | `-0.05` |
64
- | `3` | **Identity Collapse (H-X-H)** | Replaces a 3-gate sequence `H → X → H` on the same qubit with a single `Z` gate. | `+2.0` |
65
- | `4` | **Entanglement Compression** | Replaces an adjacent `CNOT → SWAP` sequence sharing exact qubits with a single `CZ` gate. | `+1.0` |
66
 
67
- > **Note:** Invalid actions (e.g., out-of-bounds index, illegal non-commuting swaps) incur a `-0.10` penalty to discourage hallucination, and the circuit state remains unchanged.
68
 
69
  ---
70
 
71
- ## 📊 Tasks & Difficulty Levels
72
-
73
- The environment natively supports dynamic scaling of qubits and circuit depth. By setting `QUANTUM_TASK=random`, the environment dynamically generates a fresh, randomized circuit topology from a pool of valid gate pairs and noise injections.
74
 
75
- | Task | Qubits | Initial Gates | Entanglement | Expected Difficulty |
76
  |---|---|---|---|---|
77
- | `easy` | 2 | ~20 | None (Single Qubit) | **Low:** Agents can easily spot local cancellations. |
78
- | `medium` | 4 | ~30 | Low (CNOT, SWAP) | **Moderate:** Requires basic spatial swapping to clear blocker gates. |
79
- | `hard` | 6 | ~70 | High (Deep Entanglement) | **Extreme:** Demands rigorous long-horizon spatial reasoning across many qubits. |
 
 
80
 
81
  ---
82
 
83
- ## 🏆 Grader & Evaluation
84
 
85
- Because calculating the absolute theoretical minimum length of a randomized multi-qubit circuit is NP-Hard, the environment utilizes a **Relative Compression Grader**:
86
 
87
- $$\text{Score} = \max\left(0.0, \min\left(1.0, \frac{\text{Initial Count} - \text{Final Count}}{\text{Initial Count}}\right)\right)$$
 
 
 
 
88
 
89
- - A score of **1.0** indicates the agent perfectly compressed the circuit down to 0 gates.
90
- - The **success threshold** is `0.10` — meaning a 10% reduction in overall circuit depth is considered a passing score for a given episode.
91
 
92
- ---
93
 
94
- ## 📈 Baseline Scores
95
 
96
- This environment is designed to serve as a rigorous boundary test for frontier reasoning models. All baseline evaluations are fully reproducible using the environment's deterministic seed logic.
97
 
98
- | Model | Task | Result | Notes |
99
- |---|---|---|---|
100
- | Qwen 2.5 72B Instruct (Zero-Shot) | `easy` | **Passing Baseline** | Successfully identifies and executes local cancellations (Score: ~0.15–0.30). |
101
- | Qwen 2.5 72B Instruct (Zero-Shot) | `medium` | **Borderline** | Attempts basic spatial swapping but frequently gets trapped by blocking gates. Usually falls just short of the 0.10 success threshold (Score: ~0.00–0.08). |
102
- | Qwen 2.5 72B Instruct (Zero-Shot) | `hard` | **Benchmark Limit** | Provides a highly complex layout that tests the absolute limits of current LLMs, establishing a rigorous 0.0 baseline. **(100% reproducible via episode seeds)**. |
103
 
104
- > **Conclusion:** This environment successfully establishes an unsolved benchmark for testing algorithmic spatial planning, proving that advanced scaffolding (e.g., Tree-of-Thought or ReAct loops) is required for deep quantum compilation.
105
- >
106
- > **Note on Reproducibility:** You can reliably reproduce these exact baseline constraints. The environment fully supports OpenEnv episode seeding, guaranteeing the exact same initial circuit generation for any given seed across different runs.
107
 
108
  ---
109
 
110
- ## 💻 Setup and Usage Instructions
111
 
112
  ### 1. Prerequisites
113
 
114
- Ensure you have **Docker** and **uv** installed, then install the OpenEnv core dependencies:
115
-
116
  ```bash
117
- uv pip install openenv-core
118
  uv sync
119
  ```
120
 
121
  ### 2. Environment Variables
122
 
123
- Create a ```.env``` file in the root directory:
124
 
125
- ```bash
126
  HF_TOKEN="your_huggingface_read_token"
127
- API_BASE_URL="[https://router.huggingface.co/v1](https://router.huggingface.co/v1)"
128
  MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
129
  QUANTUM_TASK="random"
 
130
  ```
131
 
132
  | Variable | Description |
133
  |---|---|
134
- | `HF_TOKEN` | Your HuggingFace API token (read access) |
135
  | `API_BASE_URL` | Inference endpoint (HF router or custom) |
136
  | `MODEL_NAME` | Model to run inference with |
137
- | `QUANTUM_TASK` | Task name: `easy`, `medium`, `hard`, or `random` |
138
-
139
 
140
  ### 3. Build & Validate
141
-
142
  ```bash
143
  docker build -t quantum_env .
144
  openenv validate .
145
  ```
146
-
147
  ### 4. Run Inference
148
-
149
  ```bash
150
  uv run python inference.py
151
  ```
152
-
153
- The inference script handles API errors gracefully and automatically parses JSON outputs into the strict Action Space schema.
154
-
155
 
156
- ### 5. Reproducing via Seed
157
 
158
- To test the deterministic generation and replicate our baseline scores, you can pass a specific seed to the environment during the reset phase in your client script.
159
 
160
- Simply modify the reset call in ```inference.py```:
161
 
162
- ```bash
163
- # Pass any integer seed to guarantee the exact same initial circuit topography
164
  result = await env.reset(seed=42)
165
  ```
166
 
 
 
167
  ---
168
 
169
- ## 📁 Project Structure
170
 
171
  ```
172
  .
173
  ├── server/
174
- │ ├── app.py # FastAPI WebSocket Server
175
- │ └── quantum_openenv_env_environment.py # Core Physics Engine & Randomizer
176
- ├── client.py # EnvClient Translator
177
- ├── models.py # Strict Pydantic Data Models
178
- ├── inference.py # Baseline LLM Agent Script
179
- ├── openenv.yaml # OpenEnv spec manifest
180
- ├── Dockerfile # Container definition
 
 
 
 
181
  └── README.md
182
  ```
183
 
184
  ---
185
 
 
186
 
187
- ## 📄 License
188
-
189
- This project is released under the MIT license found in the `LICENSE` file.
190
-
191
- ---
 
10
 
11
  # 🌌 Quantum Circuit Optimization Environment
12
 
13
+ > **An advanced, physics-grounded Reinforcement Learning environment for the Meta OpenEnv Hackathon.**
14
+ > Challenge agents to act as quantum compilers — optimizing multi-qubit circuits through mathematical identities and commutativity rules.
15
 
16
  ---
17
 
18
+ ## Key Features
19
+
20
+ - **NP-Hard Problem Space:** Moves beyond static text puzzles into multi-dimensional spatial reasoning.
21
+ - **Deterministic Reproducibility (Seed Logic):** Fully supports the OpenEnv framework's episode seed. The engine guarantees the **exact same circuit** is generated for a given seed across different model runs, ensuring flawless grader reproducibility.
22
+ - **Three Differentiated Graders:** Each difficulty tier measures a genuinely different skill — pure compression on easy, identity-discovery bonus on medium, and step-efficiency weighting on hard.
23
 
24
  ---
25
 
26
+ ## Motivation: The Quantum Compiler Challenge
27
+
28
+ In the real world, quantum computers suffer from rapid **decoherence**. Every quantum gate introduces noise, so shorter circuits yield higher-fidelity results. However, optimal quantum circuit compression is an **NP-Hard problem**.
29
 
30
+ While traditional frameworks like **Qiskit, Cirq, and tket** rely on hardcoded human heuristics to identify redundant gates and exploit commutativity, this environment turns that exact physics problem into a rigorous testing ground for Artificial Intelligence. It is designed to evaluate whether RL and LLM agents can independently learn and execute these compiler heuristics from scratch.
31
 
32
+ Many current LLM benchmarks rely on static toy puzzles. This environment bridges the gap by requiring agents to generalize real-world quantum circuit identities, such as swapping spatially separated, commuting gates to bring distant self-inverse gate pairs together. **Memorization alone is not enough**; agents must reason about how gates are laid out across qubits and plan over long horizons.
33
 
34
  ---
35
 
36
+ ## Environment Specifications
37
 
38
+ ### Observation Space
39
 
40
  The environment provides the agent with a complete topological view of the quantum state at every step.
41
 
42
  | Field | Type | Description |
43
  |---|---|---|
44
+ | `circuit` | `List[Gate]` | Current gate sequence. Each gate has a `name` (e.g. `"H"`, `"CNOT"`) and `target_qubits`. |
45
  | `gate_count` | `int` | Current number of gates in the circuit. |
46
  | `num_qubits` | `int` | Total number of qubits in the system. |
47
+ | `done` | `bool` | `True` if the circuit is fully optimized, dead-ended, or the step limit (150) is reached. |
48
  | `reward` | `float` | Reward received from the previous action. |
49
+ | `metadata` | `dict` | Episode tracking data; see the Metadata Fields breakdown below. |
50
+
51
+ #### Metadata Fields
52
+
53
+ | Key | Type | Description |
54
+ |---|---|---|
55
+ | `task` | `str` | Active task name: `"easy"`, `"medium"`, or `"hard"`. |
56
+ | `initial_count` | `int` | Gate count at episode start. Used by all graders to compute compression ratio. |
57
+ | `step` | `int` | Current step number. Used by the hard grader for step-efficiency scoring. |
58
+ | `seed` | `int \| None` | RNG seed used to generate this circuit. Pass the same value to `reset()` to reproduce it exactly. |
59
+ | `used_advanced_actions` | `bool` | `True` if the agent successfully used action 3 (H-X-H→Z) or action 4 (CNOT-SWAP→CZ) this episode. Used by the medium grader bonus. |
60
 
61
  ---
62
 
63
+ ### Action Space
64
 
65
  The agent submits a JSON payload specifying where and how to modify the circuit.
66
 
67
  | Field | Type | Description |
68
  |---|---|---|
69
+ | `target_index` | `int` | Index of the primary gate in the circuit array to target. |
70
+ | `action_type` | `int` | Quantum physics rule to apply (1–4). See below. |
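
As a rough illustration of the payload described above, the following sketch validates a raw model response against the two documented fields. The `ParsedAction` class and `parse_action` helper are hypothetical names used only for this example; they are not the environment's actual `QuantumAction` model, and the upper bound on `target_index` is left to the environment because it depends on the current circuit length.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedAction:
    """Hypothetical container mirroring the two documented action fields."""
    target_index: int
    action_type: int


def parse_action(raw: str) -> Optional[ParsedAction]:
    """Return a ParsedAction, or None when the payload does not fit the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # The payload must be an object with exactly the two documented keys.
    if not isinstance(data, dict) or set(data) != {"target_index", "action_type"}:
        return None
    idx, kind = data["target_index"], data["action_type"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if any(isinstance(v, bool) or not isinstance(v, int) for v in (idx, kind)):
        return None
    # Action types are documented as 1 to 4; negative indices are never valid.
    if idx < 0 or kind not in (1, 2, 3, 4):
        return None
    return ParsedAction(target_index=idx, action_type=kind)
```

Rejecting malformed payloads before they reach `step()` keeps parse failures separate from the environment's own `-0.10` penalty for invalid actions.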
71
 
72
  #### Available Action Types
73
 
74
  | ID | Name | Description | Reward |
75
  |---|---|---|---|
76
+ | `1` | **Cancel Identical Gates** | Removes self-inverse gate pairs (X·X = I, H·H = I, CNOT·CNOT = I, etc.) on the same qubits, not blocked by overlapping intermediate gates. | `+1.0` |
77
+ | `2` | **Swap Commuting Gates** | Swaps the target gate with the next adjacent gate **only if** their qubit sets do not intersect. Enables bringing distant cancellable pairs together. | `-0.05` |
78
+ | `3` | **H-X-H Identity Collapse** | Replaces a `H → X → H` sequence on the same qubit with a single `Z` gate (net: 2 gates removed). | `+2.0` |
79
+ | `4` | **Entanglement Compression** | Replaces an adjacent `CNOT → SWAP` on the same qubits with a single `CZ` gate (net: 1 gate removed). | `+1.0` |
80
 
81
+ > **Invalid actions** (out-of-bounds index, illegal non-commuting swap, pattern not present) incur a `-0.10` penalty. Circuit state remains unchanged.
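
To make the rules above concrete, here is a minimal sketch of actions 2 and 3 on a toy circuit. It is not the environment's implementation: the `Gate` tuple, the function names, and the assumption that the `H → X → H` gates must sit at three consecutive indices are all illustrative choices. Only the reward values are taken from the table.

```python
from typing import List, Tuple

# Hypothetical stand-in for the environment's Gate model: (name, target_qubits).
Gate = Tuple[str, Tuple[int, ...]]

SWAP_REWARD = -0.05      # documented reward for a valid swap (action 2)
HXH_REWARD = 2.0         # documented reward for the H-X-H collapse (action 3)
INVALID_PENALTY = -0.10  # documented penalty; the circuit stays unchanged


def try_swap(circuit: List[Gate], i: int) -> Tuple[List[Gate], float]:
    """Swap gates i and i+1 when their qubit sets are disjoint."""
    if i < 0 or i + 1 >= len(circuit):
        return circuit, INVALID_PENALTY  # out-of-bounds index
    (_, qa), (_, qb) = circuit[i], circuit[i + 1]
    if set(qa) & set(qb):
        return circuit, INVALID_PENALTY  # overlapping qubits: swap not allowed
    swapped = circuit.copy()
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped, SWAP_REWARD


def try_collapse_hxh(circuit: List[Gate], i: int) -> Tuple[List[Gate], float]:
    """Replace H, X, H at indices i..i+2 (same qubit) with a single Z gate."""
    if i < 0:
        return circuit, INVALID_PENALTY
    window = circuit[i:i + 3]
    names = [name for name, _ in window]
    qubits = {q for _, q in window}
    if names != ["H", "X", "H"] or len(qubits) != 1:
        return circuit, INVALID_PENALTY  # pattern not present
    return circuit[:i] + [("Z", window[0][1])] + circuit[i + 3:], HXH_REWARD
```

For example, on `[H(0), CNOT(1,2), X(0), H(0)]` a swap at index 0 moves the `CNOT` out of the way, after which the collapse at index 1 leaves `[CNOT(1,2), Z(0)]`. The small swap cost is what makes that two-step plan worthwhile only when it unlocks a larger reward.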
82
 
83
  ---
84
 
85
+ ## Tasks & Difficulty Levels
 
 
86
 
87
+ | Task | Qubits | Initial Gates | Entanglement | Key Challenge |
88
  |---|---|---|---|---|
89
+ | `easy` | 2 | ~20 | None (single-qubit only) | Identify and cancel local self-inverse gate pairs. |
90
+ | `medium` | 4 | ~30 | Low (CNOT, SWAP) | Swap to unblock cancellations; discover H-X-H and CNOT-SWAP identities. |
91
+ | `hard` | 6 | ~70 | High (deep entanglement) | Long-horizon spatial reasoning; must compress efficiently with minimal wasted steps. |
92
+
93
+ Set `QUANTUM_TASK=random` to have the environment randomly select a difficulty tier on each `reset()`.
94
 
95
  ---
96
 
97
+ ## Grader & Evaluation
98
 
99
+ Each grader measures a **different skill** matching its difficulty tier. All scores are clamped to `[0.01, 0.99]`, so they never reach exactly 0 or 1.
100
 
101
+ | Task | Grader Formula | Full Score Requires |
102
+ |---|---|---|
103
+ | **Easy** | `score = (initial − final) / initial` | Any consistent gate removal earns proportional credit. |
104
+ | **Medium** | `score = compression + 0.15` if agent used action 3 or 4, else `score = compression` | Gate removal **and** discovering at least one algebraic identity. |
105
+ | **Hard** | `score = 0.7 × compression + 0.3 × step_efficiency` where `step_efficiency = 1 − (steps / 150)` | High compression **and** achieving it with few wasted steps. |
106
 
107
+ The hard grader directly penalises the behaviour frontier models exhibit most: thrashing through invalid swaps before finding cancellations, which exhausts the step budget without progress.
 
108
 
109
+ > **Why not use the theoretical minimum gate count?** Computing the absolute minimum for a randomized multi-qubit circuit is NP-Hard. Relative compression is a common metric in quantum compiler benchmarking; it needs only the initial and final gate counts, so it scales to arbitrary circuit depth.
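
The three formulas in the grader table can be sketched as plain functions. This is an illustration of the arithmetic only, not the contents of `server/graders.py`: the function signatures, the `ValueError` for an empty initial circuit, and applying the clamp to every tier are assumptions made for the example.

```python
MAX_STEPS = 150  # step limit stated in the observation-space table


def clamp(score: float) -> float:
    """Keep a score inside [0.01, 0.99]."""
    return max(0.01, min(0.99, score))


def compression(initial: int, final: int) -> float:
    """Fraction of the initial gates that were removed."""
    if initial <= 0:
        # Assumption for this sketch: an empty starting circuit is not graded.
        raise ValueError("initial gate count must be positive")
    return (initial - final) / initial


def grade_easy(initial: int, final: int) -> float:
    return clamp(compression(initial, final))


def grade_medium(initial: int, final: int, used_advanced_actions: bool) -> float:
    # Flat 0.15 bonus when action 3 or 4 succeeded at least once.
    bonus = 0.15 if used_advanced_actions else 0.0
    return clamp(compression(initial, final) + bonus)


def grade_hard(initial: int, final: int, steps: int) -> float:
    step_efficiency = 1 - steps / MAX_STEPS
    return clamp(0.7 * compression(initial, final) + 0.3 * step_efficiency)
```

As a worked case, a hard episode that goes from 70 to 63 gates has a compression of 0.1. Using all 150 steps gives `0.7 × 0.1 + 0.3 × 0 = 0.07`, while finishing in 75 steps gives `0.7 × 0.1 + 0.3 × 0.5 = 0.22`.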
110
 
111
+ ---
112
 
113
+ ## Baseline Scores
114
 
115
+ | Model | Task | Score | Result | Notes |
116
+ |---|---|---|---|---|
117
+ | Qwen 2.5 72B Instruct (Zero-Shot) | `easy` | ~0.22 | Pass | Identifies local cancellations reliably. |
118
+ | Qwen 2.5 72B Instruct (Zero-Shot) | `medium` | ~0.08 | Fail (borderline) | Occasional cancellations; rarely discovers identities; no bonus awarded. |
119
+ | Qwen 2.5 72B Instruct (Zero-Shot) | `hard` | ~0.04 | Fail | Thrashes with invalid swaps; step budget exhausted before meaningful compression. |
120
 
121
+ > Success threshold: `score ≥ 0.10`. The hard task remains **unsolved** by the zero-shot baseline above; scaffolding such as ReAct or Tree-of-Thought is likely needed for reliable performance.
 
 
122
 
123
  ---
124
 
125
+ ## Setup and Usage Instructions
126
 
127
  ### 1. Prerequisites
128
 
 
 
129
  ```bash
130
+ pip install openenv-core
131
  uv sync
132
  ```
133
 
134
  ### 2. Environment Variables
135
 
136
+ Create a `.env` file in the root directory:
137
 
138
+ ```env
139
  HF_TOKEN="your_huggingface_read_token"
140
+ API_BASE_URL="https://router.huggingface.co/v1"
141
  MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
142
  QUANTUM_TASK="random"
143
+ IMAGE_NAME="quantum_env"
144
  ```
145
 
146
  | Variable | Description |
147
  |---|---|
148
+ | `HF_TOKEN` | HuggingFace API token (read access) |
149
  | `API_BASE_URL` | Inference endpoint (HF router or custom) |
150
  | `MODEL_NAME` | Model to run inference with |
151
+ | `QUANTUM_TASK` | Task: `easy`, `medium`, `hard`, or `random` |
152
+ | `IMAGE_NAME` | Docker image name for the environment server |
153
 
154
  ### 3. Build & Validate
155
+
156
  ```bash
157
  docker build -t quantum_env .
158
  openenv validate .
159
  ```
160
+
161
  ### 4. Run Inference
162
+
163
  ```bash
164
  uv run python inference.py
165
  ```
 
 
 
166
 
167
+ The script runs **easy → medium → hard** sequentially, each in its own container instance, and prints a results summary table at the end. All 3 tasks are always evaluated.
168
 
169
+ ### 5. Reproducing Baseline via Seed
170
 
171
+ To reproduce the exact same circuit for a given episode, pass a seed to `reset()`:
172
 
173
+ ```python
174
+ # Same seed always produces the same initial circuit
175
  result = await env.reset(seed=42)
176
  ```
177
 
178
+ The environment uses `random.Random(seed)` internally — fully isolated per instance, safe for concurrent WebSocket sessions.
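
The isolation property mentioned above comes from giving each instance its own `random.Random` object instead of calling the module-level functions, which share one global state. The sketch below shows the idea with a hypothetical `CircuitGenerator` class and a single-qubit gate pool chosen for illustration; the real environment's generator and gate pool may differ.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical stand-in for the environment's Gate model: (name, target_qubits).
Gate = Tuple[str, Tuple[int, ...]]


class CircuitGenerator:
    """Toy generator that owns a private RNG, one per instance."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # A private Random instance never reads or advances the global RNG state.
        self._rng = random.Random(seed)

    def generate(self, num_qubits: int, num_gates: int) -> List[Gate]:
        """Draw num_gates single-qubit gates on random qubits."""
        return [
            (self._rng.choice(("H", "X", "Y", "Z")), (self._rng.randrange(num_qubits),))
            for _ in range(num_gates)
        ]
```

Two generators built with the same seed return the same circuit, and calls on one generator do not change what another produces, which is the behaviour concurrent sessions rely on.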
179
+
180
  ---
181
 
182
+ ## Project Structure
183
 
184
  ```
185
  .
186
  ├── server/
187
+ │ ├── __init__.py
188
+ │ ├── app.py # FastAPI server entry point
189
+ │ ├── graders.py # Task-specific grader functions
190
+ │ └── quantum_openenv_env_environment.py # Core environment + physics engine
191
+ ├── __init__.py
192
+ ├── client.py # OpenEnv WebSocket client
193
+ ├── models.py # Typed Pydantic models
194
+ ├── inference.py # Baseline LLM inference script (all 3 tasks)
195
+ ├── openenv.yaml # OpenEnv spec manifest
196
+ ├── Dockerfile # Container definition
197
+ ├── pyproject.toml
198
  └── README.md
199
  ```
200
 
201
  ---
202
 
203
+ ## License
204
 
205
+ This project is released under the MIT license found in the `LICENSE` file.
 
 
 
 
inference.py CHANGED
@@ -1,22 +1,24 @@
1
  """
2
  Inference Script
3
  ================
4
- Runs the LLM agent against all 3 tasks (easy, medium, hard) and emits
5
- a [START] / [END] log line for each, which the hackathon platform requires
6
- to validate that all 3 tasks have graders.
 
 
7
 
8
  Required environment variables:
9
- API_BASE_URL The API endpoint for the LLM.
10
- MODEL_NAME The model identifier.
11
- HF_TOKEN Your Hugging Face / API key.
12
- IMAGE_NAME Docker image name (default: quantum_env).
13
  """
14
 
15
  import asyncio
16
  import json
17
  import os
18
  import textwrap
19
- from typing import List, Optional
20
 
21
  from dotenv import load_dotenv
22
 
@@ -39,7 +41,7 @@ TEMPERATURE = 0.7
39
  MAX_TOKENS = 150
40
  SUCCESS_SCORE_THRESHOLD = 0.1
41
 
42
- # All 3 tasks are always evaluated; this is what the platform requires
43
  ALL_TASKS = ["easy", "medium", "hard"]
44
 
45
 
@@ -48,28 +50,34 @@ SYSTEM_PROMPT = textwrap.dedent(
48
  You are an AI agent tasked with optimizing a multi-qubit quantum circuit.
49
  You will be given the current circuit as a list of gates with their index, name, and target_qubits.
50
 
51
- You have 4 possible actions you can take at any index.
52
- Action 1: Cancel identical self-inverse gates (H, X, Y, Z, CNOT, SWAP). They must be on the same qubits and not blocked by intermediate gates sharing those qubits.
53
- Action 2: Swap adjacent commuting gates (gates that operate on entirely different qubits and do not overlap).
 
54
  Action 3: Replace an H-X-H sequence on the same qubit with a Z gate.
55
  Action 4: Replace a CNOT-SWAP sequence on the same qubits with a CZ gate.
56
 
57
- You MUST output ONLY a valid JSON object with exactly two keys: 'target_index' (integer) and 'action_type' (integer 1-4).
 
58
  Example: {"target_index": 2, "action_type": 1}
59
  Do not output markdown, backticks, or any other text.
60
  """
61
  ).strip()
62
 
63
 
 
 
 
 
64
  def log_start(task: str, env: str, model: str) -> None:
65
  print(f"[START] task={task} env={env} model={model}", flush=True)
66
 
67
 
68
  def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
69
  error_val = error if error else "null"
70
- done_val = str(done).lower()
71
  print(
72
- f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
 
73
  flush=True,
74
  )
75
 
@@ -77,21 +85,24 @@ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[
77
  def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
78
  rewards_str = ",".join(f"{r:.2f}" for r in rewards)
79
  print(
80
- f"[END] success={str(success).lower()} steps={steps} score={score:.2f} rewards={rewards_str}",
 
81
  flush=True,
82
  )
83
 
84
 
 
 
 
 
85
  def build_user_prompt(step: int, circuit: list, last_reward: float, history: List[str]) -> str:
86
- if circuit:
87
- circuit_lines = [
88
  f"Index {i}: {gate.name} on qubits {gate.target_qubits}"
89
  for i, gate in enumerate(circuit)
90
- ]
91
- circuit_block = "\n".join(circuit_lines)
92
- else:
93
- circuit_block = "Empty circuit"
94
-
95
  history_block = "\n".join(history[-4:]) if history else "None"
96
  return textwrap.dedent(
97
  f"""
@@ -106,7 +117,13 @@ def build_user_prompt(step: int, circuit: list, last_reward: float, history: Lis
106
  ).strip()
107
 
108
 
109
- def get_model_action(client: OpenAI, step: int, circuit: list, last_reward: float, history: List[str]) -> str:
 
 
 
 
 
 
110
  user_prompt = build_user_prompt(step, circuit, last_reward, history)
111
  try:
112
  completion = client.chat.completions.create(
@@ -126,26 +143,32 @@ def get_model_action(client: OpenAI, step: int, circuit: list, last_reward: floa
126
  return "{}"
127
 
128
 
129
- async def run_single_task(task_name: str, env: QuantumOpenenvEnv, client: OpenAI) -> None:
 
 
 
 
 
 
 
 
130
  """
131
- Run one full episode for a given task and emit [START] / [END] log lines.
132
- The platform validates that all 3 tasks appear in these logs.
133
  """
134
  history: List[str] = []
135
  rewards: List[float] = []
136
  steps_taken = 0
137
- score = 0.0
138
  success = False
139
 
140
  try:
141
- # Reset with the specific task seed for reproducibility
142
  result = await env.reset()
143
  circuit = result.observation.circuit
144
  last_reward = 0.0
145
-
146
  initial_gate_count = len(circuit)
147
 
148
- # Infer actual task name from metadata (env may be running in random mode)
149
  actual_task = (result.observation.metadata or {}).get("task", task_name)
150
  if actual_task not in ALL_TASKS:
151
  actual_task = task_name
@@ -169,7 +192,9 @@ async def run_single_task(task_name: str, env: QuantumOpenenvEnv, client: OpenAI
169
  target_index = 0
170
  action_type = 1
171
 
172
- result = await env.step(QuantumAction(target_index=target_index, action_type=action_type))
 
 
173
  reward = result.reward or 0.0
174
  done = result.done
175
 
@@ -184,7 +209,7 @@ async def run_single_task(task_name: str, env: QuantumOpenenvEnv, client: OpenAI
184
  if done:
185
  break
186
 
187
- # Inject initial count for grader
188
  if not result.observation.metadata:
189
  result.observation.metadata = {}
190
  result.observation.metadata["initial_count"] = initial_gate_count
@@ -199,36 +224,56 @@ async def run_single_task(task_name: str, env: QuantumOpenenvEnv, client: OpenAI
199
  finally:
200
  log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
201
 
 
 
 
 
 
 
202
 
203
  async def main() -> None:
204
  """
205
- Run all 3 tasks sequentially.
206
 
207
- The hackathon platform requires inference.py to produce a [START] / [END]
208
- log pair for EACH of the 3 tasks (easy, medium, hard). Running only one
209
- task causes "Not enough tasks with graders" in Phase 2 Task Validation.
 
210
  """
211
  client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
 
212
 
213
  for task_name in ALL_TASKS:
214
  print(f"\n{'='*60}", flush=True)
215
- print(f"Running task: {task_name}", flush=True)
216
  print(f"{'='*60}", flush=True)
217
 
218
- # Start a fresh Docker environment instance for each task
219
- # Pass task name so the env generates the right circuit type
220
  env = await QuantumOpenenvEnv.from_docker_image(
221
  IMAGE_NAME,
222
  env_vars={"QUANTUM_TASK": task_name},
223
  )
224
  try:
225
- await run_single_task(task_name, env, client)
 
226
  finally:
227
  try:
228
  await env.close()
229
  except Exception as e:
230
  print(f"[DEBUG] env.close() error for task {task_name}: {e}", flush=True)
231
 
 
 
 
 
 
 
 
 
 
 
 
 
 
232
 
233
  if __name__ == "__main__":
234
  asyncio.run(main())
 
1
  """
2
  Inference Script
3
  ================
4
+ Runs the LLM agent against all 3 tasks (easy, medium, hard) sequentially
5
+ and prints a [START] / [END] log line for each task.
6
+
7
+ The hackathon platform requires all 3 tasks to appear in the log output
8
+ for Task Validation to pass.
9
 
10
  Required environment variables:
11
+ API_BASE_URL The API endpoint for the LLM.
12
+ MODEL_NAME The model identifier.
13
+ HF_TOKEN Your Hugging Face / API key.
14
+ IMAGE_NAME Docker image name (default: quantum_env).
15
  """
16
 
17
  import asyncio
18
  import json
19
  import os
20
  import textwrap
21
+ from typing import List, Optional, Tuple
22
 
23
  from dotenv import load_dotenv
24
 
 
41
  MAX_TOKENS = 150
42
  SUCCESS_SCORE_THRESHOLD = 0.1
43
 
44
+ # Platform requires all 3 tasks to appear in [START] log lines
45
  ALL_TASKS = ["easy", "medium", "hard"]
46
 
47
 
 
50
  You are an AI agent tasked with optimizing a multi-qubit quantum circuit.
51
  You will be given the current circuit as a list of gates with their index, name, and target_qubits.
52
 
53
+ You have 4 possible actions:
54
+ Action 1: Cancel identical self-inverse gates (H, X, Y, Z, CNOT, SWAP) on the same qubits,
55
+ not blocked by intermediate gates sharing those qubits.
56
+ Action 2: Swap adjacent commuting gates (gates on entirely different, non-overlapping qubits).
57
  Action 3: Replace an H-X-H sequence on the same qubit with a Z gate.
58
  Action 4: Replace a CNOT-SWAP sequence on the same qubits with a CZ gate.
59
 
60
+ You MUST output ONLY a valid JSON object with exactly two keys:
61
+ 'target_index' (integer) and 'action_type' (integer 1-4).
62
  Example: {"target_index": 2, "action_type": 1}
63
  Do not output markdown, backticks, or any other text.
64
  """
65
  ).strip()
66
 
67
 
68
+ # ============================================================================
69
+ # Logging (format required by hackathon platform output parser)
70
+ # ============================================================================
71
+
72
  def log_start(task: str, env: str, model: str) -> None:
73
  print(f"[START] task={task} env={env} model={model}", flush=True)
74
 
75
 
76
  def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
77
  error_val = error if error else "null"
 
78
  print(
79
+ f"[STEP] step={step} action={action} reward={reward:.2f} "
80
+ f"done={str(done).lower()} error={error_val}",
81
  flush=True,
82
  )
83
 
 
85
  def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
86
  rewards_str = ",".join(f"{r:.2f}" for r in rewards)
87
  print(
88
+ f"[END] success={str(success).lower()} steps={steps} "
89
+ f"score={score:.2f} rewards={rewards_str}",
90
  flush=True,
91
  )
92
 
93
 
94
+ # ============================================================================
95
+ # Prompt building
96
+ # ============================================================================
97
+
98
  def build_user_prompt(step: int, circuit: list, last_reward: float, history: List[str]) -> str:
99
+ circuit_block = (
100
+ "\n".join(
101
  f"Index {i}: {gate.name} on qubits {gate.target_qubits}"
102
  for i, gate in enumerate(circuit)
103
+ )
104
+ if circuit else "Empty circuit"
105
+ )
 
 
106
  history_block = "\n".join(history[-4:]) if history else "None"
107
  return textwrap.dedent(
108
  f"""
 
117
  ).strip()
118
 
119
 
120
+ def get_model_action(
121
+ client: OpenAI,
122
+ step: int,
123
+ circuit: list,
124
+ last_reward: float,
125
+ history: List[str],
126
+ ) -> str:
127
  user_prompt = build_user_prompt(step, circuit, last_reward, history)
128
  try:
129
  completion = client.chat.completions.create(
 
143
  return "{}"
144
 
145
 
146
+ # ============================================================================
147
+ # Single task episode
148
+ # ============================================================================
149
+
150
+ async def run_single_task(
151
+ task_name: str,
152
+ env: QuantumOpenenvEnv,
153
+ client: OpenAI,
154
+ ) -> Tuple[str, float, bool]:
155
  """
156
+ Run one full episode for a given task.
157
+ Returns (task_name, score, success).
158
  """
159
  history: List[str] = []
160
  rewards: List[float] = []
161
  steps_taken = 0
162
+ score = 0.01
163
  success = False
164
 
165
  try:
 
166
  result = await env.reset()
167
  circuit = result.observation.circuit
168
  last_reward = 0.0
 
169
  initial_gate_count = len(circuit)
170
 
171
+ # Resolve actual task from metadata (env may override based on QUANTUM_TASK)
172
  actual_task = (result.observation.metadata or {}).get("task", task_name)
173
  if actual_task not in ALL_TASKS:
174
  actual_task = task_name
 
192
  target_index = 0
193
  action_type = 1
194
 
195
+ result = await env.step(
196
+ QuantumAction(target_index=target_index, action_type=action_type)
197
+ )
198
  reward = result.reward or 0.0
199
  done = result.done
200
 
 
209
  if done:
210
  break
211
 
212
+ # Inject initial count so grader can compute compression ratio
213
  if not result.observation.metadata:
214
  result.observation.metadata = {}
215
  result.observation.metadata["initial_count"] = initial_gate_count
 
224
  finally:
225
  log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
226
 
227
+ return task_name, score, success
228
+
229
+
230
+ # ============================================================================
231
+ # Main: loop over all 3 tasks
232
+ # ============================================================================
233
 
234
  async def main() -> None:
235
  """
236
+ Run all 3 tasks sequentially, each in its own Docker container instance.
237
 
238
+ The hackathon platform requires:
239
+ - A [START] task=X line for each of easy, medium, hard
240
+ - A [END] score=Y line for each task
241
+ - At least 3 tasks with graders validated in the log
242
  """
243
  client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
244
+ results: List[Tuple[str, float, bool]] = []
245
 
246
  for task_name in ALL_TASKS:
247
  print(f"\n{'='*60}", flush=True)
248
+ print(f" Task: {task_name.upper()}", flush=True)
249
  print(f"{'='*60}", flush=True)
250
 
 
 
251
  env = await QuantumOpenenvEnv.from_docker_image(
252
  IMAGE_NAME,
253
  env_vars={"QUANTUM_TASK": task_name},
254
  )
255
  try:
256
+ task, score, success = await run_single_task(task_name, env, client)
257
+ results.append((task, score, success))
258
  finally:
259
  try:
260
  await env.close()
261
  except Exception as e:
262
  print(f"[DEBUG] env.close() error for task {task_name}: {e}", flush=True)
263
 
264
+ # -----------------------------------------------------------------------
265
+ # Summary table — printed at end for human reviewers in Phase 3
266
+ # -----------------------------------------------------------------------
267
+ print(f"\n{'='*60}", flush=True)
268
+ print(" BASELINE RESULTS SUMMARY", flush=True)
269
+ print(f"{'='*60}", flush=True)
270
+ print(f" {'Task':<10} {'Score':>8} {'Result'}", flush=True)
271
+ print(f" {'-'*40}", flush=True)
272
+ for task, score, success in results:
273
+ status = "PASS ✓" if success else "FAIL ✗"
274
+ print(f" {task:<10} {score:>8.3f} {status}", flush=True)
275
+ print(f"{'='*60}\n", flush=True)
276
+
277
 
278
  if __name__ == "__main__":
279
  asyncio.run(main())
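Reviewer sketch: the per-task loop in `main()` can be exercised without Docker or an LLM. The block below is a minimal illustration, not code from this commit. `FakeEnv`, the stubbed `run_single_task`, the fixed scores, and the pass threshold are all hypothetical stand-ins. It shows only the control flow the diff introduces: one environment per task, `close()` in a `finally` block, and one summary row per task.

```python
import asyncio
from typing import List, Tuple

ALL_TASKS = ["easy", "medium", "hard"]


class FakeEnv:
    """Hypothetical stand-in for QuantumOpenenvEnv (no Docker involved)."""

    def __init__(self, task: str) -> None:
        self.task = task
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def run_single_task(task_name: str, env: FakeEnv) -> Tuple[str, float, bool]:
    # Stub: a real run would step the environment and call the grader.
    score = {"easy": 0.40, "medium": 0.25, "hard": 0.10}[task_name]
    return task_name, score, score > 0.2  # hypothetical pass threshold


async def run_all() -> Tuple[List[Tuple[str, float, bool]], List[FakeEnv]]:
    results: List[Tuple[str, float, bool]] = []
    envs: List[FakeEnv] = []
    for task_name in ALL_TASKS:
        env = FakeEnv(task_name)
        envs.append(env)
        try:
            results.append(await run_single_task(task_name, env))
        finally:
            # Always release the environment, even if the task raised.
            await env.close()
    return results, envs


def format_summary(results: List[Tuple[str, float, bool]]) -> List[str]:
    # Same column layout as the summary table in main().
    return [
        f"  {task:<10} {score:>8.3f}  {'PASS' if ok else 'FAIL'}"
        for task, score, ok in results
    ]


results, envs = asyncio.run(run_all())
for line in format_summary(results):
    print(line)
```

Collecting `(task, score, success)` tuples and printing them after the loop keeps the summary separate from the per-task `[START]`/`[END]` log lines that the platform parses.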
server/graders.py CHANGED
@@ -2,41 +2,89 @@
2
  # All rights reserved.
3
 
4
  """
5
- Standalone graders for the Quantum Circuit Optimization Environment.
6
- Scores are strictly within (0.0, 1.0) — never exactly 0.0 or 1.0.
 
 
 
 
 
 
7
  """
8
 
9
 
10
  def _strict(score: float) -> float:
11
- """Clamp score to strictly (0.0, 1.0) as required by the platform."""
12
- return max(0.01, min(0.99, score))
13
 
14
 
15
  def grade_easy(observation) -> float:
 
 
 
 
 
 
 
16
  metadata = getattr(observation, 'metadata', {}) or {}
17
  final_count = getattr(observation, 'gate_count', 0)
18
  initial_count = metadata.get("initial_count", final_count)
 
19
  if initial_count == 0:
20
- return _strict(0.99)
 
21
  compression = (initial_count - final_count) / initial_count
22
  return _strict(compression)
23
 
24
 
25
  def grade_medium(observation) -> float:
 
 
 
 
 
 
 
 
 
26
  metadata = getattr(observation, 'metadata', {}) or {}
27
  final_count = getattr(observation, 'gate_count', 0)
28
  initial_count = metadata.get("initial_count", final_count)
 
29
  if initial_count == 0:
30
- return _strict(0.99)
 
31
  compression = (initial_count - final_count) / initial_count
32
- return _strict(compression / 0.20)
 
 
 
 
 
33
 
34
 
35
  def grade_hard(observation) -> float:
 
 
 
 
 
 
 
 
 
 
36
  metadata = getattr(observation, 'metadata', {}) or {}
37
  final_count = getattr(observation, 'gate_count', 0)
38
  initial_count = metadata.get("initial_count", final_count)
 
 
 
39
  if initial_count == 0:
40
- return _strict(0.99)
 
41
  compression = (initial_count - final_count) / initial_count
42
- return _strict(compression / 0.35)
 
 
 
 
  # All rights reserved.

  """
+ Graders for the Quantum Circuit Optimization Environment.
+
+ Each grader measures a different aspect of performance, matching its difficulty tier:
+ - Easy: Pure compression ratio. Any gate removal earns proportional credit.
+ - Medium: Compression + bonus for using advanced identity actions (3 or 4).
+ - Hard: Weighted blend of compression and step efficiency. Harder threshold.
+
+ All scores are clamped to [0.01, 0.99], so they stay strictly inside (0.0, 1.0) as required by the platform.
  """


  def _strict(score: float) -> float:
+     """Clamp to [0.01, 0.99]; the platform rejects exactly 0.0 or 1.0."""
+     return max(0.01, min(0.99, float(score)))


  def grade_easy(observation) -> float:
+     """
+     Easy grader: pure compression ratio.
+
+     Score = (initial_gates - final_gates) / initial_gates
+     Any reduction in gate count earns proportional credit.
+     No bonus mechanics; the agent just needs to find and cancel obvious pairs.
+     """
      metadata = getattr(observation, 'metadata', {}) or {}
      final_count = getattr(observation, 'gate_count', 0)
      initial_count = metadata.get("initial_count", final_count)
+
      if initial_count == 0:
+         return _strict(0.5)
+
      compression = (initial_count - final_count) / initial_count
      return _strict(compression)


  def grade_medium(observation) -> float:
+     """
+     Medium grader: compression ratio + bonus for advanced identity usage.
+
+     Score = compression_ratio + 0.15 bonus if agent used action 3 (H-X-H→Z)
+     or action 4 (CNOT-SWAP→CZ) at least once during the episode.
+
+     This rewards agents that discover algebraic identities beyond simple
+     gate cancellation, a meaningfully harder skill than the easy task.
+     """
      metadata = getattr(observation, 'metadata', {}) or {}
      final_count = getattr(observation, 'gate_count', 0)
      initial_count = metadata.get("initial_count", final_count)
+
      if initial_count == 0:
+         return _strict(0.5)
+
      compression = (initial_count - final_count) / initial_count
+
+     # Bonus for using advanced identity actions (tracked in metadata by environment)
+     used_advanced = metadata.get("used_advanced_actions", False)
+     bonus = 0.15 if used_advanced else 0.0
+
+     return _strict(compression + bonus)


  def grade_hard(observation) -> float:
+     """
+     Hard grader: weighted blend of compression efficiency and step efficiency.
+
+     Score = 0.7 * compression_ratio + 0.3 * step_efficiency
+     where step_efficiency = 1 - (steps_taken / max_steps)
+
+     This penalises agents that compress the circuit but waste many steps,
+     exactly the behaviour frontier models exhibit on hard tasks
+     (thrashing with invalid swaps before finding cancellations).
+     """
      metadata = getattr(observation, 'metadata', {}) or {}
      final_count = getattr(observation, 'gate_count', 0)
      initial_count = metadata.get("initial_count", final_count)
+     steps_taken = metadata.get("step", 1)
+     max_steps = 150
+
      if initial_count == 0:
+         return _strict(0.5)
+
      compression = (initial_count - final_count) / initial_count
+     step_efficiency = max(0.0, 1.0 - (steps_taken / max_steps))
+
+     score = 0.7 * compression + 0.3 * step_efficiency
+     return _strict(score)
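Reviewer sketch: a worked example makes the three scoring formulas easier to compare. The block below re-implements the formulas from the new `server/graders.py` in compact form so it runs on its own. The `SimpleNamespace` observation, the `_compression` helper, and the sample numbers (20 gates reduced to 12 in 30 steps, with an advanced action used) are illustrative assumptions, not values from the commit.

```python
from types import SimpleNamespace

MAX_STEPS = 150  # same constant the hard grader hard-codes


def _strict(score: float) -> float:
    """Clamp to [0.01, 0.99], mirroring the grader module."""
    return max(0.01, min(0.99, float(score)))


def _compression(obs):
    """Return (metadata, compression ratio), or (metadata, None) if the circuit started empty."""
    meta = getattr(obs, "metadata", {}) or {}
    final_count = getattr(obs, "gate_count", 0)
    initial = meta.get("initial_count", final_count)
    if initial == 0:
        return meta, None
    return meta, (initial - final_count) / initial


def grade_easy(obs) -> float:
    _, c = _compression(obs)
    return _strict(0.5 if c is None else c)


def grade_medium(obs) -> float:
    meta, c = _compression(obs)
    if c is None:
        return _strict(0.5)
    bonus = 0.15 if meta.get("used_advanced_actions", False) else 0.0
    return _strict(c + bonus)


def grade_hard(obs) -> float:
    meta, c = _compression(obs)
    if c is None:
        return _strict(0.5)
    step_efficiency = max(0.0, 1.0 - meta.get("step", 1) / MAX_STEPS)
    return _strict(0.7 * c + 0.3 * step_efficiency)


# Hypothetical episode: 20 gates reduced to 12 in 30 steps, identity action used.
obs = SimpleNamespace(
    gate_count=12,
    metadata={"initial_count": 20, "step": 30, "used_advanced_actions": True},
)
easy, medium, hard = grade_easy(obs), grade_medium(obs), grade_hard(obs)
print(round(easy, 2), round(medium, 2), round(hard, 2))

# No compression at all is clamped up to the 0.01 floor.
untouched = SimpleNamespace(gate_count=20, metadata={"initial_count": 20})
floor = grade_easy(untouched)
```

For this sample episode the compression ratio is 8/20 = 0.40, so the easy score is about 0.40, the medium score about 0.55 (0.40 plus the 0.15 bonus), and the hard score about 0.52 (0.7 × 0.40 + 0.3 × 0.80). With these formulas the hard grader still awards up to 0.3 for step efficiency even when no gates were removed, which may be worth keeping in mind when comparing scores across tiers.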
server/quantum_openenv_env_environment.py CHANGED
@@ -12,13 +12,16 @@ Architecture:
12
  - Instance-isolated PRNG (seeding) for strict reproducibility in server environments.
13
  - Relative Compression Grading: Evaluates agents on compression ratio rather than
14
  an absolute theoretical minimum, mirroring real-world NP-Hard quantum optimization constraints.
 
 
15
  """
16
 
 
17
  import random
18
  from uuid import uuid4
19
 
20
  from openenv.core.env_server.interfaces import Environment
21
- from openenv.core.env_server.types import State
22
 
23
  from quantum_openenv_env.models import QuantumAction, QuantumGate, QuantumObservation
24
 
@@ -77,17 +80,15 @@ TASKS = ["easy", "medium", "hard"]
77
 
78
 
79
  # ============================================================================
80
- # Standalone graders (used by graders.py and inference.py)
81
  # ============================================================================
82
 
83
- from quantum_openenv_env.server.graders import grade_easy as _grade_easy_fn
84
- from quantum_openenv_env.server.graders import grade_medium as _grade_medium_fn
85
- from quantum_openenv_env.server.graders import grade_hard as _grade_hard_fn
86
 
87
  GRADERS = {
88
- "easy": _grade_easy_fn,
89
- "medium": _grade_medium_fn,
90
- "hard": _grade_hard_fn,
91
  }
92
 
93
 
@@ -102,20 +103,38 @@ class QuantumCircuitOptimizationEnvironment(Environment):
102
  The agent acts as a quantum compiler, reducing circuit depth by applying
103
  mathematical identities and commutativity rules across 3 difficulty tiers.
104
 
 
 
 
 
 
 
 
 
105
  Action types:
106
- 1 - Cancel identical self-inverse gate pairs
107
- 2 - Swap adjacent commuting gates (different qubits)
108
- 3 - Replace H-X-H sequence with Z gate
109
- 4 - Replace CNOT-SWAP sequence with CZ gate
 
110
  """
111
 
112
  SUPPORTS_CONCURRENT_SESSIONS: bool = True
113
- SELF_INVERSE_GATES = {"H", "X", "Y", "Z", "CNOT", "CX", "CZ", "SWAP", "CCX", "TOFFOLI", "CSWAP", "FREDKIN"}
 
 
 
114
 
115
  def __init__(self, task: str = "random", seed: int = None):
 
 
 
 
116
  self.mode = task
117
  if self.mode != "random" and self.mode not in TASK_CONFIGS:
118
- raise ValueError(f"Unknown task: {task}. Must be 'random' or one of {list(TASK_CONFIGS.keys())}")
 
 
119
 
120
  self._state = State(episode_id=str(uuid4()), step_count=0)
121
  self._reset_count = 0
@@ -126,11 +145,21 @@ class QuantumCircuitOptimizationEnvironment(Environment):
126
  self.task_config = TASK_CONFIGS["easy"]
127
  self._circuit: list[QuantumGate] = []
128
  self._initial_gate_count = 0
 
 
 
 
 
129
 
130
- def reset(self) -> QuantumObservation:
131
  """Reset the environment to a fresh circuit for the configured task."""
132
  self._state = State(episode_id=str(uuid4()), step_count=0)
133
  self._reset_count += 1
 
 
 
 
 
134
 
135
  if self.mode == "random":
136
  self.task_name = self.rng.choice(TASKS)
@@ -152,10 +181,11 @@ class QuantumCircuitOptimizationEnvironment(Environment):
152
  "reset_count": self._reset_count,
153
  "initial_count": self._initial_gate_count,
154
  "seed": self.current_seed,
 
155
  },
156
  )
157
 
158
- def step(self, action: QuantumAction) -> QuantumObservation: # type: ignore[override]
159
  """Execute one action in the environment."""
160
  self._state.step_count += 1
161
  target_index = action.target_index
@@ -170,7 +200,9 @@ class QuantumCircuitOptimizationEnvironment(Environment):
170
  gate_at_index = self._circuit[target_index]
171
  active_qubits = set(gate_at_index.target_qubits)
172
 
 
173
  # ACTION 1: Cancel Identical Self-Inverse Gates
 
174
  if action_type == 1:
175
  next_gate_index = None
176
  for j in range(target_index + 1, len(self._circuit)):
@@ -188,7 +220,9 @@ class QuantumCircuitOptimizationEnvironment(Environment):
188
  reward = 1.0
189
  action_result = "cancelled_identical"
190
 
 
191
  # ACTION 2: Swap Commuting Gates
 
192
  elif action_type == 2:
193
  if target_index + 1 < len(self._circuit):
194
  next_gate = self._circuit[target_index + 1]
@@ -201,75 +235,105 @@ class QuantumCircuitOptimizationEnvironment(Environment):
201
  reward = -0.05
202
  action_result = "swapped_commuting"
203
 
204
- # ACTION 3: Replace H-X-H with Z
 
 
205
  elif action_type == 3:
206
  if target_index + 2 < len(self._circuit):
207
  g1 = self._circuit[target_index]
208
  g2 = self._circuit[target_index + 1]
209
  g3 = self._circuit[target_index + 2]
 
210
  if (g1.name == "H" and g2.name == "X" and g3.name == "H" and
211
  g1.target_qubits == g2.target_qubits == g3.target_qubits):
212
  self._circuit.pop(target_index + 2)
213
  self._circuit.pop(target_index + 1)
214
- self._circuit[target_index] = QuantumGate(name="Z", target_qubits=g1.target_qubits)
 
 
215
  reward = 2.0
216
  action_result = "identity_hxh_to_z"
 
217
 
218
- # ACTION 4: Replace CNOT-SWAP with CZ
 
 
219
  elif action_type == 4:
220
  if target_index + 1 < len(self._circuit):
221
  g1 = self._circuit[target_index]
222
  g2 = self._circuit[target_index + 1]
 
223
  if (g1.name == "CNOT" and g2.name == "SWAP" and
224
  set(g1.target_qubits) == set(g2.target_qubits)):
225
  self._circuit.pop(target_index + 1)
226
- self._circuit[target_index] = QuantumGate(name="CZ", target_qubits=g1.target_qubits)
 
 
227
  reward = 1.0
228
  action_result = "identity_cnot_swap_to_cz"
 
229
 
230
  return self._build_observation(reward, action_result)
231
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
232
  # ============================================================================
233
- # Grader Methods (OpenEnv validator calls these on the environment instance)
234
- # Each grades the CURRENT internal circuit state — no arguments needed.
235
  # ============================================================================
236
 
 
 
 
 
 
237
  def grade_easy(self) -> float:
238
- """
239
- Grader for Easy Task.
240
- Pure compression ratio — any reduction in gate count earns proportional score.
241
- """
242
  if self._initial_gate_count == 0:
243
- return 1.0
244
- final_count = len(self._circuit)
245
- compression = (self._initial_gate_count - final_count) / self._initial_gate_count
246
- return max(0.0, min(1.0, compression))
247
 
248
  def grade_medium(self) -> float:
249
- """
250
- Grader for Medium Task.
251
- Scaled so that 20% compression = full score (1.0).
252
- Partial credit below threshold encourages progress.
253
- """
254
  if self._initial_gate_count == 0:
255
- return 1.0
256
- final_count = len(self._circuit)
257
- compression = (self._initial_gate_count - final_count) / self._initial_gate_count
258
- scaled = compression / 0.20
259
- return max(0.0, min(1.0, scaled))
260
 
261
  def grade_hard(self) -> float:
262
- """
263
- Grader for Hard Task.
264
- Scaled so that 35% compression = full score (1.0).
265
- Harder threshold reflects genuine difficulty of deep entangled circuits.
266
- """
267
  if self._initial_gate_count == 0:
268
- return 1.0
269
- final_count = len(self._circuit)
270
- compression = (self._initial_gate_count - final_count) / self._initial_gate_count
271
- scaled = compression / 0.35
272
- return max(0.0, min(1.0, scaled))
 
 
 
 
 
 
273
 
274
  # ============================================================================
275
  # Internal helpers
@@ -291,6 +355,7 @@ class QuantumCircuitOptimizationEnvironment(Environment):
291
  "step": self._state.step_count,
292
  "initial_count": self._initial_gate_count,
293
  "seed": self.current_seed,
 
294
  },
295
  )
296
 
@@ -298,6 +363,7 @@ class QuantumCircuitOptimizationEnvironment(Environment):
298
  if len(self._circuit) == 0:
299
  return True
300
 
 
301
  for i in range(len(self._circuit)):
302
  curr_gate = self._circuit[i]
303
  active_qubits = set(curr_gate.target_qubits)
@@ -311,22 +377,10 @@ class QuantumCircuitOptimizationEnvironment(Environment):
311
  return False
312
  break
313
 
 
314
  for i in range(len(self._circuit) - 1):
315
  if not set(self._circuit[i].target_qubits).intersection(
316
  set(self._circuit[i + 1].target_qubits)):
317
  return False
318
 
319
- return True
320
-
321
- def grade(self) -> float:
322
- """Grade current state using the active task's grader."""
323
- grader_method = {
324
- "easy": self.grade_easy,
325
- "medium": self.grade_medium,
326
- "hard": self.grade_hard,
327
- }[self.task_name]
328
- return grader_method()
329
-
330
- @property
331
- def state(self) -> State:
332
- return self._state
 
12
  - Instance-isolated PRNG (seeding) for strict reproducibility in server environments.
13
  - Relative Compression Grading: Evaluates agents on compression ratio rather than
14
  an absolute theoretical minimum, mirroring real-world NP-Hard quantum optimization constraints.
15
+ - Advanced action tracking: the medium grader rewards agents that discover
16
+ algebraic identities (H-X-H=Z, CNOT-SWAP=CZ) beyond simple cancellations.
17
  """
18
 
19
+ import os
20
  import random
21
  from uuid import uuid4
22
 
23
  from openenv.core.env_server.interfaces import Environment
24
+ from openenv.core.env_server.types import EnvironmentMetadata, State
25
 
26
  from quantum_openenv_env.models import QuantumAction, QuantumGate, QuantumObservation
27
 
 
80
 
81
 
82
  # ============================================================================
83
+ # Graders (imported from graders.py)
84
  # ============================================================================
85
 
86
+ from quantum_openenv_env.server.graders import grade_easy, grade_medium, grade_hard
 
 
87
 
88
  GRADERS = {
89
+ "easy": grade_easy,
90
+ "medium": grade_medium,
91
+ "hard": grade_hard,
92
  }
93
 
94
 
 
103
  The agent acts as a quantum compiler, reducing circuit depth by applying
104
  mathematical identities and commutativity rules across 3 difficulty tiers.
105
 
106
+ Observation:
107
+ circuit - Current list of QuantumGate objects
108
+ gate_count - Number of gates remaining
109
+ num_qubits - System qubit count
110
+ done - Episode terminal flag
111
+ reward - Last step reward
112
+ metadata - task, initial_count, step, seed, used_advanced_actions
113
+
114
  Action types:
115
+ 1 - Cancel identical self-inverse gate pairs (+1.0)
116
+ 2 - Swap adjacent commuting gates (different qubits) (-0.05)
117
+ 3 - Replace H-X-H sequence with Z gate (+2.0)
118
+ 4 - Replace CNOT-SWAP sequence with CZ gate (+1.0)
119
+ Invalid actions (-0.1)
120
  """
121
 
122
  SUPPORTS_CONCURRENT_SESSIONS: bool = True
123
+ SELF_INVERSE_GATES = {
124
+ "H", "X", "Y", "Z", "CNOT", "CX", "CZ", "SWAP",
125
+ "CCX", "TOFFOLI", "CSWAP", "FREDKIN"
126
+ }
127
 
128
  def __init__(self, task: str = "random", seed: int = None):
129
+ # Fall back to the QUANTUM_TASK environment variable so Docker env_vars take effect
130
+ if task == "random":
131
+ task = os.getenv("QUANTUM_TASK", "random")
132
+
133
  self.mode = task
134
  if self.mode != "random" and self.mode not in TASK_CONFIGS:
135
+ raise ValueError(
136
+ f"Unknown task: {task}. Must be 'random' or one of {list(TASK_CONFIGS.keys())}"
137
+ )
138
 
139
  self._state = State(episode_id=str(uuid4()), step_count=0)
140
  self._reset_count = 0
 
145
  self.task_config = TASK_CONFIGS["easy"]
146
  self._circuit: list[QuantumGate] = []
147
  self._initial_gate_count = 0
148
+ self._used_advanced_actions = False # tracks action 3 or 4 usage this episode
149
+
150
+ # ============================================================================
151
+ # OpenEnv API
152
+ # ============================================================================
153
 
154
+ def reset(self, seed: int = None, **kwargs) -> QuantumObservation:
155
  """Reset the environment to a fresh circuit for the configured task."""
156
  self._state = State(episode_id=str(uuid4()), step_count=0)
157
  self._reset_count += 1
158
+ self._used_advanced_actions = False
159
+
160
+ if seed is not None:
161
+ self.current_seed = seed
162
+ self.rng = random.Random(self.current_seed)
163
 
164
  if self.mode == "random":
165
  self.task_name = self.rng.choice(TASKS)
 
181
  "reset_count": self._reset_count,
182
  "initial_count": self._initial_gate_count,
183
  "seed": self.current_seed,
184
+ "used_advanced_actions": False,
185
  },
186
  )
187
 
188
+ def step(self, action: QuantumAction, **kwargs) -> QuantumObservation: # type: ignore[override]
189
  """Execute one action in the environment."""
190
  self._state.step_count += 1
191
  target_index = action.target_index
 
200
  gate_at_index = self._circuit[target_index]
201
  active_qubits = set(gate_at_index.target_qubits)
202
 
203
+ # ------------------------------------------------------------------
204
  # ACTION 1: Cancel Identical Self-Inverse Gates
205
+ # ------------------------------------------------------------------
206
  if action_type == 1:
207
  next_gate_index = None
208
  for j in range(target_index + 1, len(self._circuit)):
 
220
  reward = 1.0
221
  action_result = "cancelled_identical"
222
 
223
+ # ------------------------------------------------------------------
224
  # ACTION 2: Swap Commuting Gates
225
+ # ------------------------------------------------------------------
226
  elif action_type == 2:
227
  if target_index + 1 < len(self._circuit):
228
  next_gate = self._circuit[target_index + 1]
 
235
  reward = -0.05
236
  action_result = "swapped_commuting"
237
 
238
+ # ------------------------------------------------------------------
239
+ # ACTION 3: Replace H-X-H with Z (advanced identity)
240
+ # ------------------------------------------------------------------
241
  elif action_type == 3:
242
  if target_index + 2 < len(self._circuit):
243
  g1 = self._circuit[target_index]
244
  g2 = self._circuit[target_index + 1]
245
  g3 = self._circuit[target_index + 2]
246
+
247
  if (g1.name == "H" and g2.name == "X" and g3.name == "H" and
248
  g1.target_qubits == g2.target_qubits == g3.target_qubits):
249
  self._circuit.pop(target_index + 2)
250
  self._circuit.pop(target_index + 1)
251
+ self._circuit[target_index] = QuantumGate(
252
+ name="Z", target_qubits=g1.target_qubits
253
+ )
254
  reward = 2.0
255
  action_result = "identity_hxh_to_z"
256
+ self._used_advanced_actions = True # track for medium grader
257
 
258
+ # ------------------------------------------------------------------
259
+ # ACTION 4: Replace CNOT-SWAP with CZ (advanced identity)
260
+ # ------------------------------------------------------------------
261
  elif action_type == 4:
262
  if target_index + 1 < len(self._circuit):
263
  g1 = self._circuit[target_index]
264
  g2 = self._circuit[target_index + 1]
265
+
266
  if (g1.name == "CNOT" and g2.name == "SWAP" and
267
  set(g1.target_qubits) == set(g2.target_qubits)):
268
  self._circuit.pop(target_index + 1)
269
+ self._circuit[target_index] = QuantumGate(
270
+ name="CZ", target_qubits=g1.target_qubits
271
+ )
272
  reward = 1.0
273
  action_result = "identity_cnot_swap_to_cz"
274
+ self._used_advanced_actions = True # track for medium grader
275
 
276
  return self._build_observation(reward, action_result)
277
 
278
+ @property
279
+ def state(self) -> State:
280
+ return self._state
281
+
282
+ def get_metadata(self) -> EnvironmentMetadata:
283
+ """
284
+ Return human-readable metadata shown in the HF Space web UI and
285
+ consumed by the platform's agent during Phase 2 evaluation.
286
+ """
287
+ return EnvironmentMetadata(
288
+ name="Quantum Circuit Optimizer",
289
+ description=(
290
+ "RL environment where an agent acts as a quantum compiler, "
291
+ "reducing circuit depth by applying gate cancellation, "
292
+ "commutativity swaps, and algebraic identities "
293
+ "(H·X·H = Z, CNOT·SWAP = CZ) across 3 difficulty tiers "
294
+ "(2-qubit easy → 4-qubit medium → 6-qubit hard with deep entanglement)."
295
+ ),
296
+ version="0.1.0",
297
+ )
298
+
299
  # ============================================================================
300
+ # Grader methods (called by OpenEnv validator on the environment instance)
 
301
  # ============================================================================
302
 
303
+ @staticmethod
304
+ def _strict(score: float) -> float:
305
+ """Clamp to strictly (0.0, 1.0) — platform rejects exactly 0.0 or 1.0."""
306
+ return max(0.01, min(0.99, float(score)))
307
+
308
  def grade_easy(self) -> float:
309
+ """Pure compression ratio — any gate removal earns proportional credit."""
 
 
 
310
  if self._initial_gate_count == 0:
311
+ return self._strict(0.5)
312
+ compression = (self._initial_gate_count - len(self._circuit)) / self._initial_gate_count
313
+ return self._strict(compression)
 
314
 
315
  def grade_medium(self) -> float:
316
+ """Compression ratio + 0.15 bonus for using advanced identity actions."""
 
 
 
 
317
  if self._initial_gate_count == 0:
318
+ return self._strict(0.5)
319
+ compression = (self._initial_gate_count - len(self._circuit)) / self._initial_gate_count
320
+ bonus = 0.15 if self._used_advanced_actions else 0.0
321
+ return self._strict(compression + bonus)
 
322
 
323
  def grade_hard(self) -> float:
324
+ """Weighted blend: 70% compression + 30% step efficiency."""
 
 
 
 
325
  if self._initial_gate_count == 0:
326
+ return self._strict(0.5)
327
+ compression = (self._initial_gate_count - len(self._circuit)) / self._initial_gate_count
328
+ step_efficiency = max(0.0, 1.0 - (self._state.step_count / 150))
329
+ score = 0.7 * compression + 0.3 * step_efficiency
330
+ return self._strict(score)
331
+
332
+ def grade(self) -> float:
333
+ """Grade current state using the active task's grader method."""
334
+ return {"easy": self.grade_easy, "medium": self.grade_medium, "hard": self.grade_hard}[
335
+ self.task_name
336
+ ]()
337
 
338
  # ============================================================================
339
  # Internal helpers
 
355
  "step": self._state.step_count,
356
  "initial_count": self._initial_gate_count,
357
  "seed": self.current_seed,
358
+ "used_advanced_actions": self._used_advanced_actions,
359
  },
360
  )
361
 
 
363
  if len(self._circuit) == 0:
364
  return True
365
 
366
+ # Check for any valid cancellation
367
  for i in range(len(self._circuit)):
368
  curr_gate = self._circuit[i]
369
  active_qubits = set(curr_gate.target_qubits)
 
377
  return False
378
  break
379
 
380
+ # Check for any valid swap
381
  for i in range(len(self._circuit) - 1):
382
  if not set(self._circuit[i].target_qubits).intersection(
383
  set(self._circuit[i + 1].target_qubits)):
384
  return False
385
 
386
+ return True
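Reviewer sketch: two of the rewrite rules in `step()` can be checked numerically with plain 2×2 matrices. The block below is a standalone illustration, not part of the commit, and the helpers `matmul` and `close` are hypothetical names. It verifies the single-qubit identity behind action 3 (H·X·H = Z) and the self-inverse property that action 1 relies on (H·H = I, X·X = I). The two-qubit CNOT-SWAP rule from action 4 is not checked here.

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def close(a, b, tol=1e-12):
    """Element-wise comparison with a small tolerance for float rounding."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]   # Hadamard
X = [[0, 1], [1, 0]]    # Pauli-X
Z = [[1, 0], [0, -1]]   # Pauli-Z
I = [[1, 0], [0, 1]]    # identity

hxh = matmul(matmul(H, X), H)

print("H.X.H == Z:", close(hxh, Z))
print("H.H   == I:", close(matmul(H, H), I))
print("X.X   == I:", close(matmul(X, X), I))
```

Because H·X·H equals Z exactly, replacing the three-gate sequence with one Z gate removes two gates without changing the circuit's action on that qubit, which is consistent with action 3 carrying the largest reward (+2.0).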