Spaces:
Sleeping
title: Quantum Circuit Optimizer
emoji: π
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 8000
pinned: false
π Quantum Circuit Optimization Environment
An advanced, physics-grounded Reinforcement Learning environment for the Meta OpenEnv Hackathon.
Challenge agents to optimize multi-qubit quantum circuits through mathematical identities and commutativity rules.
Key Features
- NP-Hard Problem Space: Moves beyond static text puzzles into multi-dimensional spatial reasoning that cannot be solved by memorization.
- Deterministic Reproducibility: Fully supports episode seeding via
random.Random(seed). The same seed always produces the same circuit, guaranteeing reproducible evaluation across model runs. - Three Genuinely Differentiated Graders: Each difficulty tier measures a different skill β pure compression on easy, identity-discovery bonus on medium, and a strict 5% compression gate on hard. Grading math lives in a single file (
graders.py) with no duplication. - Web UI Playground: The
promptfield in every observation renders a human-readable circuit view directly in the HuggingFace Space interface.
Motivation: The Quantum Circuit Optimization
In the real world, quantum computers suffer from rapid decoherence. Every quantum gate introduces noise, so shorter circuits yield higher-fidelity results. However, optimal quantum circuit compression is an NP-Hard problem.
While traditional frameworks like Qiskit, Cirq, and tket rely on hardcoded human heuristics to identify redundant gates and exploit commutativity, this environment turns that exact physics problem into a rigorous testing ground for Artificial Intelligence. It is designed to evaluate whether RL and LLM agents can independently learn and execute these compiler heuristics from scratch.
Current LLM benchmarks rely on static toy puzzles. This environment bridges the gap by requiring agents to generalize real-world quantum physics rules β such as swapping spatially separated, commuting gates to bring distant self-inverse identities together. Memorization is impossible; agents must dynamically reason about multi-dimensional spatial gate layouts and plan over long horizons.
Environment Specifications
Observation Space
| Field | Type | Description |
|---|---|---|
circuit |
List[Gate] |
Current gate sequence. Each gate has a name (e.g. "H", "CNOT", "SWAP") and target_qubits. |
gate_count |
int |
Current number of gates remaining in the circuit. |
num_qubits |
int |
Total number of qubits in the system. |
done |
bool |
True when fully optimized, dead-ended, or step limit (150) reached. |
reward |
float |
Reward from the previous action. |
prompt |
str |
Human-readable circuit state for the web UI playground. |
metadata |
dict |
Episode tracking data β see breakdown below. |
Metadata Fields
| Key | Type | Description |
|---|---|---|
task |
str |
Active task: "easy", "medium", or "hard". |
initial_count |
int |
Gate count at episode start. Used by all graders to compute compression ratio. |
step |
int |
Current step number within the episode. |
seed |
int | None |
RNG seed used to generate this circuit. Pass the same value to reset() to reproduce it. |
used_advanced_actions |
bool |
True if the agent used action 3 (H-X-H β Z) or action 4 (CNOT-SWAP β CZ) this episode. Used by the medium grader bonus. |
Action Space
| Field | Type | Description |
|---|---|---|
target_index |
int β₯ 0 & < num_qubits |
Index of the primary gate to act on. Validated: must be non-negative. |
action_type |
int [1β4] |
Quantum physics rule to apply. Validated: must be between 1 and 4 inclusive. |
Available Action Types
| ID | Name | Description | Reward |
|---|---|---|---|
1 |
Cancel Identical Gates | Removes self-inverse gate pairs (XΒ·X=I, HΒ·H=I, CNOTΒ·CNOT=I) on the same qubits, not blocked by overlapping intermediate gates. | +1.0 |
2 |
Swap Commuting Gates | Swaps the target gate with the next adjacent gate only if their qubit sets do not intersect. Enables bringing distant cancellable pairs together. | -0.05 |
3 |
H-X-H Identity Collapse | Replaces a H β X β H sequence on the same qubit with a single Z gate (net: 2 gates removed). |
+2.0 |
4 |
Entanglement Compression | Replaces a CNOT(a,b) β CNOT(b,a) β CNOT(a,b) sequence with a single SWAP gate β a standard compiler identity (net: 2 gates removed). |
+2.0 |
Invalid actions (out-of-bounds index, illegal non-commuting swap, pattern not present) incur a
-0.10penalty. Circuit state remains unchanged.
Tasks & Difficulty Levels
| Task | Qubits | Pair Gates | Noise Gates | Approx. Initial Gates | Entanglement |
|---|---|---|---|---|---|
easy |
2 | 8 pairs | 4 noise | ~20 | None (single-qubit only) |
medium |
4 | 12 pairs | 8 noise | ~32 | Low (CNOT, SWAP) |
hard |
6 | 25 pairs | 20 noise | ~70 | High (deep entanglement) |
Set QUANTUM_TASK=random to have the environment randomly select a difficulty tier on each reset().
Grader & Evaluation
Grading math lives exclusively in server/graders.py. The environment class methods grade_easy, grade_medium, grade_hard are thin one-line delegates β no duplicated math anywhere. All scores are strictly within (0.01, 0.99).
| Task | Formula | Full Score Requires |
|---|---|---|
| Easy | score = (initial β final) / initial |
Any consistent gate removal earns proportional credit. |
| Medium | score = compression + 0.15 if used_advanced_actions, else compression |
Gate removal and discovering at least one algebraic identity (action 3 or 4). |
| Hard | score = compression if compression β₯ 0.05, else 0.01 |
Must remove at least 5% of gates. No step-efficiency bonus β raw gate reduction only. |
Why hard uses a strict 5% gate instead of step efficiency:
Testing showed a step-efficiency formula inflated hard scores above medium β agents that thrashed with invalid actions but accidentally landed 1β2 cancellations near the step limit scored ~0.30 from efficiency alone with zero real compression. The 5% gate ensures hard genuinely requires sustained effort.
Baseline Scores
All runs use fixed seeds so the starting circuit is identical across evaluations. Temperature 0.7 reflects real-world model behavior. The variance across 3 runs shows statistical stability of the difficulty progression.
Run configuration:
| Parameter | Value |
|---|---|
| Model | llama-3.1-8b-instant (Groq API) |
| Temperature | 0.7 |
| Max steps per episode | 15 |
| Runs per task | 3 |
| Seeds | easy=42, medium=7, hard=13 |
| Success threshold | score β₯ 0.10 |
Results:
| Task | Avg Score | Min | Max | Result | Notes |
|---|---|---|---|---|---|
easy |
0.167 | 0.100 | 0.200 | Pass | Model reliably finds local self-inverse cancellations on the 2-qubit circuit. |
medium |
0.028 | 0.010 | 0.062 | Fail | Struggles with blocker gates; rarely discovers H-X-H or CNOT-SWAP identities needed for the bonus. |
hard |
0.010 | 0.010 | 0.010 | Fail | Consistently scores the floor β cannot achieve 5% compression on the 70-gate 6-qubit circuit in 15 steps. |
Interpretation: The difficulty progression is correct. Easy is solvable zero-shot. Medium requires identity discovery which small models rarely attempt. Hard requires sustained multi-step planning that is beyond current zero-shot LLM capability within a limited step budget. Advanced scaffolding or RL fine-tuning is required for reliable medium/hard performance.
To reproduce these results:
# Set in your .env:
# OPENAI_API_KEY=your_groq_api_key
# API_BASE_URL=https://api.groq.com/openai/v1
# MODEL_NAME=llama-3.1-8b-instant
# IMAGE_NAME=quantum_env
uv run python inference.py
Setup and Usage Instructions
1. Prerequisites
pip install openenv-core
uv sync
2. Environment Variables
Create a .env file in the root directory:
# Standard name used by the evaluation platform
OPENAI_API_KEY="your_api_key_here"
API_BASE_URL="https://api.groq.com/openai/v1"
MODEL_NAME="llama-3.1-8b-instant"
# Environment settings
QUANTUM_TASK="random"
IMAGE_NAME="quantum_env"
# Optional HF fallback
HF_TOKEN="your_hf_token"
| Variable | Description |
|---|---|
OPENAI_API_KEY |
API key β injected by the platform during evaluation; use your own for local testing |
API_BASE_URL |
Inference endpoint (Groq, HF router, or OpenAI compatible) |
MODEL_NAME |
Model identifier |
QUANTUM_TASK |
Task: easy, medium, hard, or random |
IMAGE_NAME |
Docker image name for the environment server |
3. Build & Validate
docker build -t quantum_env .
openenv validate .
4. Run Inference (all 3 tasks)
uv run python inference.py
The script runs easy β medium β hard, 3 episodes each with fixed seeds per task, and prints a full results summary at the end. All 3 tasks are always evaluated.
5. Reproducing Baseline via Seed
# Same seed always produces the same initial circuit
result = await env.reset(seed=42)
The environment uses random.Random(seed) per instance β fully isolated, safe for concurrent WebSocket sessions.
Project Structure
.
βββ server/
β βββ __init__.py
β βββ app.py # FastAPI server β reads QUANTUM_TASK from env var
β βββ graders.py # Single source of truth for all grading math
β βββ quantum_openenv_env_environment.py # Core environment + thin grader delegates
βββ __init__.py
βββ client.py # OpenEnv WebSocket client
βββ models.py # Typed Pydantic models (action: ge=0, le=4)
βββ inference.py # Baseline LLM inference script (all 3 tasks)
βββ openenv.yaml # OpenEnv spec manifest
βββ Dockerfile # Container definition
βββ pyproject.toml
βββ README.md
License
This project is released under the MIT license found in the LICENSE file.