# 🌌 QSBench: Complete User Guide

Welcome to the **QSBench Analytics Hub**. This platform is designed to bridge the gap between quantum circuit topology and machine learning, allowing researchers to study how structural characteristics influence quantum simulation outcomes.

---

## ⚠️ Important: Demo Dataset Notice

The datasets currently loaded in this hub are **v1.0.0-demo versions**.

- **Scale**: These are small *shards* (subsets) of the full QSBench library.
- **Accuracy**: Because the training data is limited in size, ML models trained here will show lower accuracy and higher variance than models trained on the full-scale production datasets.
- **Purpose**: These sets are intended for **demonstration and prototyping** of analytical pipelines before moving to high-performance computing (HPC) environments.

---

## 📂 1. Dataset Architecture & Selection

QSBench provides high-fidelity simulation data for the Quantum Machine Learning (QML) community. We provide four distinct environments to test how different noise models affect the data:

### Core (Clean)
Ideal state-vector simulations. Used as a **"Golden Reference"** to understand the theoretical limits of a circuit's expressivity without physical interference.

### Depolarizing Noise
Simulates the effect of qubits losing their state toward the maximally mixed state. This is the standard **"white noise"** of quantum computing.

### Amplitude Damping
Represents **T1 relaxation (energy loss)**. This is an asymmetric noise model in which qubits decay from |1⟩ to |0⟩; it is critical for studying superconducting hardware.

### Transpilation (10q)
Circuits are mapped to a **hardware topology (heavy-hex or grid)**. Used to study how SWAP gates and routing overhead affect final results.

---

## 📊 2. Feature Engineering: Structural Metrics

Why do we extract these specific features? In QML, the **structure ("shape") of a circuit directly impacts performance**.

- **gate_entropy** Measures the distribution of gate types.
  High entropy → complex, less repetitive circuits → harder for classical models to learn.
- **meyer_wallach** Quantifies **global entanglement**. Entanglement provides quantum advantage but increases sensitivity to noise.
- **adjacency** Represents the density of the qubit interaction graph. High adjacency → faster information spread, but a higher risk of cross-talk errors.
- **cx_count (Two-Qubit Gates)** The most critical complexity metric. On NISQ devices, CNOT gates are **10x–100x noisier** than single-qubit gates.

**Note on Feature Correlation:** While structural metrics (like `gate_entropy` or `depth`) describe the complexity of the circuit, they do not encode the specific rotation angles of individual gates. Predicting the exact expectation value from structural features alone is therefore an **extremely challenging task** (a non-trivial mapping).

---

## 🎯 3. Multi-Target Regression (The Bloch Vector)

Unlike traditional benchmarks that focus on a single observable, QSBench targets the **full global Bloch vector**:

[⟨X⟩_global, ⟨Y⟩_global, ⟨Z⟩_global]

```text
        +Z (0)
         |
         |
    -----|---- +Y
        /|
       / |
     +X  |
        -Z (1)
```

---

### Why predict all three?

A quantum state is a point on (or inside) the **Bloch sphere**.

- Predicting only Z gives an incomplete picture.
- Multi-target regression learns correlations between:
  - circuit structure
  - full quantum state orientation
  - behavior in Hilbert space

---

## 🤖 4. Using the ML Analytics Module

The Hub uses a **Random Forest Regressor** to establish a baseline of predictability.

### Workflow

1. **Select Dataset**
   Choose a noise model and observe how it affects predictability.
2. **Select Features**
   Recommended starting set:
   - `gate_entropy`
   - `meyer_wallach`
   - `depth`
   - `cx_count`
3. **Execute Baseline**
   Performs an **80/20 train-test split**.
4.
   **Analyze the Triple Parity Plot**
   - 🔴 **Diagonal red line** → perfect prediction
   - 📈 **Clustering near the line** → strong predictive signal
   - 🔍 **Basis comparison**:
     - Z is often easier to predict
     - X/Y depend more on circuit structure
     - reveals architectural biases (HEA vs. QFT, etc.)

📉 **How to Interpret "Bad" Metrics**

If you see a **negative** R² or predictions clustering around zero, don't panic. This is the expected behavior for standard regression on quantum data:

- **Mean-Predictor Baseline:** In complex circuits (n=8, depth=6), expectation values naturally concentrate around 0. A model that simply predicts "0" for everything will have a low MAE but a zero or negative R².
- **The Complexity Gap:** A negative R² indicates that the relationship between circuit shape and quantum output is highly non-linear.
- **Research Challenge:** Use these baseline results to justify the need for more advanced architectures, such as **Graph Neural Networks (GNNs)** or **Recursive Quantum Filters**, that can process the gate sequence itself.

---

## 🔗 5. Project Resources

- 🤗 Hugging Face Datasets — download dataset shards
- 💻 GitHub Repository — QSBench generator source code
- 🌐 Official Website — documentation and benchmarking leaderboards

---

*QSBench — Synthetic Quantum Dataset Benchmarks*
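---

The structural metrics of Section 2 can be illustrated with a short sketch. A common reading of `gate_entropy` is the Shannon entropy of the circuit's gate-type histogram, and `cx_count` is simply the number of CX (CNOT) gates; the exact definitions used by the QSBench generator may differ, so treat this as an illustration rather than the reference implementation:

```python
import math
from collections import Counter

def gate_entropy(gates):
    """Shannon entropy (in bits) of the gate-type distribution.

    One plausible reading of the `gate_entropy` feature; the exact
    definition used by the QSBench generator may differ.
    """
    counts = Counter(gates)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def cx_count(gates):
    """Number of two-qubit CX (CNOT) gates in the circuit."""
    return sum(1 for g in gates if g == "cx")

circuit = ["h", "cx", "rz", "cx", "h", "rz", "cx", "x"]
print(gate_entropy(circuit))  # higher => more varied, less repetitive gate mix
print(cx_count(circuit))      # 3
```

A circuit that repeats a single gate type has entropy 0, while a uniform mix over k gate types has entropy log2(k), which matches the intuition that high entropy means a less repetitive circuit.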
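---

The Bloch-vector targets of Section 3 have a simple single-qubit analogue. QSBench regresses *global* averages over all qubits, but the one-qubit case conveys the geometry; a minimal NumPy sketch:

```python
import numpy as np

# Pauli operators
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(psi):
    """Return [<X>, <Y>, <Z>] for a normalized single-qubit state vector."""
    return [float(np.real(np.conj(psi) @ (P @ psi))) for P in (X, Y, Z)]

ket0 = np.array([1, 0], dtype=complex)               # |0>
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # |+>

print(bloch_vector(ket0))  # [0.0, 0.0, 1.0] -> the +Z pole
print(bloch_vector(plus))  # ~[1, 0, 0]      -> the +X axis
```

Predicting only ⟨Z⟩ would make |+⟩ and |−⟩ indistinguishable (both have ⟨Z⟩ = 0), which is why all three components are regressed jointly.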
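---

The "bad metrics" discussion in Section 4 can be made concrete: R² is defined relative to the mean predictor, so a model that predicts the target mean scores exactly 0, and an uninformative model that guesses around the mean scores below 0 even when its MAE looks small. A self-contained sketch (synthetic values standing in for the concentrated expectation values; no QSBench data involved):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
# Synthetic stand-in: expectation values concentrated near 0, as in deep circuits
y = rng.normal(loc=0.0, scale=0.05, size=500)

mean_pred = np.full_like(y, y.mean())
print(np.mean(np.abs(y - mean_pred)))  # small MAE...
print(r2_score(y, mean_pred))          # ...but R^2 is exactly 0.0

noise_pred = y.mean() + rng.normal(scale=0.05, size=500)
print(r2_score(y, noise_pred))         # negative: worse than the mean baseline
```

A negative R² therefore says the model extracted no usable signal beyond the target mean, not that the pipeline is broken.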