# 🌌 QSBench: Complete User Guide
Welcome to the **QSBench Analytics Hub**.
This platform is designed to bridge the gap between quantum circuit topology and machine learning, allowing researchers to study how structural characteristics influence quantum simulation outcomes.
---
## ⚠️ Important: Demo Dataset Notice
The datasets currently loaded in this hub are **v1.0.0-demo versions**.
- **Scale**: These are small *shards* (subsets) of the full QSBench library.
- **Accuracy**: Because the training data is limited in size, ML models trained here will show lower accuracy and higher variance compared to models trained on full-scale production datasets.
- **Purpose**: These sets are intended for **demonstration and prototyping** of analytical pipelines before moving to high-performance computing (HPC) environments.
---
## 📂 1. Dataset Architecture & Selection
QSBench provides high-fidelity simulation data for the Quantum Machine Learning (QML) community.
We provide four distinct environments to test how different noise models affect data:
### Core (Clean)
Ideal state-vector simulations.
Used as a **"Golden Reference"** to understand the theoretical limits of a circuit's expressivity without physical interference.
### Depolarizing Noise
Simulates the effect of qubits losing their state toward a maximally mixed state.
This is the standard **"white noise"** of quantum computing.
### Amplitude Damping
Represents **T1 relaxation (energy loss)**.
This is an asymmetric noise model where qubits decay from |1⟩ to |0⟩, critical for studying superconducting hardware.
### Transpilation (10q)
Circuits are mapped to a **hardware topology (heavy-hex or grid)**.
Used to study how SWAP gates and routing overhead affect final results.
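The depolarizing and amplitude-damping channels above can be illustrated as single-qubit density-matrix maps. The following is a minimal NumPy sketch of the textbook channel definitions, not the QSBench simulator itself:

```python
import numpy as np

def depolarize(rho: np.ndarray, p: float) -> np.ndarray:
    """Single-qubit depolarizing channel: mix the state toward I/2."""
    return (1 - p) * rho + p * np.eye(2) / 2

def amplitude_damp(rho: np.ndarray, gamma: float) -> np.ndarray:
    """Single-qubit amplitude damping (T1 decay) via Kraus operators."""
    k0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    k1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T

# |1><1| decays asymmetrically toward |0><0| under amplitude damping,
# while depolarizing drags any state toward the maximally mixed I/2.
one = np.array([[0, 0], [0, 1]], dtype=complex)
print(np.round(amplitude_damp(one, 0.3), 3))  # diag(0.3, 0.7)
print(np.round(depolarize(one, 0.3), 3))      # diag(0.15, 0.85)
```

Note the asymmetry: amplitude damping moves population only toward |0⟩, whereas depolarizing is basis-agnostic.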
---
## 📊 2. Feature Engineering: Structural Metrics
Why do we extract these specific features?
In QML, the **structure ("shape") of a circuit directly impacts performance**.
- **gate_entropy**
Measures the distribution of gate types across the circuit.
High entropy → complex, less repetitive circuits → harder for classical models to learn.
- **meyer_wallach**
Quantifies **global entanglement**.
Entanglement provides quantum advantage but increases sensitivity to noise.
- **adjacency**
Represents qubit interaction graph density.
High adjacency → faster information spread, but higher risk of cross-talk errors.
- **cx_count (Two-Qubit Gates)**
The most critical complexity metric.
On NISQ devices, CNOT gates are **10x–100x noisier** than single-qubit gates.
**Note on Feature Correlation:** While structural metrics (like `gate_entropy` or `depth`) describe the complexity of the circuit, they do not encode the specific rotation angles of individual gates.
Therefore, predicting the exact expectation value from structural features alone is an **extremely challenging, non-trivial mapping**.
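As a rough illustration of how structural metrics like these can be computed, here is a toy extractor over a plain gate list. The gate representation and function below are assumptions for illustration, not the QSBench extraction pipeline:

```python
import math
from collections import Counter

def circuit_features(gates):
    """Toy structural-feature extraction from a list of (name, qubits) tuples,
    e.g. ("cx", (0, 1)). Mirrors the metric definitions in the guide."""
    names = [name for name, _ in gates]
    total = len(names)
    # gate_entropy: Shannon entropy of the gate-type distribution
    gate_entropy = -sum(
        (c / total) * math.log2(c / total) for c in Counter(names).values()
    )
    # cx_count: number of two-qubit gates
    cx_count = sum(1 for _, qs in gates if len(qs) == 2)
    # adjacency: density of the qubit-interaction graph
    qubits = {q for _, qs in gates for q in qs}
    n = len(qubits)
    edges = {tuple(sorted(qs)) for _, qs in gates if len(qs) == 2}
    adjacency = len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0
    return {"gate_entropy": gate_entropy, "cx_count": cx_count, "adjacency": adjacency}

demo = [("h", (0,)), ("cx", (0, 1)), ("rz", (1,)), ("cx", (1, 2))]
print(circuit_features(demo))  # entropy 1.5, 2 CX gates, adjacency 2/3
```

Note what is *not* captured: the rotation angle of `rz` never enters the features, which is exactly why the structure-to-expectation mapping is so hard.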
---
## 🎯 3. Multi-Target Regression (The Bloch Vector)
Unlike traditional benchmarks that focus on a single observable, QSBench targets the **full global Bloch vector**:
`[⟨X⟩_global, ⟨Y⟩_global, ⟨Z⟩_global]`
```text
          +Z (0)
           |
           |
   --------+-------- +Y
          /|
         / |
       +X  |
           -Z (1)
```
---
### Why predict all three?
A quantum state is a point on (or inside) the **Bloch sphere**.
- Predicting only Z gives an incomplete picture
- Multi-target regression learns correlations between:
- circuit structure
- full quantum state orientation
- behavior in Hilbert space
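Assuming the "global" targets are qubit-averaged single-qubit expectations (an assumption; the guide does not spell out the aggregation), a statevector version can be sketched in NumPy:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

def global_bloch(state: np.ndarray, n: int):
    """Average the single-qubit ⟨X⟩, ⟨Y⟩, ⟨Z⟩ over all n qubits of a statevector."""
    out = []
    for pauli in (X, Y, Z):
        vals = []
        for q in range(n):
            full = np.array([[1.0 + 0j]])
            for k in range(n):
                full = np.kron(full, pauli if k == q else I)
            vals.append((state.conj() @ full @ state).real)
        out.append(float(np.mean(vals)))
    return out

# |+> ⊗ |0>: qubit 0 contributes ⟨X⟩ = 1, qubit 1 contributes ⟨Z⟩ = 1
state = np.kron(np.array([1, 1]) / np.sqrt(2), np.array([1, 0])).astype(complex)
print(global_bloch(state, 2))  # [0.5, 0.0, 0.5]
```

The example shows why a single-axis target is lossy: the Z component alone (0.5) says nothing about the X-polarized qubit.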
---
## 🤖 4. Using the ML Analytics Module
The Hub uses a **Random Forest Regressor** to establish a baseline of predictability.
### Workflow
1. **Select Dataset**
Choose a noise model and observe how it affects predictability.
2. **Select Features**
Recommended starting set:
- `gate_entropy`
- `meyer_wallach`
- `depth`
- `cx_count`
3. **Execute Baseline**
Performs an **80/20 train-test split**.
4. **Analyze the Triple Parity Plot**
- 🔴 **Diagonal Red Line** → perfect prediction
- 📈 **Clustering near line** → strong predictive signal
- 🔍 **Basis comparison**:
- Z often easier to predict
- X/Y depend more on circuit structure
- reveals architectural biases (HEA vs QFT, etc.)
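The workflow above can be sketched with scikit-learn. The synthetic arrays below merely stand in for a real shard; the feature columns mirror the recommended starting set, and `RandomForestRegressor` handles the three Bloch targets natively:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Stand-in data: 4 structural features (gate_entropy, meyer_wallach,
# depth, cx_count) and 3 targets (⟨X⟩, ⟨Y⟩, ⟨Z⟩)
X = rng.normal(size=(300, 4))
y = np.tanh(0.5 * X[:, :3]) + rng.normal(scale=0.3, size=(300, 3))

# Step 3: 80/20 train-test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_tr, y_tr)                    # multi-output regression, no wrapper needed
y_pred = model.predict(X_te)
print("per-target R²:", r2_score(y_te, y_pred, multioutput="raw_values"))
```

Comparing the three per-target R² values is the numeric counterpart of the triple parity plot: a basis that clusters near the red line scores higher.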
📉 **How to Interpret "Bad" Metrics**
If you see a **negative** R² or predictions clustering around zero, don't panic. This is expected behavior for standard regression on quantum data:
- **Mean-Predictor Baseline:** In complex circuits (n=8, depth=6), expectation values naturally concentrate around 0. A model that simply predicts "0" for everything will have a low MAE but a zero or negative R².
- **The Complexity Gap:** A negative R² indicates that the relationship between circuit shape and quantum output is highly non-linear.
- **Research Challenge:** Use these baseline results to justify the need for more advanced architectures like **Graph Neural Networks (GNNs)** or **Recursive Quantum Filters** that can process the gate sequence itself.
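The mean-predictor point is easy to verify directly: predicting the sample mean for every target yields R² = 0 by construction, even though the MAE looks deceptively small when values concentrate near zero:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(1)
# Targets concentrated near 0, as expectation values are for deep circuits
y_true = rng.normal(loc=0.0, scale=0.1, size=200)

# "Always predict the mean" baseline
mean_pred = np.full_like(y_true, y_true.mean())

print("MAE:", mean_absolute_error(y_true, mean_pred))  # small, looks good
print("R²:", r2_score(y_true, mean_pred))              # exactly 0 by definition
```

Any model whose test predictions fit worse than this constant baseline lands below zero, which is why negative R² is a signal about problem hardness, not a broken pipeline.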
---
## 🔗 5. Project Resources
- 🤗 Hugging Face Datasets – download dataset shards
- 💻 GitHub Repository – QSBench generator source code
- 🌐 Official Website – documentation and benchmarking leaderboards
---
*QSBench – Synthetic Quantum Dataset Benchmarks*