🌌 Circuit Complexity Clustering Guide
Welcome to the Circuit Complexity Clustering Hub.
This tool demonstrates how unsupervised learning can automatically group quantum circuits by their structural complexity — without any labels or prior knowledge.
⚠️ Important: Local Dataset Notice
This application processes local .parquet files stored in the data/ directory.
- Data Source: Local shards of QSBench (Core, Amplitude Damping, Depolarizing, etc.).
- Processing: To ensure high performance, analysis is performed on a representative sample of 15,000 circuits, even if the source file contains hundreds of thousands of rows.
- Goal: Showcase how circuit topology and gate structure naturally form complexity groups.
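The sampling step described above could look roughly like the following sketch. It assumes pandas and uniform random sampling; the helper name `sample_circuits`, the fixed seed, and the commented-out shard path are illustrative and are not taken from the application.

```python
import pandas as pd

SAMPLE_SIZE = 15_000  # sample size stated in this guide


def sample_circuits(df: pd.DataFrame, n: int = SAMPLE_SIZE, seed: int = 42) -> pd.DataFrame:
    """Return at most n rows, drawn uniformly at random without replacement."""
    if len(df) <= n:
        # Small shards are used in full.
        return df.reset_index(drop=True)
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


# Hypothetical usage; the shard name is a placeholder:
# df = pd.read_parquet("data/{folder_name}/shard.parquet")
# sample = sample_circuits(df)
```

Fixing the seed keeps the sample, and therefore the cluster map, reproducible between runs.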
🎯 1. What is Being Done?
The model performs unsupervised clustering (K-Means) to group quantum circuits into clusters of similar structural complexity.
No labels are used.
The algorithm discovers groups purely from:
- Topology: How qubits are connected (derived from the adjacency matrix).
- Gate Density: Counts of single and multi-qubit operations.
- QASM Signals: Complexity metrics extracted directly from the OpenQASM code.
Each cluster represents circuits of similar “computational weight” or entanglement potential.
🧩 2. How the Model “Sees” a Circuit
The model does not use noise profiles or simulation results. It focuses on structural proxies:
🔹 Topology Features
- adj_density: How densely the qubits interact.
- adj_degree_avg: The average number of connections per qubit.
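As a rough illustration, both topology features can be derived from a 0/1 adjacency matrix. The definitions below (edge density of an undirected graph without self-loops, and mean node degree) are assumptions; the exact formulas used to build the dataset are not stated in this guide, and `topology_features` is a hypothetical helper.

```python
def topology_features(adj):
    """Compute adj_density and adj_degree_avg from a square 0/1 adjacency matrix.

    Assumes an undirected interaction graph without self-loops.
    """
    n = len(adj)
    # Count each undirected edge once by scanning the upper triangle.
    edges = sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j])
    possible = n * (n - 1) / 2
    density = edges / possible if possible else 0.0
    degree_avg = 2 * edges / n if n else 0.0
    return {"adj_density": density, "adj_degree_avg": degree_avg}


# A 4-qubit linear chain: 0-1, 1-2, 2-3
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = topology_features(chain)
# 3 of 6 possible edges -> adj_density 0.5; 6 edge endpoints over 4 qubits -> adj_degree_avg 1.5
```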
🔹 Gate Structure & Complexity
- depth, total_gates, cx_count: Standard measures of circuit size.
- gate_entropy: A measure of how "random" or "structured" the gate sequence is.
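One plausible reading of gate_entropy is the Shannon entropy of the gate-type frequencies. The sketch below follows that assumption; the dataset may define the feature differently, and the function name is illustrative.

```python
import math
from collections import Counter


def gate_entropy(gates):
    """Shannon entropy (in bits) of the gate-type distribution."""
    counts = Counter(gates)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


mixed = gate_entropy(["h", "cx", "rz", "x"])       # four gate types, each used once
uniform = gate_entropy(["cx", "cx", "cx", "cx"])   # a single gate type
```

Under this definition a circuit built from one repeated gate scores zero, and the score grows as the gate mix becomes more varied.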
🔹 QASM-derived Signals
- qasm_len: Character length of the code.
- qasm_gates: Keyword-based gate count.
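A minimal sketch of how such text-level signals might be extracted from an OpenQASM string. The keyword list and the one-statement-per-line matching rule are assumptions made for illustration; the guide does not specify which keywords the dataset counts.

```python
import re

# Illustrative gate keywords; the list used to build the dataset is not given in this guide.
GATE_KEYWORDS = ("h", "x", "cx", "rz")
_GATE_PATTERN = re.compile(
    r"^[ \t]*(" + "|".join(GATE_KEYWORDS) + r")\b", re.MULTILINE
)


def qasm_signals(qasm: str) -> dict:
    """Return simple text-level complexity signals for an OpenQASM string."""
    return {
        "qasm_len": len(qasm),                           # raw character count
        "qasm_gates": len(_GATE_PATTERN.findall(qasm)),  # lines starting with a gate keyword
    }


example = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0],q[1];
measure q[0] -> c[0];
"""
signals = qasm_signals(example)
# Only the h and cx lines start with a listed keyword, so qasm_gates is 2 here.
```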
🤖 3. Model Overview: PCA & K-Means
The system follows a standard machine learning pipeline:
- Imputation & Scaling: Missing values are filled with medians, and features are normalized.
- K-Means: Groups circuits into $K$ clusters (2–10).
- PCA (Principal Component Analysis): Reduces high-dimensional data to 2D for visualization.
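The three steps above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the app's actual code: "normalized" is read as z-score standardization (`StandardScaler`), the helper `cluster_circuits` is hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_circuits(X, k=4, seed=0):
    """Impute, scale, cluster with K-Means, and project to 2D with PCA."""
    prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    X_scaled = prep.fit_transform(X)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
    coords = PCA(n_components=2).fit_transform(X_scaled)
    return labels, coords


# Synthetic stand-in for the feature table (columns are arbitrary here).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 4)) for c in (0.0, 5.0, 10.0)])
X[0, 1] = np.nan  # a missing value, to exercise the median imputation
labels, coords = cluster_circuits(X, k=3)
```

In this sketch the clusters are computed on the full scaled feature set, and PCA is used only to draw the 2D map, so points that overlap in the plot can still belong to different clusters.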
Understanding the PCA Map:
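- Horizontal Axis (Component 1): Usually represents the Scale. Points further to the right typically have more gates and higher qubit counts, although the sign of a principal component is arbitrary, so the direction can flip between datasets.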
- Vertical Axis (Component 2): Often reflects Density/Complexity. Points higher or lower on this axis differ in their connectivity patterns or gate-to-depth ratio.
🖼️ 4. Example Case: Large-Scale Dataset
When working with a full dataset (e.g., 150,000 rows from depolarizing noise), the clustering reveals highly distinct structural "clouds":
- Core Clusters: Large, dense groups representing standard circuit templates.
- The "Tail": Elongated structures showing a gradient of increasing depth.
- Outliers: Isolated points (far left or far top) representing unique, non-standard topologies.
📊 5. Understanding the Results
A. PCA Projection
- Each point = One quantum circuit.
- Color = Assigned cluster.
- Proximity = Similarity. Circuits close to each other share similar structural DNA.
B. Silhouette Score
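- A metric from -1 to 1 measuring how well separated the clusters are.
- High score (close to 1): Distinct, well-defined complexity levels.
- Score near 0 or below: Overlapping clusters, or circuits that sit closer to a neighboring cluster than to their own.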
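For reference, the standard silhouette coefficient of a single circuit $i$ is given below, where $a(i)$ is the mean distance from $i$ to the other members of its own cluster and $b(i)$ is the mean distance from $i$ to the members of the nearest other cluster. The reported score is assumed here to be the mean of $s(i)$ over all clustered circuits.

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
$$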
C. Cluster Sizes Table
- Shows the distribution of circuits. A heavily imbalanced table might suggest that most of your dataset shares a very similar base structure.
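A table of this kind can be built directly from the cluster labels. The sketch below uses hypothetical labels and a hypothetical helper name, and adds each cluster's share of the total to make imbalance easier to spot.

```python
from collections import Counter


def cluster_size_table(labels):
    """Return (cluster, count, share) rows sorted by cluster id."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(c, n, n / total) for c, n in sorted(counts.items())]


# Hypothetical labels: cluster 0 dominates, which hints at a shared base structure.
rows = cluster_size_table([0] * 8 + [1, 2])
for cluster, count, share in rows:
    print(f"cluster {cluster}: {count:>3} circuits ({share:.0%})")
```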
🧪 6. Experimentation Tips
- Search for Outliers: Look for isolated points far from the main "clouds". These are unique circuits — perfect candidates for edge-case benchmarking.
- Tune K: If clusters look fragmented on a large dataset, try $K=3$ or $K=5$ to see broader complexity tiers.
- Compare Datasets: Notice how the "shape" of the complexity map changes between Core (clean) and Transpilation datasets.
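When tuning $K$, one option is to sweep the supported range and compare silhouette scores. The sketch below assumes scikit-learn and already-scaled features; `sweep_k` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sweep_k(X_scaled, k_values=range(2, 11), seed=0):
    """Return {k: silhouette score} for each candidate number of clusters."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
        scores[k] = silhouette_score(X_scaled, labels)
    return scores


# Synthetic data with three well-separated groups.
rng = np.random.default_rng(0)
X_scaled = np.vstack(
    [rng.normal(loc=c, scale=0.3, size=(40, 2)) for c in (0.0, 5.0, 10.0)]
)
scores = sweep_k(X_scaled)
best_k = max(scores, key=scores.get)
```

The highest score is a useful starting point rather than a verdict; a slightly lower-scoring $K$ may still give tiers that are easier to interpret. On large samples, `silhouette_score` accepts a `sample_size` argument to keep the computation fast.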
🛠️ 7. Troubleshooting
"Too few rows for clustering" error?
- NaN values: You may have selected a feature that is empty (all NaNs) in that specific dataset. Try depth or total_gates.
- Path Error: Ensure your .parquet files are in data/{folder_name}/.
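To check the NaN case before clustering, a small diagnostic along these lines may help. It assumes pandas; `usable_rows` is a hypothetical helper, and the app's own row-count check may work differently.

```python
import pandas as pd


def usable_rows(df: pd.DataFrame, features: list[str]) -> tuple[list[str], int]:
    """Report all-NaN features and how many rows have at least one usable value."""
    empty = [f for f in features if df[f].isna().all()]
    kept = [f for f in features if f not in empty]
    n_rows = int(df[kept].notna().any(axis=1).sum()) if kept else 0
    return empty, n_rows


# Hypothetical shard in which gate_entropy was never populated.
df = pd.DataFrame({"depth": [1, 2, 3], "gate_entropy": [float("nan")] * 3})
empty, n_rows = usable_rows(df, ["depth", "gate_entropy"])
print("Empty features:", empty, "| usable rows:", n_rows)
```

If a selected feature shows up in the empty list, deselecting it is usually enough to get past the error.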
🔬 8. Key Insight
Quantum circuits naturally form groups of similar complexity even without any supervision. Features like connectivity, depth, and two-qubit gate count are enough for an algorithm to discover meaningful “complexity levels”.
🔗 9. Project Resources
- 🤗 Hugging Face: https://huggingface.co/QSBench
- 💻 GitHub: https://github.com/QSBench
- 🌐 Website: https://qsbench.github.io
