🌌 Circuit Complexity Clustering Guide

Welcome to the Circuit Complexity Clustering Hub.
This tool demonstrates how unsupervised learning can automatically group quantum circuits by their structural complexity — without any labels or prior knowledge.

⚠️ Important: Local Dataset Notice

This application processes local .parquet files stored in the data/ directory.

Data Source: Local shards of QSBench (Core, Amplitude Damping, Depolarizing, etc.).
Processing: To ensure high performance, analysis is performed on a representative sample of 15,000 circuits, even if the source file contains hundreds of thousands of rows.
Goal: Showcase how circuit topology and gate structure naturally form complexity groups.

🎯 1. What is Being Done?

The model performs unsupervised clustering (K-Means) to group quantum circuits into clusters of similar structural complexity.

No labels are used

The algorithm discovers groups purely from:

Topology: How qubits are connected (derived from the adjacency matrix).
Gate Density: Counts of single and multi-qubit operations.
QASM Signals: Complexity metrics extracted directly from the OpenQASM code.

Each cluster represents circuits of similar “computational weight” or entanglement potential.

🧩 2. How the Model “Sees” a Circuit

The model does not use noise profiles or simulation results. It focuses on structural proxies:

🔹 Topology Features

adj_density: How densely the qubits interact.
adj_degree_avg: The average number of connections per qubit.
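
As a rough illustration of the two topology features, the sketch below computes them from a symmetric 0/1 adjacency matrix. The exact formulas QSBench uses are not stated in this guide, so the definitions here (edges divided by possible qubit pairs, and twice the edge count divided by the qubit count) are assumptions, and `topology_features` is a hypothetical helper name.

```python
def topology_features(adj):
    """Return (adj_density, adj_degree_avg) for a symmetric 0/1 adjacency matrix.

    Assumed definitions (not confirmed by the guide):
      adj_density    = edges / number of possible qubit pairs
      adj_degree_avg = 2 * edges / number of qubits
    """
    n = len(adj)
    if n < 2:
        return 0.0, 0.0
    # Count each undirected edge once by scanning the upper triangle.
    edges = sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j])
    max_edges = n * (n - 1) / 2
    return edges / max_edges, 2 * edges / n


# A 4-qubit line topology: 0-1, 1-2, 2-3
line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
density, degree_avg = topology_features(line)
print(density, degree_avg)
```

A fully connected layout would give a density of 1.0, so sparse hardware-like topologies and all-to-all circuits end up far apart on this feature.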

🔹 Gate Structure & Complexity

depth, total_gates, cx_count: Standard measures of circuit size.
gate_entropy: A measure of how "random" or "structured" the gate sequence is.
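
One plausible way to compute a value like gate_entropy is the Shannon entropy of the gate-type frequencies. This is an assumption for illustration only: the guide does not give the formula, and this version looks at gate counts rather than gate order. The helper name `gate_entropy` simply mirrors the feature name.

```python
import math
from collections import Counter


def gate_entropy(gates):
    """Shannon entropy (in bits) of the gate-type distribution of a circuit."""
    counts = Counter(gates)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)


uniform = gate_entropy(["h", "cx", "rz", "x"])  # four gate types, equally frequent
repetitive = gate_entropy(["cx"] * 8)           # a single repeated gate type
print(uniform, repetitive)
```

Under this definition, a circuit built from one repeated gate scores zero, and the score grows as the gate mix becomes more varied.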

🔹 QASM-derived Signals

qasm_len: Character length of the code.
qasm_gates: Keyword-based gate count.
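
The two QASM-derived signals can be approximated directly from the source text. The sketch below is only a guess at the approach: the keyword set `GATE_NAMES` and the helper `qasm_signals` are hypothetical, and the app's real keyword list may differ.

```python
import re

# Hypothetical keyword set; the app's actual list is not given in this guide.
GATE_NAMES = {"h", "x", "y", "z", "s", "t", "rx", "ry", "rz", "cx", "cz", "swap", "ccx"}


def qasm_signals(qasm):
    """Return (qasm_len, qasm_gates) for an OpenQASM 2 program string."""
    qasm_len = len(qasm)
    qasm_gates = 0
    # Each statement ends with ';'. Its first lowercase identifier is the operation name.
    for statement in qasm.split(";"):
        match = re.match(r"\s*([a-z][a-z0-9_]*)", statement)
        if match and match.group(1) in GATE_NAMES:
            qasm_gates += 1
    return qasm_len, qasm_gates


example = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
h q[0];
cx q[0],q[1];
rz(0.5) q[1];
"""
qasm_len, qasm_gates = qasm_signals(example)
print(qasm_len, qasm_gates)
```

Declarations such as `qreg` and `include` are not in the keyword set, so only the three gate statements are counted. Comments containing semicolons would need extra handling in a real parser.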

🤖 3. Model Overview: PCA & K-Means

The system follows a standard machine learning pipeline:

Imputation & Scaling: Missing values are filled with medians, and features are normalized.
K-Means: Groups circuits into $K$ clusters (2–10).
PCA (Principal Component Analysis): Reduces high-dimensional data to 2D for visualization.
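
The three steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the app's actual code: `cluster_circuits` is a hypothetical helper, `StandardScaler` is an assumed reading of "normalized", and the feature matrix is synthetic data standing in for depth, total_gates, and cx_count.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_circuits(features, k=3, seed=0):
    """Impute with medians, scale, cluster with K-Means, and project to 2D with PCA."""
    prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    scaled = prep.fit_transform(features)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)
    coords = PCA(n_components=2).fit_transform(scaled)  # used for plotting only
    return scaled, labels, coords


# Synthetic stand-in for (depth, total_gates, cx_count): three size tiers.
rng = np.random.default_rng(0)
centers = np.array([[10, 40, 5], [50, 200, 60], [120, 600, 250]], dtype=float)
features = np.vstack([c + rng.normal(scale=3.0, size=(50, 3)) for c in centers])
features[0, 1] = np.nan  # one missing value to exercise the imputer

scaled, labels, coords = cluster_circuits(features, k=3)
print(labels[:5], coords.shape)
```

In this sketch, clustering runs on the full scaled feature space and PCA is applied afterwards purely for the 2D plot, which matches the order of the steps listed above.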

Understanding the PCA Map:

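Horizontal Axis (Component 1): Usually tracks circuit scale. Points further to the right typically have more gates and higher qubit counts.
Vertical Axis (Component 2): Often reflects density or complexity. Points at opposite ends of this axis differ in their connectivity patterns or gate-to-depth ratio.
Note: The sign of a principal component is arbitrary, and the axes depend on the selected features and dataset, so treat these readings as tendencies rather than fixed rules.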
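
To check what each axis means for a particular run, one option is to inspect the PCA loadings, which show how strongly each feature contributes to each component. The sketch below uses synthetic data in which three size-related features grow together and adj_density varies independently. The data and the choice of features are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

feature_names = ["depth", "total_gates", "cx_count", "adj_density"]

# Synthetic data: the first three features share a common "size" factor.
rng = np.random.default_rng(1)
size = rng.uniform(1, 10, size=200)
features = np.column_stack([
    size * 10 + rng.normal(scale=1.0, size=200),
    size * 40 + rng.normal(scale=4.0, size=200),
    size * 15 + rng.normal(scale=2.0, size=200),
    rng.uniform(0.1, 0.9, size=200),
])

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(features))

# Rank features by absolute loading on each component.
for i, component in enumerate(pca.components_):
    ranked = sorted(zip(feature_names, component), key=lambda pair: -abs(pair[1]))
    share = pca.explained_variance_ratio_[i]
    print(f"PC{i + 1} ({share:.0%} of variance):",
          ", ".join(f"{name}={weight:+.2f}" for name, weight in ranked))
```

Features with the largest absolute loadings on a component are the ones that axis mostly reflects.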

🖼️ 4. Example Case: Large-Scale Dataset

When working with a large source file (e.g., 150,000 rows from the depolarizing-noise dataset, sampled down to 15,000 circuits), the clustering reveals distinct structural "clouds":

Core Clusters: Large, dense groups representing standard circuit templates.
The "Tail": Elongated structures showing a gradient of increasing depth.
Outliers: Isolated points (far left or far top) representing unique, non-standard topologies.

📊 5. Understanding the Results

A. PCA Projection

Each point = One quantum circuit.
Color = Assigned cluster.
Proximity = Similarity. Circuits close to each other share similar structural DNA.

B. Silhouette Score

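A metric from -1 to 1 measuring how well-separated the clusters are.
High score (near 1): Distinct, well-defined complexity levels.
Score near 0: Overlapping clusters with blurry boundaries.
Negative score: Many circuits sit closer to a neighboring cluster than to their own.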
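
For intuition, the silhouette coefficient of a single point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. The pure-Python sketch below averages that value over all points. `mean_silhouette` is a hypothetical helper written for illustration. In practice a library routine such as scikit-learn's `silhouette_score` would normally be used.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_dist(i, own)
        b = min(mean_dist(i, members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels match the two visible groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(round(good, 3), round(bad, 3))
```

The labeling that matches the two visible groups scores much higher than the one that splits them.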

C. Cluster Sizes Table

Shows the distribution of circuits. A heavily imbalanced table might suggest that most of your dataset shares a very similar base structure.
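
A cluster size table like this can be derived from the label array alone. The sketch below is illustrative: `cluster_size_table` is a hypothetical helper and the labels are made up to show an imbalanced case.

```python
from collections import Counter


def cluster_size_table(labels):
    """Return rows of (cluster id, circuit count, share of all circuits)."""
    counts = Counter(labels)
    total = len(labels)
    return [(cluster, n, n / total) for cluster, n in sorted(counts.items())]


# Made-up labels: one dominant cluster and two small ones.
labels = [0] * 80 + [1] * 15 + [2] * 5
table = cluster_size_table(labels)
for cluster, n, share in table:
    print(f"cluster {cluster}: {n:>3} circuits ({share:.0%})")
```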

🧪 6. Experimentation Tips

Search for Outliers: Look for isolated points far from the main "clouds". These are unique circuits — perfect candidates for edge-case benchmarking.
Tune K: If clusters look fragmented on a large dataset, try $K=3$ or $K=5$ to see broader complexity tiers.
Compare Datasets: Notice how the "shape" of the complexity map changes between Core (clean) and Transpilation datasets.
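
When tuning $K$, one common approach is to scan the allowed range and compare silhouette scores. The sketch below does this with scikit-learn on synthetic two-feature data. `scan_k` is a hypothetical helper, and the app itself may choose $K$ differently (for example, through a slider).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def scan_k(features, k_values=range(2, 11), seed=0):
    """Return {k: silhouette score} for K-Means fitted on standardized features."""
    scaled = StandardScaler().fit_transform(features)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)
    return scores


# Synthetic stand-in for (depth, total_gates): three well-separated size tiers.
rng = np.random.default_rng(0)
centers = np.array([[10, 40], [60, 300], [150, 900]], dtype=float)
features = np.vstack([c + rng.normal(scale=[2.0, 10.0], size=(60, 2)) for c in centers])

scores = scan_k(features)
best_k = max(scores, key=scores.get)
print(best_k, {k: round(v, 2) for k, v in scores.items()})
```

A clear peak suggests a natural number of complexity tiers. A flat curve suggests the dataset forms more of a continuum than a set of discrete groups.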

🛠️ 7. Troubleshooting

"Too few rows for clustering" error?

NaN values: You may have selected a feature that is empty (all NaNs) in that specific dataset. Try depth or total_gates.
Path Error: Ensure your .parquet files are in data/{folder_name}/.
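
A quick way to rule out the path problem is to list the shards the app would see. The sketch below assumes the `data/{folder_name}/` layout described above. `find_parquet_shards` and the folder name `core` are hypothetical and used only for the demonstration.

```python
import tempfile
from pathlib import Path


def find_parquet_shards(folder_name, data_root="data"):
    """Return sorted .parquet paths under data_root/folder_name, or raise a clear error."""
    folder = Path(data_root) / folder_name
    if not folder.is_dir():
        raise FileNotFoundError(f"Expected a dataset folder at {folder}")
    shards = sorted(folder.glob("*.parquet"))
    if not shards:
        raise FileNotFoundError(f"No .parquet files found in {folder}")
    return shards


# Demonstration in a temporary directory so nothing touches a real data/ folder.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "core").mkdir()
    (Path(root) / "core" / "shard_0.parquet").touch()
    found = [p.name for p in find_parquet_shards("core", data_root=root)]
print(found)
```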

🔬 8. Key Insight

Quantum circuits naturally form groups of similar complexity even without any supervision. Features like connectivity, depth, and two-qubit gate count are enough for an algorithm to discover meaningful “complexity levels”.

🔗 9. Project Resources

🤗 Hugging Face: https://huggingface.co/QSBench
💻 GitHub: https://github.com/QSBench
🌐 Website: https://qsbench.github.io