🌌 Circuit Complexity Clustering Guide

Welcome to the Circuit Complexity Clustering Hub.
This tool demonstrates how unsupervised learning can group quantum circuits by structural complexity, without any labels or prior knowledge.


⚠️ Important: Local Dataset Notice

This application processes local .parquet files stored in the data/ directory.

  • Data Source: Local shards of QSBench (Core, Amplitude Damping, Depolarizing, etc.).
  • Processing: To keep the app responsive, analysis runs on a representative sample of 15,000 circuits, even if the source file contains hundreds of thousands of rows.
  • Goal: Showcase how circuit topology and gate structure naturally form complexity groups.
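The sampling step can be sketched as follows. This is a minimal illustration that assumes uniform random sampling with a fixed seed; the guide does not say how the app actually draws its sample, and `sample_circuits` is a hypothetical helper name.

```python
import pandas as pd

SAMPLE_SIZE = 15_000  # sample size quoted in this guide


def sample_circuits(df: pd.DataFrame, n: int = SAMPLE_SIZE, seed: int = 42) -> pd.DataFrame:
    """Return at most `n` rows drawn uniformly at random without replacement."""
    if len(df) <= n:
        return df.reset_index(drop=True)
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


# Hypothetical usage; the shard path is a placeholder, not a real file name:
# df = pd.read_parquet("data/<folder_name>/<shard>.parquet")
# sample = sample_circuits(df)
```

Fixing the seed keeps the cluster map reproducible between runs on the same file.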

🎯 1. What is Being Done?

The model performs unsupervised clustering (K-Means) to group quantum circuits into clusters of similar structural complexity.

No labels are used

The algorithm discovers groups purely from:

  • Topology: How qubits are connected (derived from the adjacency matrix).
  • Gate Density: Counts of single and multi-qubit operations.
  • QASM Signals: Complexity metrics extracted directly from the OpenQASM code.

Each cluster represents circuits of similar “computational weight” or entanglement potential.


🧩 2. How the Model “Sees” a Circuit

The model does not use noise profiles or simulation results. It focuses on structural proxies:

🔹 Topology Features

  • adj_density: How densely the qubits interact.
  • adj_degree_avg: The average number of connections per qubit.
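As a rough illustration of the two topology features, the sketch below computes them from a symmetric 0/1 adjacency matrix. The exact formulas used by the app are not given in this guide, so the definitions here (edges divided by possible edges, and mean node degree) are assumptions.

```python
import numpy as np


def topology_features(adj: np.ndarray) -> dict:
    """Compute density and average degree from a symmetric 0/1 adjacency matrix."""
    adj = (np.asarray(adj) != 0).astype(int)
    np.fill_diagonal(adj, 0)          # ignore self-loops
    n = adj.shape[0]
    edges = adj.sum() / 2             # each undirected edge is counted twice
    max_edges = n * (n - 1) / 2
    return {
        "adj_density": edges / max_edges if max_edges else 0.0,
        "adj_degree_avg": adj.sum(axis=1).mean(),
    }


# Three qubits in a line: 0-1 and 1-2 interact, 0-2 do not.
line = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
features = topology_features(line)
```

For the three-qubit line, two of the three possible pairs interact, so the density is 2/3 and the average degree is 4/3.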

🔹 Gate Structure & Complexity

  • depth, total_gates, cx_count: Standard measures of circuit size.
  • gate_entropy: A measure of how "random" or "structured" the gate sequence is.
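One common way to quantify how mixed a circuit's gate set is uses the Shannon entropy of the gate-type frequencies. The sketch below assumes that definition; the guide does not specify how `gate_entropy` is actually computed, and this version ignores gate order.

```python
import math
from collections import Counter


def gate_entropy(gates: list[str]) -> float:
    """Shannon entropy (in bits) of the gate-type frequencies in a circuit."""
    counts = Counter(gates)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this definition a circuit built from a single gate type scores 0 bits, while one that uses four gate types equally often scores 2 bits.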

🔹 QASM-derived Signals

  • qasm_len: Character length of the code.
  • qasm_gates: Keyword-based gate count.
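The QASM-derived signals can be approximated with plain string processing. In the sketch below, the keyword list and the one-statement-per-line matching rule are assumptions chosen for illustration; the app's actual keyword set is not documented here.

```python
import re

# Assumed keyword set for illustration only.
GATE_KEYWORDS = ("h", "x", "y", "z", "s", "t", "rx", "ry", "rz", "cx", "cz", "swap", "ccx")
_GATE_RE = re.compile(r"^\s*(" + "|".join(GATE_KEYWORDS) + r")\b", re.MULTILINE)


def qasm_signals(qasm: str) -> dict:
    """Return character length and a keyword-based gate count for an OpenQASM string."""
    return {"qasm_len": len(qasm), "qasm_gates": len(_GATE_RE.findall(qasm))}


bell = (
    'OPENQASM 2.0;\ninclude "qelib1.inc";\n'
    "qreg q[2];\ncreg c[2];\n"
    "h q[0];\ncx q[0],q[1];\nmeasure q -> c;\n"
)
signals = qasm_signals(bell)
```

Declarations (`qreg`, `creg`) and `measure` are not in the keyword list, so only the `h` and `cx` lines are counted for this Bell-pair circuit.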

🤖 3. Model Overview: PCA & K-Means

The system follows a standard machine learning pipeline:

  1. Imputation & Scaling: Missing values are filled with medians, and features are normalized.
  2. K-Means: Groups circuits into $K$ clusters (2–10).
  3. PCA (Principal Component Analysis): Reduces high-dimensional data to 2D for visualization.
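The three steps above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the app's code: "normalized" is interpreted as standard scaling, K-Means runs on the scaled features (not on the 2D projection), and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_and_project(X: np.ndarray, k: int = 4, seed: int = 0):
    """Impute, scale, cluster with K-Means, then project to 2D with PCA."""
    prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    X_scaled = prep.fit_transform(X)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
    coords = PCA(n_components=2, random_state=seed).fit_transform(X_scaled)
    return labels, coords


# Synthetic stand-in for a feature table (columns could be depth, total_gates, cx_count).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(8, 1, (50, 3))])
X[0, 1] = np.nan  # a missing value, filled by the median imputer
labels, coords = cluster_and_project(X, k=2)
```

Clustering in the full scaled feature space and using PCA only for display means the colors on the map reflect all selected features, not just the two plotted axes.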

Understanding the PCA Map:

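  • Horizontal Axis (Component 1): Usually tracks scale. Circuits with more gates and higher qubit counts tend to sit toward one end of this axis, typically the right.
  • Vertical Axis (Component 2): Often reflects density or complexity. Points higher or lower on this axis differ in their connectivity patterns or gate-to-depth ratio.
  • Note: The sign of a principal component is arbitrary, so the direction of either axis can flip between datasets. What each axis means depends on the selected features.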
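To check what each axis means for a given feature selection, one option is to inspect the PCA loadings. The sketch below uses synthetic data in which `depth` and `total_gates` move together and `adj_density` is independent; the data and the feature choice are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

feature_names = ["depth", "total_gates", "adj_density"]

# Synthetic data: depth and total_gates share a common "size" factor.
rng = np.random.default_rng(1)
size = rng.normal(0, 1, 500)
X = np.column_stack([
    size + rng.normal(0, 0.1, 500),   # depth
    size + rng.normal(0, 0.1, 500),   # total_gates
    rng.normal(0, 1, 500),            # adj_density
])

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

# Each row of components_ holds one axis' loadings; large absolute values
# mark the features that drive that axis. The sign itself is arbitrary.
for axis, loadings in enumerate(pca.components_, start=1):
    ranked = sorted(zip(feature_names, loadings), key=lambda p: -abs(p[1]))
    print(f"Component {axis}:", [(name, round(float(w), 2)) for name, w in ranked])
```

In this synthetic case the first component is driven by the two size features and the second by `adj_density`, which mirrors the scale and density reading described above.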

🖼️ 4. Example Case: Large-Scale Dataset

When working with a full dataset (e.g., 150,000 rows from the Depolarizing dataset), the clustering reveals highly distinct structural "clouds":

  • Core Clusters: Large, dense groups representing standard circuit templates.
  • The "Tail": Elongated structures showing a gradient of increasing depth.
  • Outliers: Isolated points (far left or far top) representing unique, non-standard topologies.
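Isolated points can also be located programmatically. The sketch below ranks points by distance from their own cluster centroid; this is one simple heuristic, not necessarily how the app identifies outliers, and `farthest_points` is a hypothetical helper name.

```python
import numpy as np


def farthest_points(coords: np.ndarray, labels: np.ndarray, top_n: int = 5) -> np.ndarray:
    """Return indices of the `top_n` points farthest from their own cluster centroid."""
    dist = np.empty(len(coords))
    for c in np.unique(labels):
        mask = labels == c
        centroid = coords[mask].mean(axis=0)
        dist[mask] = np.linalg.norm(coords[mask] - centroid, axis=1)
    return np.argsort(dist)[::-1][:top_n]


# Four points near the origin and one far away, all in the same cluster.
coords = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [10, 10]], dtype=float)
labels = np.zeros(5, dtype=int)
outliers = farthest_points(coords, labels, top_n=1)
```

The returned indices point back to rows of the sampled table, so the matching circuits can be pulled out for closer inspection.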



📊 5. Understanding the Results

A. PCA Projection

  • Each point = One quantum circuit.
  • Color = Assigned cluster.
  • Proximity = Similarity. Circuits close to each other share similar structural DNA.

B. Silhouette Score

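  • A metric from -1 to 1 measuring how well-separated the clusters are. Values near 0 indicate overlapping clusters, and negative values suggest that many circuits sit closer to a neighboring cluster than to their own.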
  • High score: Distinct, well-defined complexity levels.
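For reference, the standard silhouette coefficient of a single circuit $i$ is defined from $a(i)$, the mean distance to the other circuits in its own cluster, and $b(i)$, the mean distance to the circuits in the nearest other cluster:

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
$$

The reported score is the mean of $s(i)$ over all circuits. It approaches 1 when $a(i) \ll b(i)$, meaning clusters are tight and far apart.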

C. Cluster Sizes Table

  • Shows the distribution of circuits. A heavily imbalanced table might suggest that most of your dataset shares a very similar base structure.
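A table like this can be derived directly from the cluster labels. The sketch below is a minimal pandas version; the column names are assumptions and may differ from what the app displays.

```python
import numpy as np
import pandas as pd


def cluster_size_table(labels) -> pd.DataFrame:
    """Count circuits per cluster and report each cluster's share of the total."""
    counts = pd.Series(labels).value_counts().sort_index()
    return pd.DataFrame({
        "cluster": counts.index,
        "n_circuits": counts.to_numpy(),
        "share": (counts / counts.sum()).round(3).to_numpy(),
    })


labels = np.array([0] * 90 + [1] * 7 + [2] * 3)   # deliberately imbalanced
table = cluster_size_table(labels)
print(table)
```

Here cluster 0 holds 90 of the 100 circuits, the kind of imbalance that hints at a shared base structure.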

🧪 6. Experimentation Tips

  • Search for Outliers: Look for isolated points far from the main "clouds". These are unique circuits — perfect candidates for edge-case benchmarking.
  • Tune K: If clusters look fragmented on a large dataset, try $K=3$ or $K=5$ to see broader complexity tiers.
  • Compare Datasets: Notice how the "shape" of the complexity map changes between Core (clean) and Transpilation datasets.
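When tuning $K$ by hand feels slow, a sweep over the 2 to 10 range with the silhouette score as a guide is one option. The sketch below uses synthetic data with three well-separated groups; `sweep_k` is a hypothetical helper and the app may select $K$ differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sweep_k(X: np.ndarray, k_values=range(2, 11), seed: int = 0) -> dict:
    """Fit K-Means for each K and return {K: silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores


# Synthetic data with three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([rng.normal(c, 0.5, (60, 2)) for c in centers])
scores = sweep_k(X)
best_k = max(scores, key=scores.get)
```

The highest silhouette score is a useful starting point, but a smaller $K$ can still be the better choice when the goal is a few broad complexity tiers.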

🛠️ 7. Troubleshooting

"Too few rows for clustering" error?

  1. NaN values: You may have selected a feature that is empty (all NaNs) in that specific dataset. Try depth or total_gates.
  2. Path Error: Ensure your .parquet files are in data/{folder_name}/.
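Both causes can be checked before clustering. The sketch below reports which selected features are missing or entirely NaN and how many rows still carry data; the helper name is hypothetical, and the app's actual minimum row count is not stated in this guide.

```python
import pandas as pd


def diagnose_features(df: pd.DataFrame, features: list[str]) -> dict:
    """Report missing columns, all-NaN columns, and rows with at least one value."""
    missing = [f for f in features if f not in df.columns]
    present = [f for f in features if f in df.columns]
    all_nan = [f for f in present if df[f].isna().all()]
    usable = [f for f in present if f not in all_nan]
    rows = int(df[usable].notna().any(axis=1).sum()) if usable else 0
    return {"missing": missing, "all_nan": all_nan, "usable_rows": rows}


df = pd.DataFrame({"depth": [1, 2, 3], "gate_entropy": [float("nan")] * 3})
report = diagnose_features(df, ["depth", "gate_entropy", "cx_count"])
```

In this example `gate_entropy` is flagged as all-NaN and `cx_count` as missing, so only `depth` would be safe to select.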

🔬 8. Key Insight

Quantum circuits naturally form groups of similar complexity even without any supervision. Features like connectivity, depth, and two-qubit gate count are enough for an algorithm to discover meaningful “complexity levels”.


🔗 9. Project Resources