| # 🌌 Circuit Complexity Clustering Guide |
|
|
| Welcome to the **Circuit Complexity Clustering Hub**. |
| This tool demonstrates how **unsupervised learning** can automatically group quantum circuits by their structural complexity — without any labels or prior knowledge. |
|
|
| --- |
|
|
| ## ⚠️ Important: Local Dataset Notice |
|
|
| This application processes local `.parquet` files stored in the `data/` directory. |
|
|
| - **Data Source**: Local shards of QSBench (Core, Amplitude Damping, Depolarizing, etc.). |
| - **Processing**: To keep analysis fast, the app works on a representative sample of **15,000 circuits**, even if the source file contains hundreds of thousands of rows. |
| - **Goal**: Showcase how circuit topology and gate structure naturally form complexity groups. |
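
The sampling step could look roughly like the sketch below. It assumes pandas and uniform random sampling; the guide does not say how the sample is drawn. The names `sample_circuits` and `load_sample`, and the seed, are hypothetical and not taken from the application.

```python
from pathlib import Path

import pandas as pd

SAMPLE_SIZE = 15_000  # sample size stated in this guide


def sample_circuits(df: pd.DataFrame, n: int = SAMPLE_SIZE, seed: int = 42) -> pd.DataFrame:
    """Return at most `n` rows, drawn uniformly at random (an assumption)."""
    if len(df) <= n:
        return df.reset_index(drop=True)
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


def load_sample(folder: str, data_root: str = "data") -> pd.DataFrame:
    """Read every .parquet shard under data/<folder>/ and sample the result."""
    shards = sorted(Path(data_root, folder).glob("*.parquet"))
    if not shards:
        raise FileNotFoundError(f"No .parquet files found in {Path(data_root, folder)}")
    frames = [pd.read_parquet(path) for path in shards]
    return sample_circuits(pd.concat(frames, ignore_index=True))
```

Fixing the seed keeps the sample, and therefore the cluster map, reproducible between runs.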
|
|
| --- |
|
|
| ## 🎯 1. What is Being Done? |
|
|
| The model performs **unsupervised clustering** (K-Means) to group quantum circuits into clusters of similar **structural complexity**. |
|
|
| ### No labels are used |
| The algorithm discovers groups purely from: |
| - **Topology**: How qubits are connected (derived from the adjacency matrix). |
| - **Gate Density**: Counts of single and multi-qubit operations. |
| - **QASM Signals**: Complexity metrics extracted directly from the OpenQASM code. |
|
|
| Each cluster represents circuits of similar “computational weight” or entanglement potential. |
|
|
| --- |
|
|
| ## 🧩 2. How the Model “Sees” a Circuit |
|
|
| The model does **not** use noise profiles or simulation results. It focuses on **structural proxies**: |
|
|
| ### 🔹 Topology Features |
| - `adj_density`: How densely the qubits interact. |
| - `adj_degree_avg`: The average number of connections per qubit. |
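
One plausible way to derive both topology features from an adjacency matrix is sketched below. It assumes an undirected, unweighted qubit-interaction graph with no self-loops; the exact definitions used in QSBench may differ, and `topology_features` is a hypothetical helper name.

```python
def topology_features(adj: list[list[int]]) -> dict[str, float]:
    """Compute density and average degree of an undirected qubit-interaction graph.

    Assumes `adj` is a symmetric 0/1 matrix with a zero diagonal.
    """
    n = len(adj)
    # Count each undirected edge once by scanning the upper triangle.
    edges = sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j])
    possible = n * (n - 1) / 2
    return {
        "adj_density": edges / possible if possible else 0.0,
        "adj_degree_avg": 2 * edges / n if n else 0.0,
    }


# A 4-qubit linear chain: 0-1, 1-2, 2-3
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = topology_features(chain)  # density 0.5, average degree 1.5
```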
|
|
| ### 🔹 Gate Structure & Complexity |
| - `depth`, `total_gates`, `cx_count`: Standard measures of circuit size. |
| - `gate_entropy`: A measure of how "random" or "structured" the gate sequence is. Higher values indicate a more random sequence; lower values indicate a more structured one. |
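
A common way to quantify this is the Shannon entropy of the gate-type distribution. The sketch below assumes that definition; the guide does not state the formula behind `gate_entropy`, so treat it as an illustration rather than the dataset's exact computation.

```python
import math
from collections import Counter


def gate_entropy(gates: list[str]) -> float:
    """Shannon entropy (in bits) of the gate-name distribution."""
    if not gates:
        return 0.0
    counts = Counter(gates)
    total = len(gates)
    # Sum p * log2(1 / p) over gate types, where p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


uniform = gate_entropy(["h", "cx", "rz", "x"])  # four equally likely gates: 2 bits
repeated = gate_entropy(["h", "h", "h", "h"])   # one gate type only: 0 bits
```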
|
|
| ### 🔹 QASM-derived Signals |
| - `qasm_len`: Character length of the code. |
| - `qasm_gates`: Keyword-based gate count. |
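
The two QASM signals could be extracted along these lines. This is a rough sketch for flat OpenQASM 2.0 circuits only: the keyword list is illustrative rather than exhaustive, custom `gate` definitions and `if` statements are not handled, and `qasm_signals` is a hypothetical helper name.

```python
import re

# Statement keywords that are not gate applications; illustrative, not exhaustive.
NON_GATE_KEYWORDS = {"OPENQASM", "include", "qreg", "creg", "barrier", "measure", "reset"}


def qasm_signals(qasm: str) -> dict[str, int]:
    """Return the character length and a keyword-based gate count for a QASM string."""
    code = re.sub(r"//[^\n]*", "", qasm)  # drop line comments
    gate_count = 0
    for statement in code.split(";"):
        # The first identifier of each statement is treated as its keyword.
        match = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)", statement)
        if match and match.group(1) not in NON_GATE_KEYWORDS:
            gate_count += 1
    return {"qasm_len": len(qasm), "qasm_gates": gate_count}


example = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0],q[1];
rz(0.5) q[1];
measure q -> c;
"""
signals = qasm_signals(example)  # counts h, cx, rz as gates
```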
|
|
| --- |
|
|
| ## 🤖 3. Model Overview: PCA & K-Means |
|
|
| The system follows a standard machine learning pipeline: |
| 1. **Imputation & Scaling**: Missing values are filled with medians, and features are normalized. |
| 2. **K-Means**: Groups circuits into $K$ clusters, with $K$ set between 2 and 10. |
| 3. **PCA (Principal Component Analysis)**: Reduces high-dimensional data to 2D for visualization. |
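
The three steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the application's code: `StandardScaler` is an assumed choice for "normalized", and the function name `cluster_circuits`, the seed, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_circuits(X: np.ndarray, k: int = 4, seed: int = 0):
    """Impute, scale, cluster with K-Means, and project to 2D with PCA."""
    if not 2 <= k <= 10:
        raise ValueError("k must be between 2 and 10")
    prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    X_scaled = prep.fit_transform(X)
    # Clustering runs on all scaled features; PCA is used only for plotting.
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
    coords = PCA(n_components=2, random_state=seed).fit_transform(X_scaled)
    return labels, coords


# Synthetic stand-in for (depth, total_gates, cx_count); not real QSBench data.
rng = np.random.default_rng(0)
small = rng.normal(loc=[10, 40, 5], scale=2.0, size=(50, 3))
large = rng.normal(loc=[80, 400, 120], scale=2.0, size=(50, 3))
X = np.vstack([small, large])
X[0, 1] = np.nan  # a missing value, filled with the column median

labels, coords = cluster_circuits(X, k=2)
```

Because clustering happens before the projection, the 2D map can show clusters overlapping even when they are well separated in the full feature space.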
|
|
| ### Understanding the PCA Map: |
| - **Horizontal Axis (Component 1):** Usually represents the **Scale**. Points further to the right typically have more gates and higher qubit counts. |
| - **Vertical Axis (Component 2):** Often reflects **Density/Complexity**. Points higher or lower on this axis differ in their connectivity patterns or gate-to-depth ratio. |
|
|
| --- |
|
|
| ## 🖼️ 4. Example Case: Large-Scale Dataset |
|
|
| When working with a full dataset (e.g., **150,000 rows** from `depolarizing` noise), the clustering reveals highly distinct structural "clouds": |
|
|
| - **Core Clusters**: Large, dense groups representing standard circuit templates. |
| - **The "Tail":** Elongated structures showing a gradient of increasing depth. |
| - **Outliers:** Isolated points (far left or far top) representing unique, non-standard topologies. |
|
|
|
|
| --- |
|
|
| ## 📊 5. Understanding the Results |
|
|
| ### A. PCA Projection |
| - **Each point** = One quantum circuit. |
| - **Color** = Assigned cluster. |
| - **Proximity** = Similarity. Circuits close to each other share similar structural DNA. |
|
|
| ### B. Silhouette Score |
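| - A metric from **-1 to 1** measuring how well-separated the clusters are. |
| - **High score (close to 1):** Distinct, well-defined complexity levels. |
| - **Score near 0 or below:** Overlapping clusters, or circuits that sit closer to a neighboring cluster than to their own. |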
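
For a single circuit, the silhouette coefficient is `(b - a) / max(a, b)`, where `a` is its mean distance to the other circuits in its own cluster and `b` is its mean distance to the circuits of the nearest other cluster. The score is the average over all circuits. The pure-Python sketch below illustrates the calculation with Euclidean distance on toy points; in practice a library routine such as scikit-learn's `silhouette_score` would normally be used, and `mean_silhouette` is a hypothetical name.

```python
import math


def mean_silhouette(points: list[tuple[float, ...]], labels: list[int]) -> float:
    """Mean silhouette coefficient using Euclidean distance."""
    clusters: dict[int, list[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i: int, members: list[int]) -> float:
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        if len(clusters[label]) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, clusters[label])  # cohesion within own cluster
        b = min(mean_dist(i, m) for lbl, m in clusters.items() if lbl != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two tight pairs, far apart: the score is roughly 0.90.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
score = mean_silhouette(points, [0, 0, 1, 1])
```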
|
|
| ### C. Cluster Sizes Table |
| - Shows the distribution of circuits. A heavily imbalanced table might suggest that most of your dataset shares a very similar base structure. |
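
A table like this can be built from the cluster labels alone. The sketch below uses only the standard library; `cluster_sizes` and the toy labels are hypothetical.

```python
from collections import Counter


def cluster_sizes(labels: list[int]) -> list[tuple[int, int, float]]:
    """Return (cluster, count, share) rows, largest cluster first."""
    counts = Counter(labels)
    total = len(labels)
    return [(cluster, n, n / total) for cluster, n in counts.most_common()]


# Toy labels: one dominant cluster and two small ones.
rows = cluster_sizes([0, 0, 0, 0, 0, 0, 0, 0, 1, 2])
for cluster, count, share in rows:
    print(f"cluster {cluster}: {count} circuits ({share:.0%})")
```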
|
|
| --- |
|
|
| ## 🧪 6. Experimentation Tips |
|
|
| - **Search for Outliers:** Look for isolated points far from the main "clouds". These are structurally unusual circuits and good candidates for edge-case benchmarking. |
| - **Tune K:** If clusters look fragmented on a large dataset, try $K=3$ or $K=5$ to see broader complexity tiers. |
| - **Compare Datasets:** Notice how the "shape" of the complexity map changes between `Core` (clean) and `Transpilation` datasets. |
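
To compare values of $K$ more systematically, one option is to compute the silhouette score for each candidate and inspect the trend. The sketch below assumes scikit-learn and already-scaled features; `silhouette_by_k` and the synthetic blobs are illustrative, not part of the application.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X_scaled: np.ndarray, k_values=range(2, 11), seed: int = 0) -> dict[int, float]:
    """Fit K-Means for each K and return the silhouette score per K."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
        scores[k] = float(silhouette_score(X_scaled, labels))
    return scores


# Three synthetic, well-separated blobs standing in for scaled circuit features.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X_scaled = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in centers])

scores = silhouette_by_k(X_scaled, k_values=range(2, 7))
best_k = max(scores, key=scores.get)
```

The highest-scoring $K$ is a starting point rather than a verdict; a slightly lower-scoring $K$ may still give tiers that are easier to interpret.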
|
|
| --- |
|
|
| ## 🛠️ 7. Troubleshooting |
|
|
| **"Too few rows for clustering" error?** |
| 1. **NaN values:** The selected feature may be empty (all NaNs) in that specific dataset, which leaves no usable rows. Try `depth` or `total_gates` instead. |
| 2. **Path Error:** Ensure your `.parquet` files are in `data/{folder_name}/`. |
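
A quick way to check the first cause is to count the non-missing values for each selected feature before clustering. The sketch below assumes pandas; `feature_coverage` is a hypothetical diagnostic helper and the small frame is made-up data.

```python
import pandas as pd


def feature_coverage(df: pd.DataFrame, features: list[str]) -> dict[str, int]:
    """Count non-missing values per selected feature; absent columns count as 0."""
    return {f: int(df[f].notna().sum()) if f in df.columns else 0 for f in features}


# Made-up frame: `gate_entropy` is all NaN and `cx_count` is missing entirely.
df = pd.DataFrame({"depth": [3, 5, 8], "gate_entropy": [float("nan")] * 3})
coverage = feature_coverage(df, ["depth", "gate_entropy", "cx_count"])
empty = [name for name, n in coverage.items() if n == 0]
```

Any feature listed in `empty` should be deselected before running the clustering.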
|
|
| --- |
|
|
| ## 🔬 8. Key Insight |
|
|
| > Quantum circuits naturally form groups of similar complexity even without any supervision. Features like connectivity, depth, and two-qubit gate count are enough for an algorithm to discover meaningful “complexity levels”. |
|
|
| --- |
|
|
| ## 🔗 9. Project Resources |
|
|
| - 🤗 **Hugging Face**: [https://huggingface.co/QSBench](https://huggingface.co/QSBench) |
| - 💻 **GitHub**: [https://github.com/QSBench](https://github.com/QSBench) |
| - 🌐 **Website**: [https://qsbench.github.io](https://qsbench.github.io) |