# 🌌 Circuit Complexity Clustering Guide

Welcome to the **Circuit Complexity Clustering Hub**. This tool demonstrates how **unsupervised learning** can automatically group quantum circuits by their structural complexity, without any labels or prior knowledge.

---

## ⚠️ Important: Local Dataset Notice

This application processes local `.parquet` files stored in the `data/` directory.

- **Data Source**: Local shards of QSBench (Core, Amplitude Damping, Depolarizing, etc.).
- **Processing**: To keep the analysis fast, it runs on a representative sample of **15,000 circuits**, even if the source file contains hundreds of thousands of rows.
- **Goal**: Showcase how circuit topology and gate structure naturally form complexity groups.

---

## 🎯 1. What is Being Done?

The model performs **unsupervised clustering** (K-Means) to group quantum circuits into clusters of similar **structural complexity**.

### No labels are used

The algorithm discovers groups purely from:

- **Topology**: How qubits are connected (derived from the adjacency matrix).
- **Gate Density**: Counts of single-qubit and multi-qubit operations.
- **QASM Signals**: Complexity metrics extracted directly from the OpenQASM code.

Each cluster represents circuits of similar "computational weight" or entanglement potential.

---

## 🧩 2. How the Model "Sees" a Circuit

The model does **not** use noise profiles or simulation results. It focuses on **structural proxies**:

### 🔹 Topology Features

- `adj_density`: How densely the qubits interact.
- `adj_degree_avg`: The average number of connections per qubit.

### 🔹 Gate Structure & Complexity

- `depth`, `total_gates`, `cx_count`: Standard measures of circuit size.
- `gate_entropy`: A measure of how "random" or "structured" the gate sequence is.

### 🔹 QASM-derived Signals

- `qasm_len`: Character length of the code.
- `qasm_gates`: Keyword-based gate count.

---

## 🤖 3. Model Overview: PCA & K-Means

The system follows a standard machine learning pipeline:

1. **Imputation & Scaling**: Missing values are filled with medians, and features are normalized.
2. **K-Means**: Groups circuits into $K$ clusters, with $K$ between 2 and 10.
3. **PCA (Principal Component Analysis)**: Reduces high-dimensional data to 2D for visualization.

### Understanding the PCA Map

- **Horizontal Axis (Component 1):** Usually represents the **Scale**. Points further to the right typically have more gates and higher qubit counts.
- **Vertical Axis (Component 2):** Often reflects **Density/Complexity**. Points higher or lower on this axis differ in their connectivity patterns or gate-to-depth ratio.

---

## 🖼️ 4. Example Case: Large-Scale Dataset

When working with a full dataset (e.g., **150,000 rows** from the `depolarizing` noise dataset), the clustering reveals highly distinct structural "clouds":

- **Core Clusters:** Large, dense groups representing standard circuit templates.
- **The "Tail":** Elongated structures showing a gradient of increasing depth.
- **Outliers:** Isolated points (far left or far top) representing unique, non-standard topologies.

![Image](https://cdn-uploads.huggingface.co/production/uploads/69cab322f9896e16f84eb345/bmEd1lsR_jaT99ZklPCSQ.png)

---

## 📊 5. Understanding the Results

### A. PCA Projection

- **Each point** = One quantum circuit.
- **Color** = Assigned cluster.
- **Proximity** = Similarity. Circuits close to each other share similar structural DNA.

### B. Silhouette Score

- A metric from **-1 to 1** measuring how well-separated the clusters are.
- **High score (close to 1):** Distinct, well-defined complexity levels.
- **Score near 0:** Overlapping clusters. Negative values suggest that circuits were assigned to the wrong cluster.

### C. Cluster Sizes Table

- Shows how many circuits fall into each cluster. A heavily imbalanced table might suggest that most of your dataset shares a very similar base structure.

---

## 🧪 6. Experimentation Tips

- **Search for Outliers:** Look for isolated points far from the main "clouds". These are unique circuits and good candidates for edge-case benchmarking.
- **Tune K:** If clusters look fragmented on a large dataset, try $K=3$ or $K=5$ to see broader complexity tiers.
- **Compare Datasets:** Notice how the "shape" of the complexity map changes between the `Core` (clean) and `Transpilation` datasets.

---

## 🛠️ 7. Troubleshooting

**"Too few rows for clustering" error?**

1. **NaN values:** You may have selected a feature that is empty (all NaNs) in that specific dataset. Try `depth` or `total_gates`.
2. **Path Error:** Ensure your `.parquet` files are in `data/{folder_name}/`.

---

## 🔬 8. Key Insight

> Quantum circuits naturally form groups of similar complexity even without any supervision. Features like connectivity, depth, and two-qubit gate count are enough for an algorithm to discover meaningful "complexity levels".

---

## 🔗 9. Project Resources

- 🤗 **Hugging Face**: [https://huggingface.co/QSBench](https://huggingface.co/QSBench)
- 💻 **GitHub**: [https://github.com/QSBench](https://github.com/QSBench)
- 🌐 **Website**: [https://qsbench.github.io](https://qsbench.github.io)
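---

The pipeline from Section 3 (median imputation, scaling, K-Means, then a 2D PCA projection) together with the silhouette score from Section 5B can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the app's actual code: the function name `cluster_circuits`, the chosen feature subset, the use of `StandardScaler` for normalization, and the synthetic demo data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature names taken from this guide; the subset itself is an arbitrary choice.
FEATURES = ["depth", "total_gates", "cx_count", "adj_density"]
MAX_ROWS = 15_000  # sample size mentioned in the dataset notice


def cluster_circuits(df, k=3, seed=0):
    """Hypothetical helper: impute, scale, cluster, and project to 2D."""
    if len(df) > MAX_ROWS:
        df = df.sample(n=MAX_ROWS, random_state=seed)
    # Step 1: fill missing values with medians, then standardize each feature.
    prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    X = prep.fit_transform(df[FEATURES])
    # Step 2: K-Means on the scaled features.
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    # Step 3: PCA is used only to get 2D coordinates for plotting.
    coords = PCA(n_components=2).fit_transform(X)
    # Silhouette score lies in [-1, 1]; higher means better separated clusters.
    score = silhouette_score(X, labels)
    return labels, coords, score


# Synthetic stand-in for a QSBench shard: two groups of "small" and "large" circuits.
rng = np.random.default_rng(0)
small = rng.normal([5, 20, 4, 0.2], [1, 3, 1, 0.02], size=(100, 4))
large = rng.normal([40, 300, 90, 0.6], [4, 20, 8, 0.05], size=(100, 4))
demo = pd.DataFrame(np.vstack([small, large]), columns=FEATURES)
demo.loc[0, "depth"] = np.nan  # one missing value to exercise the imputer

labels, coords, score = cluster_circuits(demo, k=2)
print(coords.shape, round(float(score), 3))
```

With real data, `demo` would be replaced by a DataFrame loaded from one of the local `.parquet` files, and `k` would be tuned as described in Section 6.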
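---

To make the structural features from Section 2 more concrete, here is one plausible way such values could be computed. The exact definitions used by QSBench are not stated in this guide, so everything below is an assumption: `adj_density` is taken as the fraction of possible qubit pairs that interact, `adj_degree_avg` as the mean number of neighbours per qubit, and `gate_entropy` as the Shannon entropy (in bits) of the gate-type frequencies.

```python
import math
from collections import Counter


def adj_density(adj):
    """Assumed definition: connected qubit pairs / all possible pairs."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(adj[i][j] != 0 for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


def adj_degree_avg(adj):
    """Assumed definition: average number of neighbours per qubit."""
    n = len(adj)
    if n == 0:
        return 0.0
    degrees = [
        sum(1 for j, w in enumerate(row) if w != 0 and j != i)
        for i, row in enumerate(adj)
    ]
    return sum(degrees) / n


def gate_entropy(gates):
    """Assumed definition: Shannon entropy (bits) of gate-type frequencies."""
    counts = Counter(gates)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Three qubits connected in a line: 0 - 1 - 2 (symmetric 0/1 adjacency matrix).
line = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(adj_density(line), adj_degree_avg(line))
print(gate_entropy(["h", "cx", "h", "cx"]))
```

Under these assumed definitions, a circuit that uses a single gate type has zero entropy, while an even mix of gate types maximizes it.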