	# 🌌 Circuit Complexity Clustering Guide

	Welcome to the Circuit Complexity Clustering Hub.
	This tool demonstrates how unsupervised learning can automatically group quantum circuits by their structural complexity — without any labels or prior knowledge.

	---

	## ⚠️ Important: Local Dataset Notice

	This application processes local `.parquet` files stored in the `data/` directory.

	- Data Source: Local shards of QSBench (Core, Amplitude Damping, Depolarizing, etc.).
	- Processing: To ensure high performance, analysis is performed on a representative sample of 15,000 circuits, even if the source file contains hundreds of thousands of rows.
	- Goal: Showcase how circuit topology and gate structure naturally form complexity groups.
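
The sampling step described above can be sketched as follows. This is a minimal illustration that assumes uniform random sampling with pandas; the helper name `sample_circuits` and the seed are hypothetical, and the app's actual sampling strategy may differ.

```python
import pandas as pd

MAX_ROWS = 15_000  # sample size stated in this guide


def sample_circuits(df: pd.DataFrame, max_rows: int = MAX_ROWS, seed: int = 42) -> pd.DataFrame:
    """Return at most `max_rows` rows, drawn uniformly at random (assumed strategy)."""
    if len(df) <= max_rows:
        # Small files are used in full.
        return df.reset_index(drop=True)
    # A fixed seed keeps the sample reproducible between runs.
    return df.sample(n=max_rows, random_state=seed).reset_index(drop=True)
```

A shard loaded with `pd.read_parquet(...)` (which needs `pyarrow` or `fastparquet` installed) could be passed straight to this helper.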

	---

	## 🎯 1. What is Being Done?

	The model performs unsupervised clustering (K-Means) to group quantum circuits into clusters of similar structural complexity.

	### No labels are used
	The algorithm discovers groups purely from:
	- Topology: How qubits are connected (derived from the adjacency matrix).
	- Gate Density: Counts of single and multi-qubit operations.
	- QASM Signals: Complexity metrics extracted directly from the OpenQASM code.

	Each cluster represents circuits of similar “computational weight” or entanglement potential.

	---

	## 🧩 2. How the Model “Sees” a Circuit

	The model does not use noise profiles or simulation results. It focuses on structural proxies:

	### 🔹 Topology Features
	- `adj_density`: How densely the qubits interact.
	- `adj_degree_avg`: The average number of connections per qubit.
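
As a rough illustration of how such topology features could be derived from an adjacency matrix, the sketch below uses common graph definitions: density as the fraction of possible qubit pairs that are connected, and average degree as the mean number of neighbors per qubit. These definitions and the function name `topology_features` are assumptions; the exact formulas used by QSBench are not stated in this guide.

```python
import numpy as np


def topology_features(adj) -> dict:
    """Density and average degree of an undirected qubit-interaction graph.

    `adj` is a square adjacency matrix; any nonzero entry counts as an edge
    and self-loops are ignored (assumed conventions).
    """
    a = np.array(adj, dtype=float)  # np.array copies, so the input is not modified
    n = a.shape[0]
    np.fill_diagonal(a, 0.0)
    a = ((a + a.T) > 0).astype(float)  # symmetrize and binarize

    edges = a.sum() / 2.0            # each undirected edge appears twice
    possible = n * (n - 1) / 2.0     # number of distinct qubit pairs
    density = edges / possible if possible > 0 else 0.0
    degree_avg = a.sum(axis=1).mean()
    return {"adj_density": float(density), "adj_degree_avg": float(degree_avg)}
```

For a three-qubit chain (0-1-2), two of the three possible pairs are connected, so the density is 2/3 and the average degree is 4/3.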

	### 🔹 Gate Structure & Complexity
	- `depth`, `total_gates`, `cx_count`: Standard measures of circuit size.
	- `gate_entropy`: A measure of how "random" or "structured" the gate sequence is.
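
One plausible way to quantify `gate_entropy` is the Shannon entropy of the gate-type distribution, sketched below. This is an assumption made for illustration: the guide does not give the exact definition, and this version looks only at how often each gate type occurs, not at the order of the gates.

```python
import math
from collections import Counter


def gate_entropy(gate_names) -> float:
    """Shannon entropy (in bits) of the gate-type frequencies in a circuit."""
    counts = Counter(gate_names)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # 0 bits: a single gate type; higher values: a more even mix of gate types.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A circuit built from one gate type scores 0, while an even mix of two gate types scores exactly 1 bit.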

	### 🔹 QASM-derived Signals
	- `qasm_len`: Character length of the code.
	- `qasm_gates`: Keyword-based gate count.
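
The two QASM-derived signals could be extracted along these lines. The keyword list and the function name `qasm_signals` are hypothetical, chosen only for illustration; the set of keywords the app actually counts is not specified here.

```python
import re

# Hypothetical keyword list; not taken from the app.
GATE_KEYWORDS = ("h", "x", "y", "z", "s", "t", "rx", "ry", "rz", "cx", "cz", "swap", "ccx")


def qasm_signals(qasm: str, keywords=GATE_KEYWORDS) -> dict:
    """Return the character length and a keyword-based gate count for OpenQASM text."""
    keyword_set = set(keywords)
    gate_count = 0
    for statement in qasm.split(";"):
        # The leading identifier of a statement is the gate name, e.g. "rz" in "rz(0.5) q[1]".
        match = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)", statement)
        if match and match.group(1) in keyword_set:
            gate_count += 1
    return {"qasm_len": len(qasm), "qasm_gates": gate_count}
```

Declarations such as `qreg` and `include` are skipped because their leading identifier is not in the keyword list. Comments are not handled in this sketch.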

	---

	## 🤖 3. Model Overview: PCA & K-Means

	The system follows a standard machine learning pipeline:
	1. Imputation & Scaling: Missing values are filled with medians, and features are normalized.
	2. K-Means: Groups circuits into $K$ clusters, with $K$ chosen between 2 and 10.
	3. PCA (Principal Component Analysis): Reduces high-dimensional data to 2D for visualization.
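
The three steps above can be sketched with scikit-learn as follows. This is an illustrative reconstruction, not the app's code: `StandardScaler` is assumed for the normalization step, K-Means is assumed to run on the scaled features rather than on the 2D projection, and the function name `cluster_and_project` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_and_project(X, n_clusters=4, seed=0):
    """Return (cluster labels, 2D PCA coordinates) for a feature matrix that may contain NaNs."""
    if not 2 <= n_clusters <= 10:
        raise ValueError("n_clusters must be between 2 and 10")

    # Step 1: fill missing values with column medians, then standardize each feature.
    prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    X_scaled = prep.fit_transform(X)

    # Step 2: cluster in the full (scaled) feature space.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_scaled)

    # Step 3: project to 2D for plotting only.
    coords = PCA(n_components=2, random_state=seed).fit_transform(X_scaled)
    return labels, coords
```

Clustering before projecting means the 2D map is only a view of the result, so clusters that overlap on the plot can still be separated in the full feature space.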

	### Understanding the PCA Map:
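	- Horizontal Axis (Component 1): Usually represents the Scale. Points toward one end of this axis (typically the right) have more gates and higher qubit counts.
	- Vertical Axis (Component 2): Often reflects Density/Complexity. Points higher or lower on this axis differ in their connectivity patterns or gate-to-depth ratio.
	- Note: The sign of a principal component is arbitrary, so the direction of an axis can flip between datasets. Only relative positions carry meaning.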
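
To check which features actually drive each axis for a given dataset, the PCA loadings can be inspected. The sketch below assumes scikit-learn and a NaN-free feature matrix; the function name `top_loadings` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def top_loadings(X, feature_names, n_top=3):
    """List the features with the largest absolute loading on each of the first two components."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2).fit(X_scaled)
    result = {}
    for i, component in enumerate(pca.components_):
        order = np.argsort(-np.abs(component))[:n_top]
        result[f"PC{i + 1}"] = [(feature_names[j], float(component[j])) for j in order]
    # The explained variance ratio shows how much of the data each axis captures.
    return result, pca.explained_variance_ratio_
```

If size-related features such as `depth` and `total_gates` dominate the first component, the "Scale" reading of the horizontal axis holds for that dataset.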

	---

	## 🖼️ 4. Example Case: Large-Scale Dataset

	When working with a large dataset (e.g., 150,000 rows from the `depolarizing` shard), the clustering reveals distinct structural "clouds":

	- Core Clusters: Large, dense groups representing standard circuit templates.
	- The "Tail": Elongated structures showing a gradient of increasing depth.
	- Outliers: Isolated points (far left or far top) representing unique, non-standard topologies.


	![image](https://cdn-uploads.huggingface.co/production/uploads/69cab322f9896e16f84eb345/bmEd1lsR_jaT99ZklPCSQ.png)
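
Outliers like the ones described above could also be located programmatically rather than by eye. One simple heuristic, sketched here with NumPy, flags points that lie unusually far from their own cluster centroid. The 99th-percentile threshold and the function name `flag_outliers` are assumptions for illustration; the app is not stated to do this.

```python
import numpy as np


def flag_outliers(X_scaled, labels, centers, quantile=0.99):
    """Boolean mask of points whose distance to their own centroid exceeds the given quantile."""
    X_scaled = np.asarray(X_scaled, dtype=float)
    labels = np.asarray(labels)
    # centers[labels] picks, for every point, the centroid of its assigned cluster.
    dist = np.linalg.norm(X_scaled - np.asarray(centers, dtype=float)[labels], axis=1)
    return dist > np.quantile(dist, quantile)
```

With a fitted scikit-learn `KMeans` model, `labels` and `centers` would correspond to its `labels_` and `cluster_centers_` attributes.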

	---

	## 📊 5. Understanding the Results

	### A. PCA Projection
	- Each point = One quantum circuit.
	- Color = Assigned cluster.
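	- Proximity = Similarity. Circuits close to each other share similar structural DNA. Because a 2D projection keeps only part of the variance, nearby points are not guaranteed to be close in the full feature space.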

	### B. Silhouette Score
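	- A metric from -1 to 1 measuring how well-separated the clusters are.
	- High score (close to 1): Distinct, well-defined complexity levels.
	- Score near 0 or below: Overlapping clusters, or circuits that sit closer to a neighboring cluster than to their own.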
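
For reference, the standard silhouette coefficient of a single circuit $i$ is defined below, where $a(i)$ is the mean distance from $i$ to the other circuits in its own cluster and $b(i)$ is the mean distance from $i$ to the circuits in the nearest other cluster. It is assumed here that the app reports the usual mean of this value over all sampled circuits.

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
$$

The value becomes negative whenever $a(i) > b(i)$, that is, when a circuit is on average closer to a neighboring cluster than to its own.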

	### C. Cluster Sizes Table
	- Shows the distribution of circuits. A heavily imbalanced table might suggest that most of your dataset shares a very similar base structure.
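
A table of this kind can be built directly from the cluster labels. The sketch below is a minimal NumPy version; the function name `cluster_sizes` and the added `share` column are illustrative assumptions.

```python
import numpy as np


def cluster_sizes(labels, n_clusters):
    """Return one row per cluster with its size and its share of all circuits."""
    counts = np.bincount(np.asarray(labels), minlength=n_clusters)
    total = counts.sum()
    return [
        {"cluster": k, "size": int(c), "share": float(c / total)}
        for k, c in enumerate(counts)
    ]
```

A single cluster holding most of the share is the imbalance signal described above.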

	---

	## 🧪 6. Experimentation Tips

	- Search for Outliers: Look for isolated points far from the main "clouds". These are unique circuits — perfect candidates for edge-case benchmarking.
	- Tune K: If clusters look fragmented on a large dataset, try $K=3$ or $K=5$ to see broader complexity tiers.
	- Compare Datasets: Notice how the "shape" of the complexity map changes between `Core` (clean) and `Transpilation` datasets.
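
When tuning $K$, comparing silhouette scores across candidate values gives a quick numerical check alongside the visual one. The sketch below assumes scikit-learn and an already imputed and scaled feature matrix; the function name `sweep_k` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sweep_k(X_scaled, k_values=range(2, 11), seed=0):
    """Return {K: silhouette score} for each candidate number of clusters."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
        scores[k] = float(silhouette_score(X_scaled, labels))
    return scores
```

The best-scoring value is `max(scores, key=scores.get)`, but a slightly lower-scoring $K$ can still be preferable if it yields tiers that are easier to interpret.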

	---

	## 🛠️ 7. Troubleshooting

	If you see a "Too few rows for clustering" error, check the following:
	1. NaN values: You may have selected a feature that is empty (all NaNs) in that specific dataset. Try `depth` or `total_gates`.
	2. Path Error: Ensure your `.parquet` files are in `data/{folder_name}/`.
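
The first cause can be checked before clustering by testing whether each selected feature has at least one non-missing value. This is a minimal pandas sketch; the function name `usable_features` is hypothetical and not part of the app.

```python
import pandas as pd


def usable_features(df: pd.DataFrame, features):
    """Split requested features into usable ones and ones that are missing or entirely NaN."""
    usable, rejected = [], []
    for name in features:
        if name in df.columns and df[name].notna().any():
            usable.append(name)
        else:
            rejected.append(name)
    return usable, rejected
```

Any name that ends up in `rejected` is a candidate for removal from the feature selection for that dataset.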

	---

	## 🔬 8. Key Insight

	> Quantum circuits naturally form groups of similar complexity even without any supervision. Features like connectivity, depth, and two-qubit gate count are enough for an algorithm to discover meaningful “complexity levels”.

	---

	## 🔗 9. Project Resources

	- 🤗 Hugging Face: [https://huggingface.co/QSBench](https://huggingface.co/QSBench)
	- 💻 GitHub: [https://github.com/QSBench](https://github.com/QSBench)
	- 🌐 Website: [https://qsbench.github.io](https://qsbench.github.io)