# 🌌 Circuit Complexity Clustering Guide

Welcome to the **Circuit Complexity Clustering Hub**.  
This tool demonstrates how **unsupervised learning** can automatically group quantum circuits by their structural complexity — without any labels or prior knowledge.

---

## ⚠️ Important: Local Dataset Notice

This application processes local `.parquet` files stored in the `data/` directory.

- **Data Source**: Local shards of QSBench (Core, Amplitude Damping, Depolarizing, etc.).
- **Processing**: To ensure high performance, analysis is performed on a representative sample of **15,000 circuits**, even if the source file contains hundreds of thousands of rows.
- **Goal**: Showcase how circuit topology and gate structure naturally form complexity groups.
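
The sampling step above can be sketched as follows. This is a minimal illustration, assuming uniform random sampling without replacement; the function names `sample_circuits` and `load_sample`, the seed, and the sampling strategy are hypothetical and may differ from what the app actually does.

```python
import pandas as pd

SAMPLE_SIZE = 15_000  # sample size stated in this guide


def sample_circuits(df: pd.DataFrame, n: int = SAMPLE_SIZE, seed: int = 42) -> pd.DataFrame:
    """Return at most n rows, drawn uniformly without replacement (assumed strategy)."""
    if len(df) <= n:
        # Small files are used in full.
        return df.reset_index(drop=True)
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


def load_sample(parquet_path: str, n: int = SAMPLE_SIZE, seed: int = 42) -> pd.DataFrame:
    """Read one local shard and sample it; needs a parquet engine such as pyarrow."""
    return sample_circuits(pd.read_parquet(parquet_path), n=n, seed=seed)
```

A fixed seed keeps the sample reproducible between runs, which makes cluster maps easier to compare.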

---

## 🎯 1. What is Being Done?

The model performs **unsupervised clustering** (K-Means) to group quantum circuits into clusters of similar **structural complexity**.

### No labels are used
The algorithm discovers groups purely from:
- **Topology**: How qubits are connected (derived from the adjacency matrix).
- **Gate Density**: Counts of single and multi-qubit operations.
- **QASM Signals**: Complexity metrics extracted directly from the OpenQASM code.

Each cluster represents circuits of similar “computational weight” or entanglement potential.

---

## 🧩 2. How the Model “Sees” a Circuit

The model does **not** use noise profiles or simulation results. It focuses on **structural proxies**:

### 🔹 Topology Features
- `adj_density`: How densely the qubits interact.
- `adj_degree_avg`: The average number of connections per qubit.
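
As a rough illustration, both topology features can be computed from a square adjacency matrix. The definitions below (density as the fraction of possible qubit pairs that interact, average degree as twice the edge count divided by the qubit count) are assumptions for this sketch; the dataset's exact formulas may differ.

```python
def topology_features(adj):
    """Compute assumed versions of adj_density and adj_degree_avg.

    adj is a square list-of-lists; a non-zero entry at [i][j] or [j][i]
    means qubits i and j interact. Self-loops on the diagonal are ignored.
    """
    n = len(adj)
    if n < 2:
        return {"adj_density": 0.0, "adj_degree_avg": 0.0}
    edges = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i][j] or adj[j][i]
    )
    max_edges = n * (n - 1) / 2  # number of distinct qubit pairs
    return {
        "adj_density": edges / max_edges,
        "adj_degree_avg": 2 * edges / n,  # each edge adds one to two degrees
    }


# Three qubits in a line: 0 - 1 - 2
line = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = topology_features(line)
```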

### 🔹 Gate Structure & Complexity
- `depth`, `total_gates`, `cx_count`: Standard measures of circuit size.
- `gate_entropy`: A measure of how "random" or "structured" the gate sequence is.
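
One plausible reading of `gate_entropy` is the Shannon entropy of the gate-type distribution. The sketch below assumes that definition (in bits); the dataset may compute the feature differently.

```python
import math
from collections import Counter


def gate_entropy(gates):
    """Shannon entropy (bits) of the gate-type distribution (assumed definition).

    gates is an iterable of gate names, e.g. ["h", "cx", "h"].
    A circuit built from one gate type scores 0; a uniform mix of
    m gate types scores log2(m).
    """
    counts = Counter(gates)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

Under this definition, low values point to circuits dominated by a single gate type, and high values to a more even mix.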

### 🔹 QASM-derived Signals
- `qasm_len`: Character length of the code.
- `qasm_gates`: Keyword-based gate count.
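
A minimal sketch of how such signals could be extracted from an OpenQASM string. The keyword list `GATE_KEYWORDS` is hypothetical and the matching rule (one gate keyword at the start of a line) is an assumption; the dataset's own extraction may count differently.

```python
import re

# Hypothetical keyword list for illustration only.
GATE_KEYWORDS = ("h", "x", "cx", "rz")


def qasm_signals(qasm, keywords=GATE_KEYWORDS):
    """Return the character length and a keyword-based gate count."""
    # Match a keyword at the start of a line, followed by a word boundary,
    # so that "cx q[0],q[1];" counts but "creg c[2];" does not.
    pattern = re.compile(
        r"^\s*(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.MULTILINE,
    )
    return {"qasm_len": len(qasm), "qasm_gates": len(pattern.findall(qasm))}


example = "\n".join([
    "OPENQASM 2.0;",
    'include "qelib1.inc";',
    "qreg q[2];",
    "creg c[2];",
    "h q[0];",
    "cx q[0],q[1];",
])
signals = qasm_signals(example)
```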

---

## 🤖 3. Model Overview: PCA & K-Means

The system follows a standard machine learning pipeline:
1. **Imputation & Scaling**: Missing values are filled with medians, and features are normalized.
2. **K-Means**: Groups circuits into $K$ clusters, with $K$ between 2 and 10.
3. **PCA (Principal Component Analysis)**: Reduces high-dimensional data to 2D for visualization.
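
The three steps above can be sketched with scikit-learn. This is an illustrative outline, not the app's actual code: the function name `cluster_and_project`, the seed, `n_init=10`, and the choice to run PCA on the scaled features are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_and_project(X, k=3, seed=0):
    """Impute, scale, cluster with K-Means, and project to 2D with PCA."""
    if not 2 <= k <= 10:
        raise ValueError("k must be between 2 and 10")
    # Step 1: fill missing values with column medians, then standardize.
    prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    X_scaled = prep.fit_transform(np.asarray(X, dtype=float))
    # Step 2: assign each circuit to one of k clusters.
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
    # Step 3: 2D coordinates, used only for plotting.
    coords = PCA(n_components=2, random_state=seed).fit_transform(X_scaled)
    return labels, coords
```

In this sketch the clusters are computed in the full feature space and PCA is used only for display, so points that look close on the 2D map can still belong to different clusters.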

### Understanding the PCA Map:
- **Horizontal Axis (Component 1):** Usually represents the **Scale**. Points further to the right typically have more gates and higher qubit counts.
- **Vertical Axis (Component 2):** Often reflects **Density/Complexity**. Points higher or lower on this axis differ in their connectivity patterns or gate-to-depth ratio.
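
Because the meaning of each axis depends on the data, it can help to check which features actually drive each component. The sketch below lists the largest PCA loadings per axis; the function name `top_loadings` is hypothetical, and the input is assumed to contain no missing values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def top_loadings(X, feature_names, n_top=3):
    """List the features with the largest absolute loading on PC1 and PC2."""
    X_scaled = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    pca = PCA(n_components=2).fit(X_scaled)
    report = {}
    for i, component in enumerate(pca.components_, start=1):
        # Sort features by absolute weight on this component, largest first.
        order = np.argsort(-np.abs(component))[:n_top]
        report[f"PC{i}"] = [(feature_names[j], float(component[j])) for j in order]
    return report
```

If size features such as `depth` and `total_gates` dominate the first component, the "Scale" reading of the horizontal axis holds for that dataset.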

---

## 🖼️ 4. Example Case: Large-Scale Dataset

When working with a full dataset (e.g., **150,000 rows** from `depolarizing` noise), the clustering reveals highly distinct structural "clouds":

- **Core Clusters**: Large, dense groups representing standard circuit templates.
- **The "Tail":** Elongated structures showing a gradient of increasing depth.
- **Outliers:** Isolated points (far left or far top) representing unique, non-standard topologies.


![image](https://cdn-uploads.huggingface.co/production/uploads/69cab322f9896e16f84eb345/bmEd1lsR_jaT99ZklPCSQ.png)

---

## 📊 5. Understanding the Results

### A. PCA Projection
- **Each point** = One quantum circuit.
- **Color** = Assigned cluster.
- **Proximity** = Similarity. Circuits close to each other share similar structural DNA.

### B. Silhouette Score
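- A metric from **-1 to 1** measuring how well-separated the clusters are.
- **High score (close to 1):** Distinct, well-defined complexity levels.
- **Score near 0:** Overlapping clusters. Negative values suggest that many circuits sit closer to a neighboring cluster than to their own.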
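
For reference, the score can be computed with scikit-learn's `silhouette_score`, which returns a value in the range -1 to 1. The helper name `clustering_quality` and the K-Means settings are assumptions for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def clustering_quality(X_scaled, k, seed=0):
    """Cluster already-scaled features and return the mean silhouette score."""
    X_scaled = np.asarray(X_scaled, dtype=float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
    return float(silhouette_score(X_scaled, labels))
```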

### C. Cluster Sizes Table
- Shows the distribution of circuits. A heavily imbalanced table might suggest that most of your dataset shares a very similar base structure.
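
A small sketch of how such a table could be derived from the cluster labels. The function name `cluster_sizes` and the row layout (cluster id, count, share) are assumptions for illustration.

```python
from collections import Counter


def cluster_sizes(labels):
    """Return (cluster_id, count, share) rows sorted by cluster id."""
    counts = Counter(int(label) for label in labels)
    total = sum(counts.values())
    return [(cid, n, n / total) for cid, n in sorted(counts.items())]


rows = cluster_sizes([0, 0, 0, 1])
# rows == [(0, 3, 0.75), (1, 1, 0.25)]
```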

---

## 🧪 6. Experimentation Tips

- **Search for Outliers:** Look for isolated points far from the main "clouds". These are unique circuits — perfect candidates for edge-case benchmarking.
- **Tune K:** If clusters look fragmented on a large dataset, try $K=3$ or $K=5$ to see broader complexity tiers.
- **Compare Datasets:** Notice how the "shape" of the complexity map changes between `Core` (clean) and `Transpilation` datasets.
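
One way to explore the choice of $K$ is to sweep the range and compare silhouette scores. This is a sketch under assumptions: the function name `sweep_k` is hypothetical, the input is expected to be imputed and scaled already, and the silhouette score is only one of several possible selection criteria.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sweep_k(X_scaled, k_values=range(2, 11), seed=0):
    """Return {k: silhouette score} for each candidate number of clusters."""
    X_scaled = np.asarray(X_scaled, dtype=float)
    scores = {}
    for k in k_values:
        if k >= len(X_scaled):
            break  # silhouette needs fewer clusters than samples
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
        scores[k] = float(silhouette_score(X_scaled, labels))
    return scores
```

The best-scoring value can be read off with `max(scores, key=scores.get)`, though a slightly lower score at a smaller $K$ may still give tiers that are easier to interpret.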

---

## 🛠️ 7. Troubleshooting

**"Too few rows for clustering" error?**
1. **NaN values:** You may have selected a feature that is empty (all NaNs) in that specific dataset. Try `depth` or `total_gates`.
2. **Path Error:** Ensure your `.parquet` files are in `data/{folder_name}/`.
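
Before clustering, a quick check like the following can reveal which selected features are missing or entirely empty in a given dataset. The function name `usable_features` is hypothetical, and this sketch does not reproduce the app's own validation.

```python
import pandas as pd


def usable_features(df, features):
    """Keep only features that exist in df and have at least one non-NaN value."""
    return [f for f in features if f in df.columns and df[f].notna().any()]


df = pd.DataFrame({
    "depth": [3, 5, 8],
    "gate_entropy": [float("nan")] * 3,  # empty in this dataset
})
kept = usable_features(df, ["depth", "gate_entropy", "cx_count"])
# kept == ["depth"]
```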

---

## 🔬 8. Key Insight

> Quantum circuits naturally form groups of similar complexity even without any supervision. Features like connectivity, depth, and two-qubit gate count are enough for an algorithm to discover meaningful “complexity levels”.

---

## 🔗 9. Project Resources

- 🤗 **Hugging Face**: [https://huggingface.co/QSBench](https://huggingface.co/QSBench)
- 💻 **GitHub**: [https://github.com/QSBench](https://github.com/QSBench)
- 🌐 **Website**: [https://qsbench.github.io](https://qsbench.github.io)