DaMsTaR committed · Commit 101d36b · verified · 1 Parent(s): 8f20285

Update README.md

# Quantum ML

## Overview

This guide provides a complete, actionable roadmap to replicate the results from Kung et al. (arXiv:2503.12099), which presents a machine learning approach for automatic characterization of fluxonium superconducting qubit parameters (E_J, E_C, E_L) using a Swin Transformer V2 model trained via deep transfer learning. The paper reports ~95.6% average accuracy across all three energy parameters. The authors state that source code will be available on GitHub after publication, but it may not yet be released, so this guide reconstructs every detail needed for a from-scratch replication.

***
## Phase 1: Environment Setup

### Hardware Requirements

- A GPU with at least 8 GB of VRAM (NVIDIA RTX 3060 or better recommended). Swin Transformer V2 Tiny has ~28M parameters and is relatively lightweight.[^1]
- Sufficient CPU and RAM for generating 15,000+ spectrum simulations via QuTiP.

### Software Dependencies

Install the following Python packages:

- **PyTorch** (≥1.12) with CUDA support
- **torchvision** – provides pre-built Swin Transformer V2 models (`swin_v2_t`, `swin_v2_b`)[^2][^1]
- **timm** (PyTorch Image Models) – alternative source for `swinv2_tiny_window8_256` and other variants[^3]
- **QuTiP** – Quantum Toolbox in Python, for Hamiltonian diagonalization and spectrum computation[^4]
- **scqubits** – optional but helpful for fluxonium simulation and validation[^5][^6]
- **prodigyopt** – the Prodigy optimizer (`pip install prodigyopt`)[^7][^8]
- **scipy** – for `find_peaks_cwt` peak detection[^9]
- **numpy**, **matplotlib**, **Pillow**

```bash
pip install torch torchvision timm qutip scqubits prodigyopt scipy numpy matplotlib pillow
```

***
## Phase 2: Understanding the Fluxonium Hamiltonian

### The Model Hamiltonian

The fluxonium qubit Hamiltonian is:

H = 4 * E_C * n^2 - E_J * cos(phi + phi_ext) + 0.5 * E_L * phi^2

where:

- **E_C** = charging energy (set by the capacitance)
- **E_J** = Josephson energy
- **E_L** = inductive energy
- **phi** = phase operator across the inductance
- **n** = charge operator conjugate to phi
- **phi_ext** = external magnetic flux phase (varied over one flux quantum period)

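The Hamiltonian above can be diagonalized numerically in the harmonic-oscillator basis of its quadratic part. A minimal NumPy/SciPy sketch (illustrative, not the paper's code; the zero-point amplitudes follow from requiring [phi, n] = i, and all energies are in GHz):

```python
import numpy as np
from scipy.linalg import eigh, expm

def fluxonium_hamiltonian(EC, EJ, EL, phi_ext, ncut=110):
    """H = 4*EC*n^2 - EJ*cos(phi + phi_ext) + 0.5*EL*phi^2, built in the
    harmonic-oscillator basis of the quadratic part. Energies in GHz."""
    a = np.diag(np.sqrt(np.arange(1, ncut)), k=1)   # annihilation operator
    phi_zpf = (2.0 * EC / EL) ** 0.25               # zero-point phase fluctuation
    n_zpf = 1.0 / (2.0 * phi_zpf)                   # chosen so that [phi, n] = i
    phi = phi_zpf * (a + a.T)
    n = 1j * n_zpf * (a.T - a)
    arg = phi + phi_ext * np.eye(ncut)
    cos_term = 0.5 * (expm(1j * arg) + expm(-1j * arg)).real  # cos(phi + phi_ext)
    return 4.0 * EC * (n @ n).real - EJ * cos_term + 0.5 * EL * (phi @ phi)

# Six lowest levels at half flux (phi_ext = pi) for a representative parameter set
evals = eigh(fluxonium_hamiltonian(EC=1.28, EJ=6.5, EL=0.7, phi_ext=np.pi),
             eigvals_only=True)[:6]
```

At E_J = 0 the model reduces to a harmonic oscillator with level spacing sqrt(8 * E_C * E_L), which makes a convenient correctness check.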
### Parameter Ranges

The training data spans these experimentally relevant ranges:

| Parameter | Range (GHz) | Span |
|-----------|-------------|------|
| E_C | 0.5 – 3.0 | 2.5 GHz |
| E_L | 0.1 – 2.0 | 1.9 GHz |
| E_J | 2.0 – 10.0 | 8.0 GHz |

### Transitions Considered

The energy transitions used are **0→1, 0→2, 0→3, 0→4, 0→5, 1→2, and 1→3**, all within the frequency window of **4.0–8.0 GHz**.

***
## Phase 3: Generating Training Data

This is the most computationally intensive phase. There are two distinct datasets to generate.

### Dataset 1: Pure Spectrum Dataset (N = 15,392)

This dataset contains only the bare transition energies (no coupling/readout effects), making it fast to compute.

**For each parameter combination (E_C, E_L, E_J):**

1. **Sample parameters** randomly or on a grid within the ranges above. The paper uses 15,392 unique combinations.
2. **Sweep phi_ext** with 256 points per flux period (0 to 2π).
3. **Diagonalize the Hamiltonian** at each flux point. Use `scqubits.Fluxonium` or build the Hamiltonian matrix directly in QuTiP with a sufficiently large cutoff (typically 110 states).[^5]
4. **Compute transition energies** between all relevant level pairs (0-1, 0-2, ..., 1-3).
5. **Filter transitions** to retain only those within 4.0–8.0 GHz.
6. **Render as an image**: plot each valid transition point as a black dot on a 2D image (x-axis = phi_ext, y-axis = frequency in GHz). The image serves as input to the Swin Transformer.

**Example code sketch for a single spectrum:**

```python
import scqubits as scq
import numpy as np

def generate_pure_spectrum(EC, EL, EJ, n_flux=256, cutoff=110):
    fluxonium = scq.Fluxonium(EJ=EJ, EC=EC, EL=EL, flux=0.0, cutoff=cutoff)
    flux_vals = np.linspace(0.0, 1.0, n_flux)  # in units of Phi_0

    transitions = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3)]
    spectrum_points = []

    for flux in flux_vals:
        fluxonium.flux = flux
        evals = fluxonium.eigenvals(evals_count=6)
        for (i, j) in transitions:
            if j < len(evals):
                freq = evals[j] - evals[i]
                if 4.0 <= freq <= 8.0:
                    spectrum_points.append((flux, freq))

    return spectrum_points
```

**Image generation**: convert each spectrum into a fixed-resolution image (e.g., 256×256 pixels). The Swin Transformer V2 Tiny expects 256×256 input. Plot flux on the x-axis and frequency on the y-axis, with black dots on a white background. Save as PNG or convert directly to a tensor.[^1]

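The rendering step can be sketched as follows (an illustrative rasterizer; the dot size and y-axis orientation are assumptions, not specified by the paper):

```python
import numpy as np

def spectrum_to_image(points, size=256, f_min=4.0, f_max=8.0, dot_radius=1):
    """Rasterize (flux, freq) points as black dots on a white size x size image.
    flux is assumed to lie in [0, 1] (units of Phi_0); freq in GHz."""
    img = np.full((size, size), 255, dtype=np.uint8)
    for flux, freq in points:
        x = int(round(flux * (size - 1)))
        y = int(round((f_max - freq) / (f_max - f_min) * (size - 1)))  # freq increases upward
        img[max(0, y - dot_radius):y + dot_radius + 1,
            max(0, x - dot_radius):x + dot_radius + 1] = 0
    return img
```

Stack the single channel three times (`np.stack([img] * 3, axis=-1)`) or save with Pillow to obtain the RGB input the pretrained model expects.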
### Dataset 2: Dispersive Readout Dataset (N = 469)

This dataset simulates a more realistic measurement scenario including dispersive readout effects:

1. **Readout resonator** at 6.00 GHz with a 7 MHz linewidth and coupling strength g = 100 MHz.
2. **Compute the dispersive shift** for each transition using second-order perturbation theory.
3. **Calculate the voltage change** in the readout response caused by the dispersive shift, for a saturation drive at every transition and flux value.
4. **Threshold**: exclude data points where the readout voltage change is < 10% of the maximum magnitude at readout resonance.
5. **Render as an image** as for the pure spectrum, but now transition points carry varying intensities based on signal magnitude.

This computation is >100× slower per spectrum than the pure dataset, which is why only 469 samples are used. The dispersive readout dataset is critical for the transfer learning step.

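For step 2, one common second-order perturbation-theory convention gives the resonator shift when the qubit sits in level i as chi_i = sum over j != i of |g * n_ij|^2 * 2*w_ji / (w_ji^2 - w_r^2), with n_ij the charge matrix elements. A sketch under that convention (the paper does not spell out its exact formula, so treat this as an assumption):

```python
import numpy as np

def dispersive_shifts(evals, n_matrix, g=0.1, f_r=6.0, n_levels=6):
    """chi_i (GHz): second-order dispersive shift of the readout resonator
    for each qubit level i. evals in GHz; g = 0.1 GHz, f_r = 6.0 GHz as in
    the dataset description. n_matrix[i][j] = <i|n|j> charge matrix elements."""
    chi = np.zeros(n_levels)
    for i in range(n_levels):
        for j in range(len(evals)):
            if j == i:
                continue
            w_ji = evals[j] - evals[i]      # transition frequency i -> j
            g_ij = g * abs(n_matrix[i][j])  # level-dependent coupling
            chi[i] += g_ij**2 * 2.0 * w_ji / (w_ji**2 - f_r**2)
    return chi
```

The observable shift for a transition i→j is then chi_j − chi_i; points whose resulting readout signal falls below the 10% threshold are dropped in step 4.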
***

## Phase 4: Model Architecture – Swin Transformer V2

### Model Selection

The paper uses **Swin Transformer V2**, chosen for its lightweight architecture compared to ResNet and DenseNet alternatives. The exact variant isn't specified, but **Swin V2 Tiny** is the most practical choice:[^10][^11]

| Property | Swin V2 Tiny |
|----------|-------------|
| Parameters | ~28.3M[^1] |
| Input resolution | 256 × 256 |
| GFLOPs | 5.94[^1] |
| Embed dim | 96 |
| Depths | [2, 2, 6, 2] |
| Num heads | [3, 6, 12, 24] |
| Window size | 8 |

### Loading the Model

```python
import torchvision.models as models
import torch.nn as nn

# Load pretrained Swin V2 Tiny (ImageNet weights)
model = models.swin_v2_t(weights=models.Swin_V2_T_Weights.IMAGENET1K_V1)

# Replace the classification head with a 3-output regression head (EC, EL, EJ)
model.head = nn.Linear(model.head.in_features, 3)
```

Alternatively, using `timm`:

```python
import timm

model = timm.create_model('swinv2_tiny_window8_256', pretrained=True, num_classes=3)
```

### Input Preprocessing

The spectrum images should be converted to 3-channel (RGB) tensors of size 256×256. Apply standard ImageNet normalization (mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]), since the model is pretrained on ImageNet.[^2][^1]

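The same normalization in plain NumPy, for reference (in practice `torchvision.transforms.Normalize` does this for you):

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img):
    """img: HxWx3 uint8 spectrum image -> 3xHxW float32 array,
    scaled to [0, 1] and normalized with ImageNet statistics."""
    x = img.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return np.transpose(x, (2, 0, 1))  # channels-first layout for PyTorch
```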
***

## Phase 5: Two-Stage Transfer Learning Training

This is the core methodological contribution. The training proceeds in two stages.

### Stage 1: Pre-train on Pure Spectrum Dataset

- **Dataset**: 15,392 pure spectrum images
- **Labels**: corresponding [E_C, E_L, E_J] vectors (continuous values)
- **Loss function**: mean squared error (MSE):

  Loss = (1/N) * Σ_i (F_NN(S_E^i) - E^i)^2

- **Optimizer**: Prodigy with the default lr = 1.0. Prodigy is parameter-free and adaptively estimates the learning rate.[^8][^7]

```python
from prodigyopt import Prodigy

optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)
```

- **Training details**: train until convergence. Use a validation split (~10–15%) from the pure dataset to monitor overfitting. The paper does not specify exact epoch counts, so train until validation loss plateaus (likely 50–200 epochs depending on batch size).
- **Batch size**: not explicitly stated; start with 32 or 64.
- **Scheduler**: cosine annealing is recommended with Prodigy.[^7]

```python
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iterations)
```

### Stage 2: Fine-tune on Dispersive Readout Dataset

- **Dataset**: 469 dispersive readout spectrum images
- **Initialization**: load all weights from Stage 1
- **Loss function**: same MSE loss
- **Optimizer**: Prodigy (reinitialized for the new stage)
- **Training**: fine-tune the entire model on the smaller, more realistic dataset. This transfer learning step is critical: the pure spectrum pre-training provides a strong initialization, and the dispersive dataset aligns the model with experimental conditions.
- **Caution**: with only 469 samples, overfitting is a risk. Use aggressive data augmentation (random horizontal flips, small rotations, slight noise injection) and early stopping.

***

## Phase 6: Evaluation and Validation

### Test Dataset

Generate **512 test spectra** with non-repeating parameter combinations distinct from the training data, within the same parameter ranges.

### Accuracy Metric

The paper defines the per-parameter accuracy as:

Acc(E_ν) = (1/N_test) * Σ_i (1 - |E_ν^i - E_ν^{true,i}| / R(E_ν^{test}))

where R(E_ν^{test}) is the training range (2.5 GHz for E_C, 1.9 GHz for E_L, 8.0 GHz for E_J). This differs from standard classification accuracy: it measures how close predictions are relative to the parameter range.

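Acc, together with the combined Error defined later in this phase, is only a few lines of NumPy. A sketch, with arrays ordered as (E_C, E_L, E_J):

```python
import numpy as np

RANGES = np.array([2.5, 1.9, 8.0])  # training spans of (E_C, E_L, E_J) in GHz

def accuracy(pred, true):
    """Per-parameter Acc(E_nu): mean over the test set of 1 - |error| / range.
    pred, true: (N_test, 3) arrays ordered as (E_C, E_L, E_J)."""
    err = np.abs(np.asarray(pred) - np.asarray(true))
    return 1.0 - err.mean(axis=0) / RANGES

def combined_error(pred, true):
    """Error = 1 - (1/3) * sum of the three per-parameter accuracies."""
    return 1.0 - accuracy(pred, true).mean()
```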
### Target Accuracies

| Parameter | Target Accuracy | Implied Average Deviation |
|-----------|----------------|--------------------------|
| E_C | 94.5% | 0.125 GHz |
| E_L | 97.1% | 0.095 GHz |
| E_J | 95.3% | 0.4 GHz |
| **Overall** | **95.6%** | – |

These are the benchmarks from the paper.

### Error and Cost Metrics

The combined error function:

Error = 1 - (1/3) * Σ_ν Acc(E_ν), for ν = C, L, J

The cost function measures spectral fit quality:

Cost = (1/N) * Σ_i (f(phi_i) - f_i)^2

where f(phi_i) is the transition frequency calculated from the predicted parameters and f_i is the measured (or labeled) frequency at flux point phi_i.

***

## Phase 7: Automatic Fitting Pipeline (End-to-End)

Once the ML model is trained, the full automatic characterization pipeline works as follows:

### Step 1: Preprocess Experimental Data

- Apply a **band-pass filter**: keep data points whose signal magnitude is > 2.5 standard deviations above the background average and < 20% of the maximum measured magnitude.
- Use **`scipy.signal.find_peaks_cwt`** to detect transition spectrum peaks at magnitude extrema.[^9]

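A minimal peak-detection sketch for one frequency trace (the width range is an assumed starting point to tune against your data):

```python
import numpy as np
from scipy.signal import find_peaks_cwt

def detect_transition_peaks(trace, widths=np.arange(1, 8)):
    """Return indices of candidate transition peaks in a 1D magnitude trace
    (one fixed-flux column of the flux-frequency scan)."""
    return find_peaks_cwt(np.asarray(trace), widths)
```

Run this per flux column, then keep only the peaks that also pass the band-pass filter above.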
### Step 2: ML Initial Guess

- Feed the preprocessed spectrum image into the trained Swin Transformer V2 model.
- Obtain initial guesses: E_C^0, E_L^0, E_J^0.

### Step 3: Transition Identification

- Simulate a spectrum using the ML-predicted parameters.
- Label each experimental data point by associating it with the nearest simulated transition, provided the nearest transition is within **0.3 GHz**.
- Exclude points that are far from any simulated transition or fall within regions where multiple transitions overlap within 0.3 GHz.

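The labeling rule can be sketched as follows (the callable-per-transition interface is hypothetical, not the paper's code):

```python
def label_points(exp_points, sim_transitions, tol=0.3):
    """exp_points: list of (flux, freq) pairs from Step 1.
    sim_transitions: dict mapping a transition label to a callable freq(flux)
    for the spectrum simulated from the ML-predicted parameters.
    Keeps a point only if exactly one transition lies within tol (GHz)."""
    labeled = []
    for flux, freq in exp_points:
        dists = {lab: abs(f(flux) - freq) for lab, f in sim_transitions.items()}
        close = [lab for lab, d in dists.items() if d <= tol]
        if len(close) == 1:  # unique, unambiguous match
            labeled.append((flux, freq, close[0]))
    return labeled
```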
### Step 4: Least-Squares Fitting

- Use the ML predictions as initial guesses for a least-squares fit (e.g., `scipy.optimize.least_squares` or `scipy.optimize.curve_fit`).
- Fit the labeled data points to the fluxonium Hamiltonian model.
- Constrain the fit to **5 iterations**, as in the paper's benchmarks.
- Output the final refined values of E_C, E_L, E_J.

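The refinement step, sketched with a generic model function. `transition_freq` is a hypothetical callable you supply (e.g., a cached fluxonium solver), and note that `max_nfev` bounds function evaluations, which only approximates the paper's 5-iteration budget:

```python
import numpy as np
from scipy.optimize import least_squares

def refine_parameters(initial_guess, labeled_points, transition_freq, max_nfev=5):
    """labeled_points: list of (flux, freq, label) triples from Step 3.
    transition_freq(params, flux, label): model frequency in GHz."""
    def residuals(params):
        return np.array([transition_freq(params, flux, lab) - freq
                         for flux, freq, lab in labeled_points])
    return least_squares(residuals, np.asarray(initial_guess), max_nfev=max_nfev).x
```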
***

## Phase 8: Reproducing Key Results

### Result 1: Prediction Accuracy (Figure 4)

- Run inference on the 512 test spectra.
- Plot predicted vs. true values for each of E_C, E_L, E_J.
- Compute the average accuracy using the custom metric. Target: ~95.6% overall.

### Result 2: Error and Cost Landscapes (Figures 5–6)

- Choose a test case, e.g., (E_C = 1.28, E_J = 6.50, E_L = 0.70) GHz.
- Generate a 2D grid of initial parameter guesses.
- For each initial guess, run 5 fitting iterations and compute Error and Cost.
- Plot heatmaps showing that the ML prediction falls in the darkest (lowest error/cost) region.

### Result 3: ML vs. Random Initial Guess (Table 1)

- For 60 parameter sets, compare:
  - 512 random initial guesses → 5 fitting iterations → average Error and Cost
  - ML initial guess → 5 fitting iterations → Error and Cost

| Method | Avg Error | Std Error | Avg Cost | Std Cost |
|--------|-----------|-----------|----------|----------|
| Random initial values | 0.218 | 0.098 | 0.146 | 0.130 |
| ML prediction | 0.037 | 0.088 | 0.024 | 0.083 |

The ML approach should yield nearly an order of magnitude improvement.

### Result 4: Real Experimental Data (Figure 7)

- If real fluxonium measurement data are available, apply the full pipeline.
- The paper demonstrates successful characterization with only partial spectra (4.0–5.9 GHz instead of 4.0–8.0 GHz) and even with half-period symmetrized data.

***

## Phase 9: Practical Tips and Troubleshooting

### Data Generation Optimization

- **Parallelization**: use Python's `multiprocessing` to generate spectra in parallel; each spectrum is independent.
- **Caching**: save computed eigenvalues to disk (HDF5 or NumPy arrays) so nothing is recomputed if training is restarted.
- **scqubits cutoff**: use cutoff=110 for the fluxonium Hilbert space. Lower cutoffs may miss higher transitions; higher cutoffs waste computation time.[^5]

### Image Representation

- The paper plots spectra as black dots on a white background. Ensure consistent resolution (256×256) and normalization.
- Use a fixed pixel grid: map phi_ext ∈ [0, 2π] to x ∈ [0, 255] and frequency ∈ [4.0, 8.0] GHz to y ∈ [0, 255].
- Each dot should be at least 1–2 pixels wide for visibility.

### Training Stability

- Prodigy with lr = 1.0 is recommended. If training is unstable, reduce `d_coef` to 0.5.[^7]
- For the fine-tuning stage (469 samples), consider freezing the early layers of the Swin Transformer and fine-tuning only the later layers and the regression head.
- Monitor for overfitting by tracking validation loss closely in Stage 2.

### Label Normalization

- Normalize target values to [0, 1] by dividing by the parameter range (e.g., E_C_normalized = (E_C - 0.5) / 2.5). This helps the MSE loss treat all three parameters equally.[^16]
- At inference time, denormalize predictions back to physical units.

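A sketch of the normalization pair, using the parameter bounds from Phase 2:

```python
import numpy as np

PARAM_MIN = np.array([0.5, 0.1, 2.0])    # lower bounds of (E_C, E_L, E_J) in GHz
PARAM_RANGE = np.array([2.5, 1.9, 8.0])  # training spans in GHz

def normalize_labels(E):
    """Map physical (E_C, E_L, E_J) values in GHz to [0, 1] for training."""
    return (np.asarray(E) - PARAM_MIN) / PARAM_RANGE

def denormalize(y):
    """Map model outputs in [0, 1] back to GHz at inference time."""
    return np.asarray(y) * PARAM_RANGE + PARAM_MIN
```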
***

## Complete Replication Checklist

| Step | Task | Status |
|------|------|--------|
| 1 | Install all dependencies (PyTorch, QuTiP, scqubits, prodigyopt, timm) | ☐ |
| 2 | Implement the fluxonium Hamiltonian spectrum generator | ☐ |
| 3 | Generate 15,392 pure spectrum images + labels | ☐ |
| 4 | Generate 469 dispersive readout spectrum images + labels | ☐ |
| 5 | Generate 512 test spectrum images + labels | ☐ |
| 6 | Set up Swin V2 Tiny with a 3-output regression head | ☐ |
| 7 | Stage 1: train on the pure spectrum dataset with the Prodigy optimizer | ☐ |
| 8 | Stage 2: fine-tune on the dispersive readout dataset | ☐ |
| 9 | Evaluate on the test set – target ~95.6% accuracy | ☐ |
| 10 | Implement the automatic fitting pipeline (filter → ML → label → fit) | ☐ |
| 11 | Reproduce the Error/Cost comparison (Table 1) | ☐ |
| 12 | (Optional) Apply to real experimental data | ☐ |

***

## Key References and Resources

- **Paper**: arXiv:2503.12099 – Kung et al., "Automatic Characterization of Fluxonium Superconducting Qubits Parameters with Deep Transfer Learning"
- **Swin Transformer V2**: Liu et al., CVPR 2022 – architecture details and pretrained weights[^10]
- **Prodigy optimizer**: Mishchenko & Defazio, arXiv:2306.06101 – parameter-free adaptive optimizer[^8]
- **scqubits**: Groszkowski & Koch, Quantum 5, 583 (2021) – Python package for superconducting qubit simulation[^6]
- **QuTiP**: Quantum Toolbox in Python – used for Hamiltonian diagonalization[^4]
- **torchvision SwinV2**: official PyTorch implementation with ImageNet-pretrained weights[^1]

---

## References

1. [swin_v2_t — Torchvision main documentation](https://docs.pytorch.org/vision/main/models/generated/torchvision.models.swin_v2_t.html)
2. [swin_v2_b — Torchvision main documentation](https://docs.pytorch.org/vision/main/models/generated/torchvision.models.swin_v2_b.html)
3. [Loading a pre-trained SwinV2 transformer and modifying the architecture · huggingface pytorch-image-models · Discussion #1843](https://github.com/huggingface/pytorch-image-models/discussions/1843)
4. [Accelerate Qubit Research with NVIDIA cuQuantum Integrations in QuTiP and scQubits](https://developer.nvidia.com/blog/accelerate-qubit-research-with-nvidia-cuquantum-integrations-in-qutip-and-scqubits/)
5. [Fluxonium Qubit — scqubits Documentation](https://scqubits.readthedocs.io/en/v2.0_a/guide/qubits/fluxonium.html)
6. [scqubits: a Python package for superconducting qubits](https://arxiv.org/abs/2107.08552)
7. [prodigyopt](https://pypi.org/project/prodigyopt/)
8. [The Prodigy optimizer and its variants for training neural networks](https://github.com/konstmish/prodigy)
9. [find_peaks_cwt — SciPy v1.17.0 Manual](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html)
10. [Swin Transformer V2: Scaling Up Capacity and Resolution](https://ieeexplore.ieee.org/document/9879380/)
11. [Swin Transformer V2: Advancing Computer Vision with Scalable Neural Architectures](https://www.raulartigues.com/en/post/swin-transformer-v2-advancing-computer-vision-with-scalable-neural-architectures)
12. [SwinCNet leveraging Swin Transformer V2 and CNN for precise color correction and detail enhancement in underwater image restoration](https://www.frontiersin.org/articles/10.3389/fmars.2025.1523729/full)
13. [Retinal vessel segmentation using a swin transformer-based encoder-decoder architecture](https://link.springer.com/10.1007/s11760-025-05089-1)
14. [DUSFormer: Dual-Swin Transformer V2 Aggregate Network for Polyp Segmentation](https://ieeexplore.ieee.org/document/10387670/)
15. [Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation](https://arxiv.org/pdf/2401.17828.pdf)
16. [An Image Denoising Method Based on Swin Transformer V2 and U-Net Architecture](https://ieeexplore.ieee.org/document/10807930/)