Quantum ML
Overview
This guide provides a complete, actionable roadmap to replicate the results from Kung et al. (arXiv:2503.12099), which presents a machine learning approach for automatic characterization of fluxonium superconducting qubit parameters (E_J, E_C, E_L) using a Swin Transformer V2 model trained via deep transfer learning. The paper reports ~95.6% average accuracy across all three energy parameters. The authors mention that source code will be available on GitHub after publication, but since it may not yet be released, this guide reconstructs every detail needed for a from-scratch replication.
Phase 1: Environment Setup
Hardware Requirements
- A GPU with at least 8 GB VRAM (NVIDIA RTX 3060 or better recommended). Swin Transformer V2 Tiny has ~28M parameters and is relatively lightweight.[^1]
- Sufficient CPU/RAM for generating 15,000+ spectrum simulations via QuTiP.
Software Dependencies
Install the following Python packages:
- PyTorch (≥1.12) with CUDA support
- torchvision – provides pre-built Swin Transformer V2 models (`swin_v2_t`, `swin_v2_b`)[^2][^1]
- timm (PyTorch Image Models) – alternative source for `swinv2_tiny_window8_256` and other variants[^3]
- QuTiP – Quantum Toolbox in Python for Hamiltonian diagonalization and spectrum computation[^4]
- scqubits – optional but helpful for fluxonium simulation and validation[^5][^6]
- prodigyopt – the Prodigy optimizer (`pip install prodigyopt`)[^7][^8]
- scipy – for `find_peaks_cwt` peak detection[^9]
- numpy, matplotlib, PIL/Pillow

```bash
pip install torch torchvision timm qutip scqubits prodigyopt scipy numpy matplotlib pillow
```
Phase 2: Understanding the Fluxonium Hamiltonian
The Model Hamiltonian
The fluxonium qubit Hamiltonian is:
H = 4 * E_C * n^2 - E_J * cos(phi + phi_ext) + 0.5 * E_L * phi^2
where:
- E_C = charging energy (capacitance)
- E_J = Josephson energy
- E_L = inductive energy
- phi = phase operator across inductance
- n = displacement charge operator
- phi_ext = external magnetic flux (varied over one flux quantum period)
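This Hamiltonian can be diagonalized directly. As an illustrative sketch (not the paper's code), the standard construction in the harmonic-oscillator basis of the quadratic (LC) part looks like this in plain NumPy; QuTiP or scqubits wrap the same linear algebra:

```python
import numpy as np

def fluxonium_hamiltonian(EC, EL, EJ, flux, cutoff=110):
    """H = 4*EC*n^2 - EJ*cos(phi + phi_ext) + 0.5*EL*phi^2, built in the
    harmonic-oscillator basis of the quadratic (LC) part. Energies in GHz,
    flux in units of the flux quantum Phi_0."""
    a = np.diag(np.sqrt(np.arange(1, cutoff)), k=1)   # annihilation operator
    phi_zpf = (8.0 * EC / EL) ** 0.25                 # zero-point phase fluctuation
    phi = phi_zpf * (a + a.T) / np.sqrt(2.0)
    n = 1j * (a.T - a) / (np.sqrt(2.0) * phi_zpf)
    # cos(phi + phi_ext) via eigendecomposition of the Hermitian phi matrix
    w, v = np.linalg.eigh(phi)
    cos_term = v @ np.diag(np.cos(w + 2.0 * np.pi * flux)) @ v.conj().T
    return 4.0 * EC * (n @ n) + 0.5 * EL * (phi @ phi) - EJ * cos_term
```

Eigenvalues from `np.linalg.eigvalsh(H)` then give the transition frequencies; a cutoff around 110 matches the recommendation later in this guide.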
Parameter Ranges
The training data spans these experimentally relevant ranges:
| Parameter | Range (GHz) | Span |
|---|---|---|
| E_C | 0.5 – 3.0 | 2.5 GHz |
| E_L | 0.1 – 2.0 | 1.9 GHz |
| E_J | 2.0 – 10.0 | 8.0 GHz |
Transitions Considered
The energy transitions used are: 0→1, 0→2, 0→3, 0→4, 0→5, 1→2, and 1→3, all within the frequency window of 4.0–8.0 GHz.
Phase 3: Generating Training Data
This is the most computationally intensive phase. There are two distinct datasets to generate.
Dataset 1: Pure Spectrum Dataset (N = 15,392)
This dataset contains only the bare transition energies (no coupling/readout effects), making it fast to compute.
For each parameter combination (E_C, E_L, E_J):
- Sample parameters randomly or on a grid within the ranges above. The paper uses 15,392 unique combinations.
- Sweep phi_ext with 256 points per flux period (0 to 2π).
- Diagonalize the Hamiltonian at each flux point using QuTiP. Use `scqubits.Fluxonium` or build the Hamiltonian matrix directly in QuTiP with a sufficiently large cutoff (typically 110 states).[^5]
- Compute transition energies between all relevant level pairs (0-1, 0-2, ..., 1-3).
- Filter transitions to retain only those within 4.0–8.0 GHz.
- Render as an image: Plot each valid transition point as a black dot on a 2D image (x-axis = phi_ext, y-axis = frequency in GHz). The image serves as input to the Swin Transformer.
Example code sketch for a single spectrum:
```python
import scqubits as scq
import numpy as np

def generate_pure_spectrum(EC, EL, EJ, n_flux=256, cutoff=110):
    fluxonium = scq.Fluxonium(EJ=EJ, EC=EC, EL=EL, flux=0.0, cutoff=cutoff)
    flux_vals = np.linspace(0.0, 1.0, n_flux)  # external flux in units of Phi_0
    transitions = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3)]
    spectrum_points = []
    for flux in flux_vals:
        fluxonium.flux = flux
        evals = fluxonium.eigenvals(evals_count=6)
        for (i, j) in transitions:
            if j < len(evals):
                freq = evals[j] - evals[i]
                if 4.0 <= freq <= 8.0:  # keep only the paper's frequency window
                    spectrum_points.append((flux, freq))
    return spectrum_points
```
Image generation: Convert each spectrum into a fixed-resolution image (e.g., 256×256 pixels). The Swin Transformer V2 Tiny expects 256×256 input. Plot flux on the x-axis and frequency on the y-axis, with black dots on a white background. Save as PNG or convert directly to a tensor.[^1]
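A minimal rasterization sketch consistent with this description (dot size and image orientation are assumptions here, not taken from the paper):

```python
import numpy as np

def spectrum_to_image(spectrum_points, size=256, f_min=4.0, f_max=8.0):
    """Rasterize (flux, freq) points as black dots on a white size x size
    grayscale image. Flux is assumed in [0, 1] (units of Phi_0) and
    frequency in [f_min, f_max] GHz."""
    img = np.full((size, size), 255, dtype=np.uint8)       # white background
    for flux, freq in spectrum_points:
        x = int(round(flux * (size - 1)))
        y = int(round((f_max - freq) / (f_max - f_min) * (size - 1)))  # row 0 = top
        img[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2] = 0  # ~3x3 black dot
    return img
```

Stack the array three times (`np.stack([img] * 3, axis=-1)`) for the 3-channel model input, or save it with Pillow's `Image.fromarray`.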
Dataset 2: Dispersive Readout Dataset (N = 469)
This dataset simulates a more realistic measurement scenario including dispersive readout effects:
- Readout resonator at 6.00 GHz with linewidth 7 MHz and coupling strength g = 100 MHz.
- Compute the dispersive shift for each transition using second-order perturbation theory.
- Calculate voltage change in readout response caused by dispersive shift for a saturation drive at every transition and flux value.
- Threshold: Exclude data points where readout voltage change < 10% of maximum magnitude at readout resonance.
- Render as image similarly to the pure spectrum, but now transition points carry varying intensities based on signal magnitude.
This computation is >100× slower per spectrum than the pure dataset, which is why only 469 samples are used. The dispersive readout dataset is critical for the transfer learning step.
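The paper's exact readout model isn't reproduced here, but a common textbook form of the second-order dispersive shift, using charge matrix elements between fluxonium eigenstates, gives the flavor of the computation. Treat this as an assumption-laden sketch; the paper's expression may differ:

```python
import numpy as np

def dispersive_shifts(evals, n_mat, f_r=6.0, g=0.100):
    """Second-order dispersive shift (GHz) of each qubit level on a readout
    resonator at f_r GHz with coupling g GHz. evals: eigenenergies (GHz);
    n_mat[i, j]: charge matrix element <i|n|j>. This textbook formula is an
    assumption, not the paper's verified expression."""
    dim = len(evals)
    chi = np.zeros(dim)
    for i in range(dim):
        for j in range(dim):
            if j == i:
                continue
            w_ji = evals[j] - evals[i]   # transition frequency i -> j
            chi[i] += (g * abs(n_mat[i, j])) ** 2 * 2.0 * w_ji / (w_ji ** 2 - f_r ** 2)
    return chi
```

The readout voltage change then follows from how chi shifts the resonator's Lorentzian response relative to its 7 MHz linewidth.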
Phase 4: Model Architecture β Swin Transformer V2
Model Selection
The paper uses Swin Transformer V2, chosen for its lightweight architecture compared to ResNet and DenseNet alternatives. The exact variant isn't specified, but the Swin V2 Tiny model is the most practical choice:[^10][^11]
| Property | Swin V2 Tiny |
|---|---|
| Parameters | ~28.3M[^1] |
| Input resolution | 256 × 256 |
| GFLOPs | 5.94[^1] |
| Embed dim | 96 |
| Depths | [2, 2, 6, 2] |
| Num heads | [3, 6, 12, 24] |
| Window size | 8 |
Loading the Model
```python
import torchvision.models as models
import torch.nn as nn

# Load pretrained Swin V2 Tiny (ImageNet weights)
model = models.swin_v2_t(weights=models.Swin_V2_T_Weights.IMAGENET1K_V1)

# Modify the classification head for regression (3 outputs: EC, EL, EJ)
model.head = nn.Linear(model.head.in_features, 3)
```
Alternatively, using timm:
```python
import timm

model = timm.create_model('swinv2_tiny_window8_256', pretrained=True, num_classes=3)
```
Input Preprocessing
The spectrum images should be converted to 3-channel (RGB) tensors of size 256×256. Apply standard ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) since the model is pretrained on ImageNet.[^2][^1]
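This normalization is what torchvision's `transforms.ToTensor()` followed by `transforms.Normalize(...)` produces; in plain NumPy it amounts to:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_uint8):
    """(H, W, 3) uint8 image -> (3, H, W) float32 array, scaled to [0, 1]
    and normalized with the ImageNet statistics."""
    x = img_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return np.transpose(x, (2, 0, 1))    # HWC -> CHW, as PyTorch expects
```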
Phase 5: Two-Stage Transfer Learning Training
This is the core methodological contribution. The training proceeds in two stages.
Stage 1: Pre-train on Pure Spectrum Dataset
- Dataset: 15,392 pure spectrum images
- Labels: corresponding [E_C, E_L, E_J] vectors (continuous values)
- Loss function: mean squared error (MSE):

Loss = (1/N) * Σ (F_NN(S_E^i) - E^i)^2

- Optimizer: Prodigy with default lr=1.0. Prodigy is parameter-free and adaptively estimates the learning rate.[^8][^7]
```python
from prodigyopt import Prodigy

optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)
```
- Training details: Train until convergence. Use a validation split (~10–15%) from the pure dataset to monitor overfitting. The paper does not specify exact epoch counts, so train until validation loss plateaus (likely 50–200 epochs depending on batch size).
- Batch size: Not explicitly stated; start with 32 or 64.
- Scheduler: Cosine annealing is recommended with Prodigy.[^7]
```python
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iterations)
```
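The overall loop is a standard MSE regression fit. A generic sketch, assuming a DataLoader yielding (image, label) batches (the Prodigy optimizer and cosine scheduler above plug in directly):

```python
import torch
import torch.nn as nn

def train_stage(model, loader, optimizer, scheduler=None, device="cpu", epochs=1):
    """Minimal MSE regression loop; both training stages can reuse it."""
    criterion = nn.MSELoss()
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            if scheduler is not None:
                scheduler.step()   # cosine annealing stepped per iteration
    return model
```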
Stage 2: Fine-tune on Dispersive Readout Dataset
- Dataset: 469 dispersive readout spectrum images
- Initialization: Load all weights from Stage 1
- Loss function: Same MSE loss
- Optimizer: Prodigy (reinitialize for the new stage)
- Training: Fine-tune the entire model on the smaller, more realistic dataset. This transfer learning step is critical – the pure spectrum pre-training provides a strong initialization, and the dispersive dataset aligns the model with experimental conditions.
- Caution: With only 469 samples, overfitting is a risk. Use aggressive data augmentation (random horizontal flips, small rotations, slight noise injection) and early stopping.
Phase 6: Evaluation and Validation
Test Dataset
Generate 512 test spectra with non-repetitive parameter combinations distinct from training data, within the same parameter ranges.
Accuracy Metric
The paper defines accuracy per parameter as:
Acc(E_ν) = (1/N_test) * Σ (1 - |E_ν^i - E_ν^{true,i}| / R(E_ν^{test}))

where R(E_ν^{test}) is the training range (2.5 GHz for E_C, 1.9 GHz for E_L, 8.0 GHz for E_J). This differs from standard classification accuracy – it measures how close predictions are relative to the parameter range.
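This metric is a few lines of NumPy:

```python
import numpy as np

# Training ranges R(E_nu) in GHz, matching the parameter table above
RANGES = {"E_C": 2.5, "E_L": 1.9, "E_J": 8.0}

def accuracy(pred, true, span):
    """Paper-style accuracy: one minus the mean absolute deviation,
    normalized by the parameter's training range."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(1.0 - np.abs(pred - true) / span))
```

For example, predictions off by 0.25 GHz on E_C (range 2.5 GHz) score 90%.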
Target Accuracies
| Parameter | Target Accuracy | Implied Average Deviation |
|---|---|---|
| E_C | 94.5% | ~0.14 GHz |
| E_L | 97.1% | ~0.06 GHz |
| E_J | 95.3% | ~0.38 GHz |
| Overall | 95.6% | – |
These are the benchmarks from the paper.
Error and Cost Metrics
The combined error function:
Error = 1 - (1/3) * Σ Acc(E_ν), for ν = C, L, J
The cost function measures spectral fit quality:
Cost = (1/N) * Σ (f(phi_i) - f_i)^2
where f(phi_i) is the transition frequency calculated from the predicted parameters.
Phase 7: Automatic Fitting Pipeline (End-to-End)
Once the ML model is trained, the full automatic characterization pipeline works as follows:
Step 1: Preprocess Experimental Data
- Apply a band-pass filter: keep data points with signal magnitude > 2.5 standard deviations above background average and < 20% of maximum measured magnitude.
- Use `scipy.signal.find_peaks_cwt` to detect transition spectrum peaks at magnitude extrema.[^9]
Step 2: ML Initial Guess
- Feed the preprocessed spectrum image into the trained Swin Transformer V2 model.
- Obtain initial guesses: E_C^0, E_L^0, E_J^0.
Step 3: Transition Identification
- Simulate a spectrum using the ML-predicted parameters.
- Label each experimental data point by associating it with the nearest simulated transition, provided the nearest transition is within 0.3 GHz.
- Exclude points that are far from any simulated transition or fall within regions where multiple transitions overlap within 0.3 GHz.
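The labeling rule above can be sketched as follows; representing the overlap check as "a second branch also within tolerance of the point" is an interpretation, not the paper's stated implementation:

```python
import numpy as np

def label_points(exp_freqs, sim_freqs, tol=0.3):
    """exp_freqs: (N,) measured frequencies; sim_freqs: (N, T) simulated
    frequency of each of T transitions at the same flux values. Returns the
    nearest transition index per point, or -1 when no branch is within tol
    GHz or when a second branch is also within tol (ambiguous region)."""
    labels = np.full(len(exp_freqs), -1, dtype=int)
    for i, f in enumerate(exp_freqs):
        d = np.abs(np.asarray(sim_freqs[i], dtype=float) - f)
        order = np.argsort(d)
        if d[order[0]] > tol:
            continue                                  # far from every transition
        if len(d) > 1 and d[order[1]] <= tol:
            continue                                  # overlapping transitions
        labels[i] = int(order[0])
    return labels
```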
Step 4: Least-Squares Fitting
- Use the ML predictions as initial guesses for a least-squares fit (e.g., `scipy.optimize.least_squares` or `scipy.optimize.curve_fit`).
- Fit the labeled data points to the fluxonium Hamiltonian model.
- Constrain fitting to 5 iterations as in the paper's benchmarks.
- Output final refined values of E_C, E_L, E_J.
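A minimal sketch of the refinement step, with a stand-in `model(p, flux)` callable (the real residual would recompute transition frequencies from the fluxonium Hamiltonian; the evaluation cap here loosely mirrors the 5-iteration budget, not the paper's exact stopping rule):

```python
import numpy as np
from scipy.optimize import least_squares

def refine_parameters(p0, flux, freqs, model):
    """Refine the ML guess p0 = [E_C, E_L, E_J] against labeled points
    (flux, freqs). `model(p, flux)` must return the predicted transition
    frequency at each flux value. Bounds follow the training ranges."""
    res = least_squares(lambda p: model(p, flux) - freqs, p0,
                        bounds=([0.5, 0.1, 2.0], [3.0, 2.0, 10.0]),
                        max_nfev=5 * (len(p0) + 1))
    return res.x
```

With a toy closed-form model the fit recovers the true parameters from a perturbed start in a handful of evaluations.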
Phase 8: Reproducing Key Results
Result 1: Prediction Accuracy (Figure 4)
- Run inference on 512 test spectra.
- Plot predicted vs. true values for each of E_C, E_L, E_J.
- Compute average accuracy using the custom metric. Target: ~95.6% overall.
Result 2: Error and Cost Landscapes (Figures 5β6)
- Choose a test case, e.g., (E_C=1.28, E_J=6.50, E_L=0.70) GHz.
- Generate a 2D grid of initial parameter guesses.
- For each initial guess, run 5 fitting iterations and compute Error and Cost.
- Plot heatmaps showing that the ML prediction falls in the darkest (lowest error/cost) region.
Result 3: ML vs. Random Initial Guess (Table 1)
- For 60 parameter sets, compare:
- 512 random initial guesses → 5 fitting iterations → average Error and Cost
- ML initial guess → 5 fitting iterations → Error and Cost
| Method | Avg Error | Std Error | Avg Cost | Std Cost |
|---|---|---|---|---|
| Random initial values | 0.218 | 0.098 | 0.146 | 0.130 |
| ML prediction | 0.037 | 0.088 | 0.024 | 0.083 |
The ML approach should yield nearly one order of magnitude improvement.
Result 4: Real Experimental Data (Figure 7)
- If access to real fluxonium measurement data is available, apply the full pipeline.
- The paper demonstrates successful characterization with only partial spectra (4.0–5.9 GHz instead of 4.0–8.0 GHz) and even with half-period symmetrized data.
Phase 9: Practical Tips and Troubleshooting
Data Generation Optimization
- Parallelization: Use Python's `multiprocessing` to generate spectra in parallel. Each spectrum is independent.
- Caching: Save computed eigenvalues to disk (HDF5 or NumPy arrays) so you don't recompute if training is restarted.
- scqubits cutoff: Use cutoff=110 for the fluxonium Hilbert space. Lower cutoffs may miss higher transitions; higher cutoffs waste computation time.[^5]
Image Representation
- The paper plots spectra as black dots on a white background. Ensure consistent resolution (256×256) and normalization.
- Consider using a fixed pixel grid: map phi_ext ∈ [0, 2π] to x ∈ [0, 255] and frequency ∈ [4.0, 8.0] GHz to y ∈ [0, 255].
- Each dot should be at least 1–2 pixels wide for visibility.
Training Stability
- Prodigy with lr=1.0 is recommended. If training is unstable, reduce `d_coef` to 0.5.[^7]
- For the fine-tuning stage (469 samples), consider freezing early layers of the Swin Transformer and only fine-tuning the later layers and the regression head.
- Monitor for overfitting by tracking validation loss closely in Stage 2.
Label Normalization
- Normalize target values to [0, 1] by dividing by the parameter range (e.g., E_C_normalized = (E_C - 0.5) / 2.5). This helps MSE loss treat all three parameters equally.[^16]
- At inference time, denormalize predictions back to physical units.
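The normalization and its inverse, using the ranges from the parameter table:

```python
import numpy as np

# (min, span) of each parameter in GHz, ordered [E_C, E_L, E_J]
PARAM_MIN = np.array([0.5, 0.1, 2.0])
PARAM_SPAN = np.array([2.5, 1.9, 8.0])

def normalize_labels(E):
    """Map physical [E_C, E_L, E_J] (GHz) into [0, 1] for training."""
    return (np.asarray(E, dtype=float) - PARAM_MIN) / PARAM_SPAN

def denormalize_labels(y):
    """Map network outputs in [0, 1] back to GHz at inference time."""
    return np.asarray(y, dtype=float) * PARAM_SPAN + PARAM_MIN
```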
Complete Replication Checklist
| Step | Task | Status |
|---|---|---|
| 1 | Install all dependencies (PyTorch, QuTiP, scqubits, prodigyopt, timm) | ☐ |
| 2 | Implement fluxonium Hamiltonian spectrum generator | ☐ |
| 3 | Generate 15,392 pure spectrum images + labels | ☐ |
| 4 | Generate 469 dispersive readout spectrum images + labels | ☐ |
| 5 | Generate 512 test spectrum images + labels | ☐ |
| 6 | Set up Swin V2 Tiny model with 3-output regression head | ☐ |
| 7 | Stage 1: Train on pure spectrum dataset with Prodigy optimizer | ☐ |
| 8 | Stage 2: Fine-tune on dispersive readout dataset | ☐ |
| 9 | Evaluate on test set → target ~95.6% accuracy | ☐ |
| 10 | Implement automatic fitting pipeline (filter → ML → label → fit) | ☐ |
| 11 | Reproduce Error/Cost comparison (Table 1) | ☐ |
| 12 | (Optional) Apply to real experimental data | ☐ |
Key References and Resources
- Paper: arXiv:2503.12099 – Kung et al., "Automatic Characterization of Fluxonium Superconducting Qubits Parameters with Deep Transfer Learning"
- Swin Transformer V2: Liu et al., CVPR 2022 – architecture details and pretrained weights[^10]
- Prodigy Optimizer: Mishchenko & Defazio, arXiv:2306.06101 – parameter-free adaptive optimizer[^8]
- scqubits: Groszkowski & Koch, Quantum 5, 583 (2021) – Python package for superconducting qubit simulation[^6]
- QuTiP: Quantum Toolbox in Python – used for Hamiltonian diagonalization[^4]
- torchvision SwinV2: Official PyTorch implementation with ImageNet-pretrained weights[^1]
References
- swin_v2_b – Torchvision main documentation: Constructs a swin_v2_base architecture from Swin Transformer V2: Scaling Up Capacity and Resolution....
- Loading a pre-trained SwinV2 transformer and modifying the architecture · huggingface pytorch-image-models · Discussion #1843: I am trying to create a SwinV2 transformer model by loading pretrained weights and later modifying s...
- Accelerate Qubit Research with NVIDIA cuQuantum Integrations in ...: The outputs of scQubits can also easily serve as inputs for analog quantum dynamics simulations usin...
- Fluxonium Qubit – scqubits Documentation: An instance of the fluxonium qubit is created as follows: fluxonium = scqubits.Fluxonium(EJ = 8.9, E...
- Scqubits: a Python package for superconducting qubits: scqubits is an open-source Python package for simulating and analyzing superconducting ci...
- prodigyopt: An Adam-like optimizer for neural networks with adaptive estimation of learning rate
- The Prodigy optimizer and its variants for training neural ...: The Prodigy optimizer and its variants for training neural networks. - konstmish/prodigy
- Swin Transformer V2: Scaling Up Capacity and Resolution: We present techniques for scaling Swin Transformer [35] up to 3 billion parameters and making it cap...
- Swin Transformer V2: Advancing Computer Vision with Scalable ...: Architecture & Functionality: Swin Transformer V2 retains the hierarchical structure of its predece...
- SwinCNet leveraging Swin Transformer V2 and CNN for precise color correction and detail enhancement in underwater image restoration: Underwater image restoration confronts three major challenges: color distortion, contrast degradatio...
- Retinal vessel segmentation using a swin transformer-based encoder-decoder architecture
- DUSFormer: Dual-Swin Transformer V2 Aggregate Network for Polyp Segmentation: The convolutional neural network method has certain limitations in medical image segmentation. As a ...
- Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation: ...a 0.98% mAP higher localization accuracy, outperforming state-of-the-art models. It also yields c...
- An Image Denoising Method Based on Swin Transformer V2 and U-Net Architecture: To address the issue of image degradation caused by noise during image acquisition and transmission,...