Quantum ML
Overview
This guide provides a complete, actionable roadmap to replicate the results from Kung et al. (arXiv:2503.12099), which presents a machine learning approach for automatic characterization of fluxonium superconducting qubit parameters (E_J, E_C, E_L) using a Swin Transformer V2 model trained via deep transfer learning. The paper reports ~95.6% average accuracy across all three energy parameters. The authors mention that source code will be available on GitHub after publication, but since it may not yet be released, this guide reconstructs every detail needed for a from-scratch replication.
Phase 1: Environment Setup
Hardware Requirements
- A GPU with at least 8 GB VRAM (NVIDIA RTX 3060 or better recommended). Swin Transformer V2 Tiny has ~28M parameters and is relatively lightweight.[^1]
- Sufficient CPU/RAM for generating 15,000+ spectrum simulations via QuTiP.
Software Dependencies
Install the following Python packages:
- PyTorch (≥1.12) with CUDA support
- torchvision – provides pre-built Swin Transformer V2 models (`swin_v2_t`, `swin_v2_b`)[^2][^1]
- timm (PyTorch Image Models) – alternative source for `swinv2_tiny_window8_256` and other variants[^3]
- QuTiP – Quantum Toolbox in Python for Hamiltonian diagonalization and spectrum computation[^4]
- scqubits – optional but helpful for fluxonium simulation and validation[^5][^6]
- prodigyopt – the Prodigy optimizer (`pip install prodigyopt`)[^7][^8]
- scipy – for `find_peaks_cwt` peak detection[^9]
- numpy, matplotlib, PIL/Pillow

```bash
pip install torch torchvision timm qutip scqubits prodigyopt scipy numpy matplotlib pillow
```
Phase 2: Understanding the Fluxonium Hamiltonian
The Model Hamiltonian
The fluxonium qubit Hamiltonian is:
H = 4 * E_C * n^2 - E_J * cos(phi + phi_ext) + 0.5 * E_L * phi^2
where:
- E_C = charging energy (capacitance)
- E_J = Josephson energy
- E_L = inductive energy
- phi = phase operator across inductance
- n = displacement charge operator
- phi_ext = external magnetic flux (varied over one flux quantum period)
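This Hamiltonian can be diagonalized directly. As an illustrative sketch (not the paper's code), the standard construction in the harmonic-oscillator basis of the quadratic (LC) part looks like this in plain NumPy; QuTiP or scqubits wrap the same linear algebra:

```python
import numpy as np

def fluxonium_hamiltonian(EC, EL, EJ, flux, cutoff=110):
    """H = 4*EC*n^2 - EJ*cos(phi + phi_ext) + 0.5*EL*phi^2, built in the
    harmonic-oscillator basis of the quadratic (LC) part. Energies in GHz,
    flux in units of the flux quantum Phi_0."""
    a = np.diag(np.sqrt(np.arange(1, cutoff)), k=1)   # annihilation operator
    phi_zpf = (8.0 * EC / EL) ** 0.25                 # zero-point phase fluctuation
    phi = phi_zpf * (a + a.T) / np.sqrt(2.0)
    n = 1j * (a.T - a) / (np.sqrt(2.0) * phi_zpf)
    # cos(phi + phi_ext) via eigendecomposition of the Hermitian phi matrix
    w, v = np.linalg.eigh(phi)
    cos_term = v @ np.diag(np.cos(w + 2.0 * np.pi * flux)) @ v.conj().T
    return 4.0 * EC * (n @ n) + 0.5 * EL * (phi @ phi) - EJ * cos_term
```

Eigenvalues from `np.linalg.eigvalsh(H)` then give the transition frequencies; a cutoff around 110 matches the recommendation later in this guide.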
Parameter Ranges
The training data spans these experimentally relevant ranges:
| Parameter | Range (GHz) | Span |
|---|---|---|
| E_C | 0.5 – 3.0 | 2.5 GHz |
| E_L | 0.1 – 2.0 | 1.9 GHz |
| E_J | 2.0 – 10.0 | 8.0 GHz |
Transitions Considered
The energy transitions used are: 0→1, 0→2, 0→3, 0→4, 0→5, 1→2, and 1→3, all within the frequency window of 4.0–8.0 GHz.
Phase 3: Generating Training Data
This is the most computationally intensive phase. There are two distinct datasets to generate.
Dataset 1: Pure Spectrum Dataset (N = 15,392)
This dataset contains only the bare transition energies (no coupling/readout effects), making it fast to compute.
For each parameter combination (E_C, E_L, E_J):
- Sample parameters randomly or on a grid within the ranges above. The paper uses 15,392 unique combinations.
- Sweep phi_ext with 256 points per flux period (0 to 2π).
- Diagonalize the Hamiltonian at each flux point using QuTiP. Use `scqubits.Fluxonium` or build the Hamiltonian matrix directly in QuTiP with a sufficiently large cutoff (typically 110 states).[^5]
- Compute transition energies between all relevant level pairs (0-1, 0-2, ..., 1-3).
- Filter transitions to retain only those within 4.0–8.0 GHz.
- Render as an image: Plot each valid transition point as a black dot on a 2D image (x-axis = phi_ext, y-axis = frequency in GHz). The image serves as input to the Swin Transformer.
Example code sketch for a single spectrum:
```python
import scqubits as scq
import numpy as np

def generate_pure_spectrum(EC, EL, EJ, n_flux=256, cutoff=110):
    fluxonium = scq.Fluxonium(EJ=EJ, EC=EC, EL=EL, flux=0.0, cutoff=cutoff)
    flux_vals = np.linspace(0.0, 1.0, n_flux)  # external flux in units of Phi_0
    transitions = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3)]
    spectrum_points = []
    for flux in flux_vals:
        fluxonium.flux = flux
        evals = fluxonium.eigenvals(evals_count=6)
        for (i, j) in transitions:
            if j < len(evals):
                freq = evals[j] - evals[i]
                if 4.0 <= freq <= 8.0:  # keep only the paper's frequency window
                    spectrum_points.append((flux, freq))
    return spectrum_points
```
Image generation: Convert each spectrum into a fixed-resolution image (e.g., 256×256 pixels). The Swin Transformer V2 Tiny expects 256×256 input. Plot flux on the x-axis and frequency on the y-axis, with black dots on a white background. Save as PNG or convert directly to a tensor.[^1]
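A minimal rasterization sketch consistent with this description (dot size and image orientation are assumptions here, not taken from the paper):

```python
import numpy as np

def spectrum_to_image(spectrum_points, size=256, f_min=4.0, f_max=8.0):
    """Rasterize (flux, freq) points as black dots on a white size x size
    grayscale image. Flux is assumed in [0, 1] (units of Phi_0) and
    frequency in [f_min, f_max] GHz."""
    img = np.full((size, size), 255, dtype=np.uint8)       # white background
    for flux, freq in spectrum_points:
        x = int(round(flux * (size - 1)))
        y = int(round((f_max - freq) / (f_max - f_min) * (size - 1)))  # row 0 = top
        img[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2] = 0  # ~3x3 black dot
    return img
```

Stack the array three times (`np.stack([img] * 3, axis=-1)`) for the 3-channel model input, or save it with Pillow's `Image.fromarray`.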
Dataset 2: Dispersive Readout Dataset (N = 469)
This dataset simulates a more realistic measurement scenario including dispersive readout effects:
- Readout resonator at 6.00 GHz with linewidth 7 MHz and coupling strength g = 100 MHz.
- Compute the dispersive shift for each transition using second-order perturbation theory.
- Calculate voltage change in readout response caused by dispersive shift for a saturation drive at every transition and flux value.
- Threshold: Exclude data points where readout voltage change < 10% of maximum magnitude at readout resonance.
- Render as image similarly to the pure spectrum, but now transition points carry varying intensities based on signal magnitude.
This computation is >100× slower per spectrum than the pure dataset, which is why only 469 samples are used. The dispersive readout dataset is critical for the transfer learning step.
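The paper's exact readout model isn't reproduced here, but a common textbook form of the second-order dispersive shift, using charge matrix elements between fluxonium eigenstates, gives the flavor of the computation. Treat this as an assumption-laden sketch; the paper's expression may differ:

```python
import numpy as np

def dispersive_shifts(evals, n_mat, f_r=6.0, g=0.100):
    """Second-order dispersive shift (GHz) of each qubit level on a readout
    resonator at f_r GHz with coupling g GHz. evals: eigenenergies (GHz);
    n_mat[i, j]: charge matrix element <i|n|j>. This textbook formula is an
    assumption, not the paper's verified expression."""
    dim = len(evals)
    chi = np.zeros(dim)
    for i in range(dim):
        for j in range(dim):
            if j == i:
                continue
            w_ji = evals[j] - evals[i]   # transition frequency i -> j
            chi[i] += (g * abs(n_mat[i, j])) ** 2 * 2.0 * w_ji / (w_ji ** 2 - f_r ** 2)
    return chi
```

The readout voltage change then follows from how chi shifts the resonator's Lorentzian response relative to its 7 MHz linewidth.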
Phase 4: Model Architecture β Swin Transformer V2
Model Selection
The paper uses Swin Transformer V2, chosen for its lightweight architecture compared to ResNet and DenseNet alternatives. The exact variant isn't specified, but the Swin V2 Tiny model is the most practical choice:[^10][^11]
| Property | Swin V2 Tiny |
|---|---|
| Parameters | ~28.3M[^1] |
| Input resolution | 256 × 256 |
| GFLOPs | 5.94[^1] |
| Embed dim | 96 |
| Depths | [2, 2, 6, 2] |
| Num heads | [3, 6, 12, 24] |
| Window size | 8 |
Loading the Model
```python
import torchvision.models as models
import torch.nn as nn

# Load pretrained Swin V2 Tiny (ImageNet weights)
model = models.swin_v2_t(weights=models.Swin_V2_T_Weights.IMAGENET1K_V1)

# Modify the classification head for regression (3 outputs: EC, EL, EJ)
model.head = nn.Linear(model.head.in_features, 3)
```
Alternatively, using timm:
```python
import timm

model = timm.create_model('swinv2_tiny_window8_256', pretrained=True, num_classes=3)
```
Input Preprocessing
The spectrum images should be converted to 3-channel (RGB) tensors of size 256×256. Apply standard ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) since the model is pretrained on ImageNet.[^2][^1]
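This normalization is what torchvision's `transforms.ToTensor()` followed by `transforms.Normalize(...)` produces; in plain NumPy it amounts to:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_uint8):
    """(H, W, 3) uint8 image -> (3, H, W) float32 array, scaled to [0, 1]
    and normalized with the ImageNet statistics."""
    x = img_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return np.transpose(x, (2, 0, 1))    # HWC -> CHW, as PyTorch expects
```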
Phase 5: Two-Stage Transfer Learning Training
This is the core methodological contribution. The training proceeds in two stages.
Stage 1: Pre-train on Pure Spectrum Dataset
- Dataset: 15,392 pure spectrum images
- Labels: corresponding [E_C, E_L, E_J] vectors (continuous values)
- Loss function: mean squared error (MSE):

Loss = (1/N) * Σ (F_NN(S_E^i) - E^i)^2

- Optimizer: Prodigy with default lr=1.0. Prodigy is parameter-free and adaptively estimates the learning rate.[^8][^7]
```python
from prodigyopt import Prodigy

optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)
```
- Training details: Train until convergence. Use a validation split (~10–15%) from the pure dataset to monitor overfitting. The paper does not specify exact epoch counts, so train until validation loss plateaus (likely 50–200 epochs depending on batch size).
- Batch size: Not explicitly stated; start with 32 or 64.
- Scheduler: Cosine annealing is recommended with Prodigy.[^7]
```python
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iterations)
```
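The overall loop is a standard MSE regression fit. A generic sketch, assuming a DataLoader yielding (image, label) batches (the Prodigy optimizer and cosine scheduler above plug in directly):

```python
import torch
import torch.nn as nn

def train_stage(model, loader, optimizer, scheduler=None, device="cpu", epochs=1):
    """Minimal MSE regression loop; both training stages can reuse it."""
    criterion = nn.MSELoss()
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            if scheduler is not None:
                scheduler.step()   # cosine annealing stepped per iteration
    return model
```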
Stage 2: Fine-tune on Dispersive Readout Dataset
- Dataset: 469 dispersive readout spectrum images
- Initialization: Load all weights from Stage 1
- Loss function: Same MSE loss
- Optimizer: Prodigy (reinitialize for the new stage)
- Training: Fine-tune the entire model on the smaller, more realistic dataset. This transfer learning step is critical – the pure spectrum pre-training provides a strong initialization, and the dispersive dataset aligns the model with experimental conditions.
- Caution: With only 469 samples, overfitting is a risk. Use aggressive data augmentation (random horizontal flips, small rotations, slight noise injection) and early stopping.
Phase 6: Evaluation and Validation
Test Dataset
Generate 512 test spectra with non-repetitive parameter combinations distinct from training data, within the same parameter ranges.
Accuracy Metric
The paper defines accuracy per parameter as:
Acc(E_ν) = (1/N_test) * Σ (1 - |E_ν^i - E_ν^{true,i}| / R(E_ν^{test}))

where R(E_ν^{test}) is the training range (2.5 GHz for E_C, 1.9 GHz for E_L, 8.0 GHz for E_J). This differs from standard classification accuracy – it measures how close predictions are relative to the parameter range.
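This metric is a few lines of NumPy:

```python
import numpy as np

# Training ranges R(E_nu) in GHz, matching the parameter table above
RANGES = {"E_C": 2.5, "E_L": 1.9, "E_J": 8.0}

def accuracy(pred, true, span):
    """Paper-style accuracy: one minus the mean absolute deviation,
    normalized by the parameter's training range."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(1.0 - np.abs(pred - true) / span))
```

For example, predictions off by 0.25 GHz on E_C (range 2.5 GHz) score 90%.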
Target Accuracies
| Parameter | Target Accuracy | Implied Average Deviation |
|---|---|---|
| E_C | 94.5% | ~0.14 GHz |
| E_L | 97.1% | ~0.06 GHz |
| E_J | 95.3% | ~0.38 GHz |
| Overall | 95.6% | – |
These are the benchmarks from the paper.
Error and Cost Metrics
The combined error function:
Error = 1 - (1/3) * Σ Acc(E_ν), for ν = C, L, J
The cost function measures spectral fit quality:
Cost = (1/N) * Σ (f(phi_i) - f_i)^2
where f(phi_i) is the transition frequency calculated from the predicted parameters.
Phase 7: Automatic Fitting Pipeline (End-to-End)
Once the ML model is trained, the full automatic characterization pipeline works as follows:
Step 1: Preprocess Experimental Data
- Apply a band-pass filter: keep data points with signal magnitude > 2.5 standard deviations above background average and < 20% of maximum measured magnitude.
- Use `scipy.signal.find_peaks_cwt` to detect transition spectrum peaks at magnitude extrema.[^9]
Step 2: ML Initial Guess
- Feed the preprocessed spectrum image into the trained Swin Transformer V2 model.
- Obtain initial guesses: E_C^0, E_L^0, E_J^0.
Step 3: Transition Identification
- Simulate a spectrum using the ML-predicted parameters.
- Label each experimental data point by associating it with the nearest simulated transition, provided the nearest transition is within 0.3 GHz.
- Exclude points that are far from any simulated transition or fall within regions where multiple transitions overlap within 0.3 GHz.
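The labeling rule above can be sketched as follows; representing the overlap check as "a second branch also within tolerance of the point" is an interpretation, not the paper's stated implementation:

```python
import numpy as np

def label_points(exp_freqs, sim_freqs, tol=0.3):
    """exp_freqs: (N,) measured frequencies; sim_freqs: (N, T) simulated
    frequency of each of T transitions at the same flux values. Returns the
    nearest transition index per point, or -1 when no branch is within tol
    GHz or when a second branch is also within tol (ambiguous region)."""
    labels = np.full(len(exp_freqs), -1, dtype=int)
    for i, f in enumerate(exp_freqs):
        d = np.abs(np.asarray(sim_freqs[i], dtype=float) - f)
        order = np.argsort(d)
        if d[order[0]] > tol:
            continue                                  # far from every transition
        if len(d) > 1 and d[order[1]] <= tol:
            continue                                  # overlapping transitions
        labels[i] = int(order[0])
    return labels
```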
Step 4: Least-Squares Fitting
- Use the ML predictions as initial guesses for a least-squares fit (e.g., `scipy.optimize.least_squares` or `scipy.optimize.curve_fit`).
- Fit the labeled data points to the fluxonium Hamiltonian model.
- Constrain fitting to 5 iterations as in the paper's benchmarks.
- Output final refined values of E_C, E_L, E_J.
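A minimal sketch of the refinement step, with a stand-in `model(p, flux)` callable (the real residual would recompute transition frequencies from the fluxonium Hamiltonian; the evaluation cap here loosely mirrors the 5-iteration budget, not the paper's exact stopping rule):

```python
import numpy as np
from scipy.optimize import least_squares

def refine_parameters(p0, flux, freqs, model):
    """Refine the ML guess p0 = [E_C, E_L, E_J] against labeled points
    (flux, freqs). `model(p, flux)` must return the predicted transition
    frequency at each flux value. Bounds follow the training ranges."""
    res = least_squares(lambda p: model(p, flux) - freqs, p0,
                        bounds=([0.5, 0.1, 2.0], [3.0, 2.0, 10.0]),
                        max_nfev=5 * (len(p0) + 1))
    return res.x
```

With a toy closed-form model the fit recovers the true parameters from a perturbed start in a handful of evaluations.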
Phase 8: Reproducing Key Results
Result 1: Prediction Accuracy (Figure 4)
- Run inference on 512 test spectra.
- Plot predicted vs. true values for each of E_C, E_L, E_J.
- Compute average accuracy using the custom metric. Target: ~95.6% overall.
Result 2: Error and Cost Landscapes (Figures 5β6)
- Choose a test case, e.g., (E_C=1.28, E_J=6.50, E_L=0.70) GHz.
- Generate a 2D grid of initial parameter guesses.
- For each initial guess, run 5 fitting iterations and compute Error and Cost.
- Plot heatmaps showing that the ML prediction falls in the darkest (lowest error/cost) region.
Result 3: ML vs. Random Initial Guess (Table 1)
- For 60 parameter sets, compare:
- 512 random initial guesses → 5 fitting iterations → average Error and Cost
- ML initial guess → 5 fitting iterations → Error and Cost
| Method | Avg Error | Std Error | Avg Cost | Std Cost |
|---|---|---|---|---|
| Random initial values | 0.218 | 0.098 | 0.146 | 0.130 |
| ML prediction | 0.037 | 0.088 | 0.024 | 0.083 |
The ML approach should yield nearly one order of magnitude improvement.
Result 4: Real Experimental Data (Figure 7)
- If access to real fluxonium measurement data is available, apply the full pipeline.
- The paper demonstrates successful characterization with only partial spectra (4.0–5.9 GHz instead of 4.0–8.0 GHz) and even with half-period symmetrized data.
Phase 9: Practical Tips and Troubleshooting
Data Generation Optimization
- Parallelization: Use Python's `multiprocessing` to generate spectra in parallel. Each spectrum is independent.
- Caching: Save computed eigenvalues to disk (HDF5 or NumPy arrays) so you don't recompute if training is restarted.
- scqubits cutoff: Use cutoff=110 for the fluxonium Hilbert space. Lower cutoffs may miss higher transitions; higher cutoffs waste computation time.[^5]
Image Representation
- The paper plots spectra as black dots on a white background. Ensure consistent resolution (256×256) and normalization.
- Consider using a fixed pixel grid: map phi_ext ∈ [0, 2π] to x ∈ [0, 255] and frequency ∈ [4.0, 8.0] GHz to y ∈ [0, 255].
- Each dot should be at least 1–2 pixels wide for visibility.
Training Stability
- Prodigy with lr=1.0 is recommended. If training is unstable, reduce `d_coef` to 0.5.[^7]
- For the fine-tuning stage (469 samples), consider freezing early layers of the Swin Transformer and only fine-tuning the later layers and the regression head.
- Monitor for overfitting by tracking validation loss closely in Stage 2.
Label Normalization
- Normalize target values to [0, 1] by dividing by the parameter range (e.g., E_C_normalized = (E_C - 0.5) / 2.5). This helps MSE loss treat all three parameters equally.[^16]
- At inference time, denormalize predictions back to physical units.
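The normalization and its inverse, using the ranges from the parameter table:

```python
import numpy as np

# (min, span) of each parameter in GHz, ordered [E_C, E_L, E_J]
PARAM_MIN = np.array([0.5, 0.1, 2.0])
PARAM_SPAN = np.array([2.5, 1.9, 8.0])

def normalize_labels(E):
    """Map physical [E_C, E_L, E_J] (GHz) into [0, 1] for training."""
    return (np.asarray(E, dtype=float) - PARAM_MIN) / PARAM_SPAN

def denormalize_labels(y):
    """Map network outputs in [0, 1] back to GHz at inference time."""
    return np.asarray(y, dtype=float) * PARAM_SPAN + PARAM_MIN
```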
Complete Replication Checklist
| Step | Task | Status |
|---|---|---|
| 1 | Install all dependencies (PyTorch, QuTiP, scqubits, prodigyopt, timm) | ☐ |
| 2 | Implement fluxonium Hamiltonian spectrum generator | ☐ |
| 3 | Generate 15,392 pure spectrum images + labels | ☐ |
| 4 | Generate 469 dispersive readout spectrum images + labels | ☐ |
| 5 | Generate 512 test spectrum images + labels | ☐ |
| 6 | Set up Swin V2 Tiny model with 3-output regression head | ☐ |
| 7 | Stage 1: Train on pure spectrum dataset with Prodigy optimizer | ☐ |
| 8 | Stage 2: Fine-tune on dispersive readout dataset | ☐ |
| 9 | Evaluate on test set → target ~95.6% accuracy | ☐ |
| 10 | Implement automatic fitting pipeline (filter → ML → label → fit) | ☐ |
| 11 | Reproduce Error/Cost comparison (Table 1) | ☐ |
| 12 | (Optional) Apply to real experimental data | ☐ |
Key References and Resources
- Paper: arXiv:2503.12099 – Kung et al., "Automatic Characterization of Fluxonium Superconducting Qubits Parameters with Deep Transfer Learning"
- Swin Transformer V2: Liu et al., CVPR 2022 – architecture details and pretrained weights[^10]
- Prodigy Optimizer: Mishchenko & Defazio, arXiv:2306.06101 – parameter-free adaptive optimizer[^8]
- scqubits: Groszkowski & Koch, Quantum 5, 583 (2021) – Python package for superconducting qubit simulation[^6]
- QuTiP: Quantum Toolbox in Python – used for Hamiltonian diagonalization[^4]
- torchvision SwinV2: Official PyTorch implementation with ImageNet-pretrained weights[^1]
References
- swin_v2_b – Torchvision main documentation: Constructs a swin_v2_base architecture from Swin Transformer V2: Scaling Up Capacity and Resolution....
- Loading a pre-trained SwinV2 transformer and modifying the architecture · huggingface pytorch-image-models · Discussion #1843: I am trying to create a SwinV2 transformer model by loading pretrained weights and later modifying s...
- Accelerate Qubit Research with NVIDIA cuQuantum Integrations in ...: The outputs of scQubits can also easily serve as inputs for analog quantum dynamics simulations usin...
- Fluxonium Qubit – scqubits Documentation: An instance of the fluxonium qubit is created as follows: fluxonium = scqubits.Fluxonium(EJ = 8.9, E...
- Scqubits: a Python package for superconducting qubits: scqubits is an open-source Python package for simulating and analyzing superconducting ci...
- prodigyopt: An Adam-like optimizer for neural networks with adaptive estimation of learning rate
- The Prodigy optimizer and its variants for training neural ...: The Prodigy optimizer and its variants for training neural networks. - konstmish/prodigy
- Swin Transformer V2: Scaling Up Capacity and Resolution: We present techniques for scaling Swin Transformer [35] up to 3 billion parameters and making it cap...
- Swin Transformer V2: Advancing Computer Vision with Scalable ...: Architecture & Functionality: Swin Transformer V2 retains the hierarchical structure of its predece...
- SwinCNet leveraging Swin Transformer V2 and CNN for precise color correction and detail enhancement in underwater image restoration: Underwater image restoration confronts three major challenges: color distortion, contrast degradatio...
- Retinal vessel segmentation using a swin transformer-based encoder-decoder architecture
- DUSFormer: Dual-Swin Transformer V2 Aggregate Network for Polyp Segmentation: The convolutional neural network method has certain limitations in medical image segmentation. As a ...
- Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation: ...a 0.98% mAP higher localization accuracy, outperforming state-of-the-art models. It also yields c...
- An Image Denoising Method Based on Swin Transformer V2 and U-Net Architecture: To address the issue of image degradation caused by noise during image acquisition and transmission,...