This document gives a more detailed explanation of the different ADAPT-AQC options made available through AdaptConfig and AdaptCompiler.
AdaptConfig()
method="general_gradient"
This is the core heuristic of ADAPT-AQC, whereby the next two-qubit unitary is placed on the pair of qubits that would give the largest cost gradient $\Vert\vec{\nabla} C\Vert$. For more details, please see Appendix A of https://arxiv.org/abs/2503.09683.
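A minimal usage sketch (the import path, the AdaptCompiler arguments and the compile() method name are assumptions for illustration; check the package API for the exact signatures):

```python
from qiskit import QuantumCircuit

from adaptaqc import AdaptConfig, AdaptCompiler  # hypothetical import path

# A small example target circuit.
target = QuantumCircuit(4)
target.h(0)
for q in range(3):
    target.cx(q, q + 1)

# Select the gradient-based heuristic for choosing where the next layer goes.
config = AdaptConfig(method="general_gradient")
compiler = AdaptCompiler(target, config=config)  # assumed constructor arguments
result = compiler.compile()                      # assumed method name
```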
method="ISL"
What is it:
When using this method, the ansatz is adaptively constructed
by prioritising pairs of qubits which have larger pairwise entanglement. This is the original
heuristic from https://github.com/abhishekagarwal2301/isl which has been kept as the default here
as it is the only one which supports all backends.
When AdaptConfig.reuse_exponent = 0, which is the default setting, the pair with the largest entanglement is always picked, except that the same pair is never picked twice in a row. For any other value of reuse_exponent, the entanglement of each pair is weighted against how recently a layer has been applied to it.
If the pairwise entanglement between all qubits in the coupling map is zero, this method falls
back to the expectation method defined below.
What problem it aims to solve:
The goal here is to use the entanglement structure of the compilation target to inform the adaptive ansatz. This is motivated by the fact that compiling succeeds by finding a set of gates that "undoes" the target circuit back to the $|00..0\rangle$ state. Since the $|00..0\rangle$ state has no pairwise entanglement, it makes sense to iteratively reduce the entanglement during compiling.
method="expectation"
What is it:
This is another mode of operation for compiling. When using this method, the ansatz is adaptively constructed by prioritising pairs which have the smallest summed $\hat{\sigma}_z$ expectation values (i.e., closest to the minimum value of -2). For the default value of AdaptConfig.expectation_reuse_exponent = 1, the pair with the smallest expectation value is also weighted against how recently a layer has been applied to it.
What problem it aims to solve:
In this case, we aim to use the qubit magnetisation of the target to inform the adaptive ansatz. This has a similar motivation to above, in that each qubit in the $|00..0\rangle$ state has an expectation value of $\langle0|\hat{\sigma}_z|0\rangle = 1$.
bad_qubit_pair_memory
What is it:
For the ISL method, if acting on a qubit pair leads to an increase in entanglement, it is labelled a "bad pair". After this, the pair will not be selected for a number of layers equal to bad_qubit_pair_memory.
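An illustrative sketch of the bookkeeping this implies (not the library's internal implementation; the names and the entanglement bookkeeping are hypothetical placeholders):

```python
# Illustrative only: how bad-pair exclusion could be tracked.
bad_pair_layer = {}        # pair -> layer index at which it was last marked "bad"
bad_qubit_pair_memory = 4  # example value for AdaptConfig.bad_qubit_pair_memory

def eligible_pairs(candidate_pairs, current_layer):
    """Drop pairs that were marked bad within the last bad_qubit_pair_memory layers."""
    return [
        pair for pair in candidate_pairs
        if current_layer - bad_pair_layer.get(pair, -10**9) > bad_qubit_pair_memory
    ]

def record_layer_result(pair, ent_before, ent_after, current_layer):
    """Mark a pair as bad if optimising a layer on it increased the entanglement."""
    if ent_after > ent_before:
        bad_pair_layer[pair] = current_layer
```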
What problem it aims to solve:
Although pairwise entanglement is used to select a two-qubit pair to act on, the variational ansatz is optimised with respect to the overlap with the $|00..0\rangle$ state. The algorithm thus does not directly minimise entanglement. This means that in certain situations, the optimal angles of an ansatz layer could actually increase the entanglement. This would lead to a high priority for acting on that pair of qubits in the near future, despite the fact that it was recently optimised. This can lead ADAPT-AQC to get stuck, particularly if there are several unconnected bad qubit pairs. The use of bad_qubit_pair_memory is to make sure that, by the time the pair is acted on again, the state of connected qubits has changed sufficiently that optimising the bad pair will lead to new optimal angles.
reuse_exponent
What is it:
For the ISL, expectation or general_gradient methods, this controls how much priority should be given to picking qubits not recently acted on. Specifically, given that a qubit pair was last acted on $l$ layers ago, it is given a reuse priority $P_r$ of
$P_r = 1-2^{\frac{-(l + 1)}{k}}$,
where $k$ is the value of reuse_exponent. This is then multiplied with the entanglement measure or gradient (for ISL and general_gradient respectively) to produce the combined priority $P_c = E \cdot P_r$. For expectation mode, the combined priority is calculated differently. Given a pair of qubits, the combined priority is calculated as
$P_c = \left(2 - \langle Z_1 \rangle - \langle Z_2 \rangle\right) P_r$,
where $\langle Z_1 \rangle$ ($\langle Z_2 \rangle$) is the $\hat{\sigma}_z$ expectation value of qubit 1 (2).
The qubit pair with the highest combined priority is then picked for the next layer.
This means that for larger $k$, more weighting is given to how recently the pair was used. Conversely, if $k=0$ then no weighting is given.
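A small numerical illustration of this weighting, using the $P_r$ expression written above (treating $k=0$ as the documented "no weighting" case):

```python
def reuse_priority(l, k):
    """Reuse priority P_r = 1 - 2^(-(l + 1)/k) for a pair last acted on l layers ago."""
    if k == 0:
        return 1.0  # no reuse weighting when the exponent is zero
    return 1 - 2 ** (-(l + 1) / k)

# Larger k penalises recently-used pairs more strongly, and for longer.
for k in (1, 4):
    print(k, [round(reuse_priority(l, k), 3) for l in range(5)])
# 1 [0.5, 0.75, 0.875, 0.938, 0.969]
# 4 [0.159, 0.293, 0.405, 0.5, 0.58]
```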
What problem it aims to solve:
The goal of approximate quantum compiling (AQC) is to produce a circuit that approximately prepares a target state with less depth than the original circuit. The aim of this heuristic is to make ADAPT-AQC depth-aware, so that, e.g., the same pairs of qubits are not repeatedly picked if they have only marginally higher entanglement than other pairs that haven't been used. Ultimately, compiling with a larger exponent produces shallower solutions, at the cost of longer compiling times.
reuse_priority_mode
What is it:
The reuse priority system is used to de-prioritise qubits that were recently acted on. When reuse_priority_mode="pair", the priority of a pair of qubits (a, b) is calculated as
$P_r = 1-2^{\frac{-(l + 1)}{k}}$,
where $l$ is the number of layers since that pair had a layer applied to it.
When reuse_priority_mode="qubit", the priority is instead calculated as
$P_r = \min\left[1-2^{\frac{-(l_a + 1)}{k}},\; 1-2^{\frac{-(l_b + 1)}{k}}\right]$,
where $l_a$ or $l_b$ is the number of layers since qubit a or b has been acted on respectively. Note that in both cases, the priority of the most recently used qubit pair is set to -1 so that it is never chosen. Additionally, the priority of a pair that has never been used is manually set to 1, so that it receives maximum priority.
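An illustrative sketch of the two modes (the layer-history bookkeeping and helper names are hypothetical; the formulas and special cases follow the definitions above):

```python
def reuse_priority(l, k):
    """P_r for a single 'layers since last use' value l."""
    return 1 - 2 ** (-(l + 1) / k)

def pair_reuse_priority(pair, layers_since_pair, layers_since_qubit, k, mode, most_recent_pair):
    """Reuse priority of a qubit pair under the 'pair' and 'qubit' modes."""
    if pair == most_recent_pair:
        return -1.0            # the most recently used pair is never chosen
    if pair not in layers_since_pair:
        return 1.0             # never-used pairs get maximum priority
    if mode == "pair":
        return reuse_priority(layers_since_pair[pair], k)
    a, b = pair
    return min(reuse_priority(layers_since_qubit[a], k),
               reuse_priority(layers_since_qubit[b], k))
```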
What problem it aims to solve:
This heuristic is meant to reflect that, given a pair of qubits (a, b) was recently acted on, the depth of the compiled solution will increase if either a or b is acted on in a new layer. Previously, the "pair" option was the only type of reuse priority in ADAPT-AQC, often leading to solutions where successive layers act on pairs (a, a+1), (a+1, a+2), (a+2, a+3)... These branch-like structures significantly increase the depth of the solution.
rotosolve_frequency
What is it:
The main optimisation algorithms used by ADAPT-AQC are the Rotoselect and Rotosolve algorithms, more details of which can be found at https://quantum-journal.org/papers/q-2021-01-28-391/. Put simply, the Roto algorithms use sequential optimisation. Given a set of $L$ parameterised gates, the procedure works by fixing $L-1$ of the gates and varying the remaining one to minimise the cost function. This is then repeated for the remaining $L-1$ gates, unfixing one at a time and fixing the others. As changing later gates in the layer will affect the loss landscape of the earlier gates, we repeatedly cycle over all rotation gates until a termination criterion is reached.
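For a single parameterised Pauli rotation, the cost as a function of that angle is sinusoidal, so the minimising angle can be found in closed form from three cost evaluations. A minimal sketch of this single-parameter Rotosolve update (following the formula in the paper linked above; `cost` is a placeholder for evaluating the ADAPT-AQC cost with one gate's angle changed):

```python
import numpy as np

def rotosolve_update(cost):
    """Return the angle minimising a sinusoidal cost C(theta) = a*sin(theta + b) + c.

    `cost` is a callable evaluating the full cost with the chosen gate's angle set
    to the given value, all other gates held fixed.
    """
    c0 = cost(0.0)
    c_plus = cost(np.pi / 2)
    c_minus = cost(-np.pi / 2)
    theta = -np.pi / 2 - np.arctan2(2 * c0 - c_plus - c_minus, c_plus - c_minus)
    # Wrap into (-pi, pi] for convenience.
    return (theta + np.pi) % (2 * np.pi) - np.pi

# Example: C(theta) = sin(theta + 0.3) + 1 is minimised at theta = -pi/2 - 0.3.
print(rotosolve_update(lambda t: np.sin(t + 0.3) + 1))  # ~ -1.8708
```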
rotosolve_frequency defines how often the ansatz is optimised using specifically the Rotosolve algorithm, which only changes the angles of parameterised gates. Specifically, Rotosolve is called after every rotosolve_frequency layers have been added. In the context of ADAPT-AQC, it is notable that only Rotosolve has the ability to modify previous layers. Specifically, the last AdaptConfig.max_layers_to_modify layers will be optimised using Rotosolve. This makes it an expensive step, but one that is often necessary to reach convergence.
NOTE Setting the value rotosolve_frequency=0 will disable Rotosolve. This can lead to a large
performance improvement when using the matrix product state (MPS) backends, since the guarantee
that previous layers won't be modified allows us to cache the state of the system during evolution.
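For example (a sketch; the keyword names mirror the attribute names used in this document, but the exact AdaptConfig signature should be checked against the package):

```python
from adaptaqc import AdaptConfig  # hypothetical import path

# Run Rotosolve over the last 10 layers after every 5 new layers are added.
config = AdaptConfig(rotosolve_frequency=5, max_layers_to_modify=10)

# When using an MPS backend, disabling Rotosolve means previous layers are never
# modified, so the evolved state can be cached between layers.
mps_config = AdaptConfig(rotosolve_frequency=0)
```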
What problem it aims to solve:
The use of Rotosolve reflects the idea that after adding layers, the optimal parameters of
previous layers may have changed. Thus it may be more efficient (in terms of final circuit depth)
to attempt to re-optimise previous layers than to only add new layers. As such, when not using
Rotosolve, generally the solution will be deeper.
AdaptCompiler()
coupling_map
What is it:
A user-specified list of tuples, each of which represents a connection between qubits. ADAPT-AQC will be restricted to adding CNOT gates only between pairs which are in this coupling map.
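For example, a linear-chain connectivity on four qubits could be specified as follows (a sketch; the coupling_map keyword follows the option name in this document, and the import path is hypothetical):

```python
from qiskit import QuantumCircuit

from adaptaqc import AdaptCompiler  # hypothetical import path

target = QuantumCircuit(4)
target.h(0)
target.cx(0, 1); target.cx(1, 2); target.cx(2, 3)

# Linear chain 0-1-2-3: CNOTs may only be placed between these pairs.
coupling_map = [(0, 1), (1, 2), (2, 3)]

compiler = AdaptCompiler(target, coupling_map=coupling_map)  # assumed constructor arguments
```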
What problem it aims to solve:
Often we want to run the ADAPT-AQC solution on hardware with a specific connectivity (e.g., heavy-hex). Without this option, it would be possible to convert any solution to a given connectivity via SWAP gates; however, this can be extremely expensive in terms of the number of gates. This option exists to allow ADAPT-AQC to restrict the solution space during compiling.
custom_layer_2q_gate
What is it:
A Qiskit QuantumCircuit to be used as the ansatz layers.
What problem it aims to solve:
By default, ADAPT-AQC uses a thinly-dressed CNOT gate (i.e., a CNOT surrounded by 4 single-qubit rotations). This ansatz is not universal and has not been shown to be objectively better than others, but it is the heuristic that originally worked. As such, it may be valuable to change which ansatz layer is used.
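For instance, a more heavily dressed two-qubit layer could be supplied like this (a sketch; the parameter conventions the compiler expects for a custom layer should be checked against the package):

```python
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter

# An alternative to the default thinly-dressed CNOT: single-qubit rotations on
# both qubits before and after the CNOT.
layer = QuantumCircuit(2)
layer.ry(Parameter("a0"), 0)
layer.rz(Parameter("a1"), 0)
layer.ry(Parameter("b0"), 1)
layer.rz(Parameter("b1"), 1)
layer.cx(0, 1)
layer.ry(Parameter("c0"), 0)
layer.ry(Parameter("c1"), 1)

# This circuit would then be passed to AdaptCompiler via the custom_layer_2q_gate option.
```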
starting_circuit
What is it:
This is a QuantumCircuit that will be used as a set of initial fixed gates for the compiled
solution $\hat{V}$. Because during ADAPT-AQC we are variationally optimising $\hat{V}^\dagger$, the
inverse of starting_circuit will be placed at the end of the ansatz.
What problem it aims to solve:
starting_circuit is a useful heuristic when we have some knowledge of the structure of the solution. A good example of what this aims to solve is when a compilation target includes a distinct state-preparation step. For example, consider compiling the evolution of a spin chain starting in the Néel state. The compiling problem is much more efficient if ADAPT-AQC does not need to learn to start by applying an X gate to every other qubit.
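For the Néel-state example, the fixed preparation step could be built as follows (a sketch; it would be passed to AdaptCompiler via the starting_circuit option):

```python
from qiskit import QuantumCircuit

n_qubits = 6

# Neel state |0101...>: an X gate on every other qubit.
neel_prep = QuantumCircuit(n_qubits)
for q in range(1, n_qubits, 2):
    neel_prep.x(q)

# Supplying neel_prep as starting_circuit fixes these gates in the solution V,
# so their inverse is appended to the end of the ansatz V^dagger during compiling.
```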
local_cost_function
What is it:
Normally, ADAPT-AQC uses a cost $C_\mathrm{LET} = 1- |\langle 0 | V^\dagger U|0\rangle|^2$. The fidelity term, as defined in section IIIB of https://arxiv.org/abs/1908.04416, is generated using the Loschmidt Echo Test (LET), which is the formal name for acting $U|0\rangle$ followed by $V^\dagger$ to get the fidelity. We note that the cost is global with respect to the Hilbert space, since the overlap with the $|00...0\rangle$ state spans the full state vector.
By contrast, when setting local_cost_function=True, ADAPT-AQC will use a cost derived from the Local Loschmidt Echo Test (LLET). Specifically, the cost is defined as
$C_\mathrm{LLET} = 1 - \frac{1}{n}\sum_{j=1}^{n} p\left(q_j = 0\right)$,
where the second term can be recognised as the sum of the probabilities that each qubit is in the $|0\rangle$ state, averaged over the $n$ qubits. Since the cost function does not span the entire Hilbert space, it is described as local.
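An illustrative comparison of the two costs (a sketch assuming the averaged form of $C_\mathrm{LLET}$ written above; it works directly on a statevector, so it is only feasible for small systems):

```python
import numpy as np

def global_and_local_cost(state):
    """Return (C_LET, C_LLET) of a statevector, assuming the averaged C_LLET above.

    Qubit j corresponds to bit j of the statevector index (little-endian).
    """
    n = int(np.log2(len(state)))
    probs = np.abs(state) ** 2
    c_let = 1 - probs[0]                      # 1 - |<00..0|psi>|^2
    probs = probs.reshape([2] * n)            # axis 0 is qubit n-1, axis n-1 is qubit 0
    p_zero = [probs.take(0, axis=n - 1 - j).sum() for j in range(n)]
    c_llet = 1 - np.mean(p_zero)
    return c_let, c_llet

# Example: C_LLET <= C_LET always holds, so local convergence does not imply
# the desired global fidelity has been reached.
psi = np.zeros(8)
psi[0], psi[7] = np.sqrt(0.9), np.sqrt(0.1)   # sqrt(0.9)|000> + sqrt(0.1)|111>
print(global_and_local_cost(psi))             # approximately (0.1, 0.1) for this state
```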
What problem it aims to solve:
The distinction between global and local cost functions is very important in the context of trainability and barren plateaus (see https://www.nature.com/articles/s41467-021-21728-w), where a global cost function is difficult to train for large numbers of qubits. By contrast, the local cost function remains trainable. However, we note that $C_\mathrm{LLET} \leq C_\mathrm{LET}$, meaning that ADAPT-AQC may not have achieved the desired global fidelity just because the local cost converges.
initial_single_qubit_layer
What is it:
When initial_single_qubit_layer=True, the first layer of the ADAPT-AQC ansatz will be
a trainable single-qubit rotation on each qubit. Since this layer will be optimised by Rotoselect,
this means that both the angles and the bases of rotations can be modified. Note that since this
is the first layer of the ADAPT-AQC ansatz $V^\dagger$, it will end up being the final layer of the
returned solution $V$. So we can think of this feature as adding a trainable basis change before
measuring the cost function.
What problem it aims to solve:
ADAPT-AQC only applies layers in two-qubit blocks, which means that in certain situations ADAPT-AQC won't be able to find the optimal-depth solution. A good example of this is when only a subset of the qubits are entangled. To demonstrate why, consider the extreme case of compiling the $n$-qubit $|++..+\rangle$ state: ADAPT-AQC without this feature will need to apply $n$ CNOT gates. By contrast, with initial_single_qubit_layer=True, a solution can be found at depth 1 with no CNOT gates. The same issue can arise for any target state if, during compiling, ADAPT-AQC is left with an intermediate low-entangled state.
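For the $|++..+\rangle$ example, the target could be set up as follows (a sketch; enabling the option is assumed to follow the option name used in this document):

```python
from qiskit import QuantumCircuit

n_qubits = 5

# Target: the product state |++...+>, a Hadamard on every qubit.
target = QuantumCircuit(n_qubits)
target.h(range(n_qubits))

# Passing initial_single_qubit_layer=True to AdaptCompiler lets ADAPT-AQC solve
# this with a single layer of single-qubit rotations and no CNOT gates.
```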
AdaptCompiler.compile_in_parts()
What is it:
Compiling in parts (also called ladder-ADAPT-AQC) is the idea of splitting up a circuit into chunks that we compile sequentially. For example, given a depth-50 circuit $U_{50}|0\rangle$, we can compile the first 10 layers of gates $U_{0-10}|0\rangle$ to produce $V_{0-10}^\dagger|0\rangle \approx U_{0-10}|0\rangle$. This can then be used to construct a new target $U_{11-20}V_{0-10}^\dagger|0\rangle$.
A particularly good example of this is for time evolution circuits. Here, we start by compiling only the first Trotter step. Once we have a solution $V^\dagger_1$, we append a Trotter step to it and use it as the target state for compiling 2 Trotter steps worth of evolution. We can "ladder" this all the way to the desired number of Trotter steps. This is shown in Fig.7 of https://arxiv.org/abs/2002.04612.
When applied to time dynamics, compiling in parts is also referred to as restarted quantum dynamics (https://arxiv.org/abs/1910.06284), iterative variational Trotter compression (https://arxiv.org/abs/2404.10044) and compressed quantum circuits (https://arxiv.org/abs/2008.10322). There are inevitably more references using the same idea.
What problem it aims to solve:
There are two key benefits to compiling in parts. Firstly, compiling smaller chunks makes each individual optimisation problem easier and less likely to suffer from trainability issues. If we consider the extreme case of compiling one gate at a time, it is clear that each compilation only needs to learn the application of one more gate on top of the starting state.
Secondly, if running approximate quantum compiling on real quantum hardware, compiling in parts allows one to limit the depth of any circuit executed. For example, if the target circuit has a depth of 20, but a device is limited to depth 10 before noise ruins the computation, one can compile in blocks of depth 5 at a time. If successful, $V^\dagger$ will never be deeper than $U$, meaning that the compiling circuit $V^\dagger U |0\rangle$ will never exceed depth 10.
There are two key drawbacks to compiling in parts. Firstly, we need to solve several sequential compilation problems, which can take longer than compiling the entire circuit at once (when that is possible). Secondly, the approximation error of each individual solution compounds every time it is used as the input for the next compilation, so the error of the final solution grows exponentially with the number of chunks. For example, if we compile 18 sub-circuits one at a time, each with a sufficient overlap of $0.99$, the overlap of the final solution would be $0.99^{18} \approx 0.83$. Thus, one would need to instead use a much higher sufficient overlap of 0.9995 for each sub-circuit to get the desired final overlap of $0.99$.
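A quick worked check of this overlap arithmetic:

```python
n_chunks = 18
per_chunk_overlap = 0.99

# Overlaps multiply, so the final overlap decays with the number of chunks.
print(per_chunk_overlap ** n_chunks)   # ~0.835

# Per-chunk overlap needed for the final overlap to still be 0.99.
print(0.99 ** (1 / n_chunks))          # ~0.99944, i.e. roughly the 0.9995 quoted above
```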