Spaces:

ansysresearch
/

quantum

Sleeping

App Files Files Community

quantum / utils /adapt-aqc /docs /running_options_explained.md

harishaseebat92

Fix: Add adapt-aqc as regular files, not submodule

6d8ed8c about 1 month ago

preview code

raw

history blame contribute delete

15.8 kB

	This document is to give a further in-detail explanation about the different ADAPT-AQC options made
	available through `AdaptConfig` and `AdaptCompiler`.

	# AdaptConfig()

	## method="general_gradient"

	This is the core heuristics of ADAPT-AQC, whereby the next two-qubit unitary is placed on the
	pair of qubits which would give the largest cost gradient $\Vert\vec{\nabla} C\Vert$.
	For more details please see Appendix A of https://arxiv.org/abs/2503.09683.

	## method="ISL"

	What is it: \
	When using this method, the ansatz is adaptively constructed
	by prioritising pairs of qubits which have larger pairwise entanglement. This is the original
	heuristic from https://github.com/abhishekagarwal2301/isl which has been kept as the default here
	as it is the only one which supports all backends.

	When
	`AdaptConfig.reuse_exponent = 0`, which is the default setting, the pair with the largest
	entanglement is always picked, except for picking the same pair twice. For any other
	value of `reuse_exponent`, the entanglement of the pair is weighted against how
	recently a layer has been applied to it.

	If the pairwise entanglement between all qubits in the coupling map is zero, this method falls
	back to the `expectation` method defined below.

	What problem it aims to solve: \
	The goal here is to use the entanglement structure of the compilation target to inform the adaptive
	ansatz. This is motivated by the fact that the compiling succeeds by finding a set of gates that
	"undoes" the target circuit back to the $\|00..0\rangle$ state. Since the $\|00..0\rangle$ state has
	no pairwise entanglement, it makes sense that we want to iteratively reduce this.

	## method="expectation"

	What is it: \
	This is another mode of operation for compiling. When using this method, the ansatz is adaptively
	constructed by prioritising pairs which have smallest summed $\hat{\sigma}_z$ expectation values (
	i.e., the closest to the minimum value of -2)

	For the default value of `AdaptConfig.expectation_reuse_exponent = 1` the pair with the smallest
	expectation value is also weighted against how recently a layer has been applied to it.

	What problem it aims to solve: \
	In this case, we aim to use the qubit magnetisation of the target to inform the adaptive ansatz.
	This has a similar motivation to above, in that each qubit in the $\|00..0\rangle$ state has
	an expectation value of $\langle0\|\hat{\sigma}_z\|0\rangle = 1$

	## bad_qubit_pair_memory

	What is it: \
	For the ADAPT-AQC method, if acting on a qubit pair leads to entanglement increasing, it is labelled
	a
	"bad pair". After this, for a number of layers corresponding to the bad_qubit_pair_memory,
	this pair will not be selected.

	What problem it aims to solve: \
	Although pairwise entanglement is used to select a two-qubit pair to act on, the variational
	ansatz is optimised with respect to the overlap with the $\|00..0\rangle$ state. The algorithm thus
	does
	not directly minimise entanglement. This means that in certain situations, the optimal angles of an
	ansatz layer could actually increase the entanglement. This would lead to a high priority for acting
	on that pair of qubits in the near-future, despite the fact that it was recently optimised. This
	can lead ADAPT-AQC to get stuck, particularly if there are several unconnected bad qubit pairs. The
	use
	of `bad_qubit_pair_memory` is to make sure that by the time the pair is acted on again,
	the state of connected qubits has sufficiently changed so that optimising the bad pairs will lead
	to new optimal angles.

	## reuse_exponent

	What is it: \
	For the ADAPT-AQC, expectation or general_gradient methods, this controls how much priority should
	be given to picking qubits not recently
	acted on. Specifically, given a qubit pair has been last acted on $l$ layers ago, it is given a
	reuse priority $P_r$ of

	$$P_r = 1-2^{\frac{-l}{k}},$$

	where $k$ is the value of `reuse_exponent`. This is then multiplied with the
	entanglement measure or gradient (for ADAPT-AQC and general_gradient respectively) to produce the
	combined priority $P_c$ = $E*P_r$. For expectation mode, the combined priority is calculated
	differently. Given a pair of qubits, the combined priority is calculated as

	$$P_c = (2 - \langle Z_1 \rangle + \langle Z_2 \rangle) *P_r$$,

	where $\langle Z_1 \rangle$ ($\langle Z_2 \rangle$) is the $\\hat{\sigma}_z$ expectation value of
	qubit
	1 (2).

	The qubit pair with the highest combined priority is then picked for the next layer.

	This means that for larger $k$, more weighting is given to how recently the pair was used.
	Conversely, if $k=0$ then no
	weighting is given.

	What problem it aims to solve: \
	The goal of approximate quantum compiling (AQC) is to produce a circuit that approximately prepares
	a target state with less depth than the original circuit. The aim of this heuristic is to make
	ADAPT-AQC depth-aware, so that e.g., the same pairs of qubits are not repeatedly picked if they are
	only
	marginally higher entanglement than other pairs that haven't been used. Ultimately, compiling
	with a larger exponent produces shallower solutions, at the cost of longer compiling times.

	## reuse_priority_mode

	What is it: \
	The reuse priority system is used to de-prioritise qubits that were recently acted on. When
	`reuse_priority_mode="pair"`, the priority of a pair of qubits (a, b) is calculated as

	$$P_r = 1-2^{\frac{-l}{k}},$$

	where $l$ is the number of layers since that pair had a layer applied to it.

	When `reuse_priority_mode="qubit"`, the priority is instead calculated as

	$$P_r = \mathrm{min}\[1-2^{\frac{-(l_a + 1)}{k}}, 1-2^{\frac{-(l_b + 1)}{k}}\],$$

	where $l_a$ or $l_b$ is the number of layers since qubit a or b has been acted on respectively. Note
	that in both cases, the priority of the most recently used qubit pair is set to -1 so that it is
	never chosen. Additionally, the priority of a pair that has never been used is manually set to 1, so
	that it receives maximum priority.

	What problem it aims to solve: \
	This heuristic is meant to reflect that, given a pair of qubits (a, b) was recently acted on, the
	depth of the compiled solution will increase if _either_ a or b are acted on in a new layer.
	Previously, the "pair" option was the only type of reuse priority in ADAPT-AQC, leading often to
	solutions where successive layers might act on pairs (a, a+1), (a+1, a+2), (a+2, a+3)... These
	branch-like structures significantly increase the depth of the solution.

	## rotosolve_frequency

	What is it: \
	The main optimisation algorithms used by ADAPT-AQC are the Rotoselect and Rotosolve algorithms, more
	details of which can be found at https://quantum-journal.org/papers/q-2021-01-28-391/. Put simply,
	the Roto algorithms use sequential optimisation. Given a set of $L$ parameterised gates, the
	procedure
	works by fixing $L-1$ of the gates and varying the remaining one to minimise the cost function.
	This is then repeated for the remaining $L-1$ gates, unfixing one at a time and fixing the others.
	As the changing of later gates in the layer will affect the loss landscape of the first gates, we
	repeatedly cycle over all rotation gates until a termination criteria is reached.

	`rotosolve_frequency` defines how often the ansatz is optimised using specifically the Rotosolve
	algorithm, which only changes the angles of parameterised gates. Specifically, Rotosolve is called
	after every `rotosolve_frequency` number of layers have been added. In the context of ADAPT-AQC, it
	is
	notable that _only rotosolve_ has the ability to modify previous layers. Specifically, the last
	`AdaptConfig.max_layers_to_modify` layers will be optimised using Rotosolve. This makes it an
	expensive step but often necessary to reach convergence.

	NOTE Setting the value `rotosolve_frequency=0` will disable rotosolve. This can lead to a large
	performance improvement when using the matrix product state (MPS) backends, since the guarantee
	that previous layers won't be modified allows us to cache the state of the system during evolution.

	What problem it aims to solve: \
	The use of Rotosolve reflects the idea that after adding layers, the optimal parameters of
	previous layers may have changed. Thus it may be more efficient (in terms of final circuit depth)
	to attempt to re-optimise previous layers than to only add new layers. As such, when not using
	Rotosolve, generally the solution will be deeper.

	# AdaptCompiler()

	## coupling_map

	What is it: \
	A user specified list of tuples, each of which represents a connection between qubits. ADAPT-AQC
	will be
	restricted to only adding CNOT gates between pairs which are in this coupling map.

	What problem it aims to solve: \
	Often we want to run the ADAPT-AQC solution on a specific connectivity hardware (e.g., heavy hex).
	Without
	this option, it would be possible to convert any solution to a connectivity via SWAP gates, however
	this can be extremely expensive in terms of number of gates. This option exists to allow ADAPT-AQC
	to
	restrict the solution space during compiling.

	## custom_layer_2q_gate

	What is it: \
	A Qiskit `QuantumCircuit` to be used as the ansatz layers.

	What problem it aims to solve: \
	ADAPT-AQC uses by default a thinly-dressed CNOT gate (i.e., CNOT surrounded by 4 single qubit
	rotations).
	This ansatz is not universal and has not been shown to be objectively better than others, but is a
	heuristic that originally worked. As such it may be valuable to change what ansatz layer is used.

	## starting_circuit

	What is it: \
	This is a `QuantumCircuit` that will be used as a set of initial fixed gates for the compiled
	solution $\hat{V}$. Because during ADAPT-AQC we are variationally optimising $\hat{V}^\dagger$, the
	inverse of `starting_circuit` will be placed at the end of the ansatz.

	What problem it aims to solve: \
	`starting_circuit` is a useful heuristic when we have some knowledge of the structure of the
	solution. A good example of what this aims to solve is when a compilation target includes
	a distinct state preparation step. For example, consider compiling the evolution of a spin-chain
	starting in the Neel state. The compiling problem is much more efficient if ADAPT-AQC does not need
	to
	learn to start by applying an X gate to every other qubit.

	## local_cost_function

	What is it:\
	Normally, ADAPT-AQC uses a a cost $C_\mathrm{LET} = 1- \|\langle 0 \| V^\dagger U\|0\rangle\|^2$. The
	fidelity
	term,
	as defined in section IIIB of https://arxiv.org/abs/1908.04416, is generated using the Loschmidt
	Echo Test (LET), which is the formal name for acting $U\|0\rangle$ followed by $V^\dagger$ to get
	the fidelity. We note that the cost is global with respect to the Hilbert space, since the overlap
	with the $\|00...0\rangle$ state spans the full state vector.

	By contrast, when setting `local_cost_function=True`, ADAPT-AQC will use a cost derived from the
	Local
	Loschmidt Echo Test (LLET). Specifically, the cost is defined as

	$$C_\mathrm{LLET} = 1 - \frac{1}{n} \sum_{j=1}^{n} \langle 0\|\rho^j\|0\rangle,$$

	where the second term can be
	recognised as the sum of the probabilities that each qubit is in the $\|0\rangle$ state. Since the
	cost function does not span the entire Hilbert space, it is described as local.

	What problem it aims to solve: \
	The distinction between global and local cost functions is very important in the context of
	trainability and barren plateaus (see https://www.nature.com/articles/s41467-021-21728-w), where
	a global cost function is difficult to train for large numbers of qubits. By contrast the local
	cost function is trainable. However, we note that $C_\mathrm{LLET} <= C_\mathrm{LET}$, meaning
	that ADAPT-AQC may not have achieved the desired global fidelity just because the local cost
	converges.

	## initial_single_qubit_layer

	What is it: \
	When `initial_single_qubit_layer=True`, the first layer of the ADAPT-AQC ansatz will be
	a trainable single-qubit rotation on each qubit. Since this layer will be optimised by Rotoselect,
	this means that both the angles and the bases of rotations can be modified. Note that since this
	is the first layer of the ADAPT-AQC ansatz $V^\dagger$, it will end up being the final layer of the
	returned solution $V$. So we can think of this feature as adding a trainable basis change before
	measuring the cost function.

	What problem it aims to solve: \
	ADAPT-AQC only applies layers in two-qubit blocks, which means that in certain situations ADAPT-AQC
	won't be
	able to find the optimal depth solution. A good example of this is when only a subset of the
	qubits are entangled. To demonstrate why, for the extreme case of compiling the $n$ qubit
	$\|++..+\rangle$ state, ADAPT-AQC without this feature will need to apply $n$ CNOT gates. By
	contrast,
	with `initial_single_qubit_layer=True`, a solution can be found in depth 1 with no CNOT gates.
	It is possible the same issue can arise for any target state, if during compiling ADAPT-AQC is left
	with an intermediate low-entangled state.

	## AdaptCompiler.compile_in_parts()

	What is it: \
	Compiling in parts, (also called ladder-ADAPT-AQC), is the idea of splitting up a
	circuit into chunks that we compile sequentially. For example, given a depth 50 circuit $U_
	{50}\|0\rangle$, we can compile the first 10 depth of gates $U_{0-10}\|0\rangle$ to produce
	$V_{0-10}^\dagger\|0\rangle \approx U_{0-10}\|0\rangle$. This can then be used to construct a new
	target $U_{11-20}V_{0-10}^\dagger\|0\rangle$.

	A particularly good example of this is for time evolution circuits. Here, we start by compiling only
	the first Trotter step. Once we have a solution $V^\dagger_1$, we append a Trotter step to it and
	use it as the target state for compiling 2 Trotter steps worth of evolution. We can "ladder" this
	all the way to the desired number of Trotter steps. This is shown in Fig.7
	of https://arxiv.org/abs/2002.04612.

	When applied to time dynamics, compiling in parts is also referred to as restarted quantum dynamics
	(https://arxiv.org/abs/1910.06284), iterative variational Trotter
	compression (https://arxiv.org/abs/2404.10044) and compressed quantum
	circuits (https://arxiv.org/abs/2008.10322). There are inevitably more references using the same
	idea.

	What problem it aims to solve: \
	There are two key benefits to compiling in parts. Firstly, compiling smaller chunks makes each
	individual optimisation problem easier and less likely to suffer from trainability issues. If
	we consider the extreme case of compiling one gate at a time, it is clear that compiling only
	needs to learn the application of one more gate on the starting state.

	Secondly, if running approximate quantum compiling on real quantum hardware, compiling in parts
	allows one to limit the depth of any circuit executed. For example, if the target circuit has a
	depth of 20, but a device is limited to depth 10 before noise ruins the computation, one can compile
	in blocks of 5 depth at a time. If successful, $V^\dagger$ will never be deeper than $U$, meaning
	that the compiling circuit $V^\dagger U \|0\rangle$ will never be more than 10 depth.

	There are two key drawbacks to compiling in parts. Firstly, we need to solve several sequential
	compilation problems, which can take longer than compiling the entire circuit at once (if possible).
	Secondly, the approximation error of each individual solution will multiply every time it is used
	as the input for the next compiling. Thus the approximation error of the final solution grows
	exponentially. For example, if we compile 18 sub-circuits one at a time, with a sufficient
	overlap for each one of $0.99$, the overlap of the final solution would be $0.99^{18} = 0.83$. Thus,
	one
	would need to instead use a much higher sufficient overlap of 0.9995 for each sub-ciruit to get
	the desired final overlap of $0.99$.