Scalable quantum neural networks by few quantum resources

Davide Pastorello⋄,† and Enrico Blanzieri⋆,†

⋄ Department of Mathematics, Alma Mater Studiorum - University of Bologna

piazza di Porta San Donato 5, 40126 Bologna, Italy

davide.pastorello3@unibo.it

⋆ Department of Information Engineering and Computer Science, University of Trento,

via Sommarive 9, 38123 Povo (Trento), Italy

enrico.blanzieri@unitn.it

† Trento Institute for Fundamental Physics and Applications,

via Sommarive 14, 38123 Povo (Trento), Italy

Abstract

This paper focuses on the construction of a general parametric model that can be implemented by executing multiple swap tests over few qubits and applying a suitable measurement protocol. The model turns out to be equivalent to a two-layer feedforward neural network which can be realized by combining small quantum modules. The advantages and the perspectives of the proposed quantum method are discussed.

Keywords: Neural networks; quantum machine learning; scalable quantum computing.

1 Introduction

The application of quantum computation to machine learning tasks offers some interesting solutions characterized by a quantum advantage with respect to the classical counterparts in terms of time and space complexity, expressive power, and generalization capability, at least on a theoretical level [1, 2, 3]. Moreover, quantum machine learning (QML) seems to be a promising path to explore in depth the possibilities offered by quantum machines and to exploit existing and near-term quantum computers for tackling real-world problems. In this respect, the research activity aimed at the development of novel QML algorithms must face the lack of large-scale, fault-tolerant, universal quantum computers. In order to promote QML as an operative technology, we should provide solutions that are not too demanding in terms of quantum resources and, at the same time, supply quantum advantages w.r.t. classical information processing. The present paper focuses on a solution that represents a trade-off between the definition of a quantum model, specifically a quantum neural network, and the realistic assumption that only few quantum resources are available.

In this work, we assume the availability of a very specific quantum hardware that is able to perform the swap test (a simple but useful technique to evaluate the similarity between two unknown pure states [4], briefly illustrated in the next section) over two $k$-qubit registers ($k\geq 1$). This assumption is motivated by several proposals for a physical implementation of the swap test, so it turns out to be rather realistic for few qubits [5, 6]. We then assume that many copies of this quantum hardware can be combined, as modules, in order to obtain a larger structure for data processing. In particular, we argue that a combined execution of many independent swap tests, performing only local measurements, allows us to create a two-layer feedforward neural network presenting some quantum advantages in terms of space complexity, with the possibility of inputting quantum data. The main result is a general construction of a meaningful quantum model based on very few quantum resources, whose scalability is enabled by requiring quantum coherence only over a small number of qubits, since a major role is played by measurement processes.

The swap test is a well-known tool for developing QML techniques, and there are several proposals of quantum neural networks based on it. For instance, there are quantum neurons where the swap test is coupled to the quantum phase estimation algorithm for realizing quantum neural networks with quantum input and quantum output [7, 8]. Our approach is different: we assume only the capability of performing the swap test over few qubits for constructing neural networks where the input is a quantum state and the output is always classical. Our goal is the definition of a quantum model which can be realized assuming the availability of few quantum resources.

The paper is structured as follows: in Section 2 we recall some basic notions about two-layer feedforward neural networks and the definition of the swap test as the building block of the proposed modular architecture. In Section 3, we define the parametric model obtained by combining small modules acting on few qubits, showing its equivalence to a neural network once a suitable measurement protocol is performed. In Section 4, we discuss the impact of the obtained results. In the final section, we draw some concluding remarks highlighting the perspectives of the proposed modular approach.

2 Background

In this section we introduce some basic notions that are crucial for the definition of the model proposed and discussed in the paper. In particular, we define the structure of the neural network we consider in the main part of the work and recall the necessary quantum computing fundamentals.

Definition 1

Let $X$ and $Y$ be non-empty sets, respectively called input domain and output domain. A (deterministic) predictive model is a function

$$y=f(x;\theta)\qquad x\in X,\; y\in Y,$$

where $\theta=(\theta_1,\dots,\theta_d)$ is a set of real parameters.

In supervised learning, given a training set $\{(x_1,y_1),\dots,(x_m,y_m)\}$ of input-output pairs, the general task is finding the parameters $\theta$ such that the model assigns the right output to any unlabelled input. The parameters are typically determined by optimizing a loss function such as $\mathcal{L}(\theta)=\sum_{i=1}^{m}d(y_i,f(x_i;\theta))$, where $d$ is a metric defined over $Y$. A remarkable example of a predictive model, where $X=\mathbb{R}^n$ and $Y=\mathbb{R}$, is the following two-layer neural network with $h$ hidden nodes:

$$f(\mathbf{x};\mathbf{p},W)=\sum_{i=1}^{h}p_i\,\varphi(\mathbf{x}\cdot\mathbf{w}_i),\qquad(1)$$

where $\mathbf{w}_i\in\mathbb{R}^n$ is the $i$th row of the $h\times n$ weight matrix $W$ of the first layer, $p_i$ is the $i$th coefficient of the second layer, and $\varphi:\mathbb{R}\rightarrow\mathbb{R}$ is a continuous non-linear function. Despite its simplicity, this model is relevant because it is a universal approximator whenever $\varphi$ is non-polynomial, in the sense that it can approximate, up to finite precision, any Borel function $K\rightarrow\mathbb{R}$, with $K\subset\mathbb{R}^n$ compact, provided a sufficient number of hidden nodes is available [9, 10, 11].

Let us focus on a quadratic activation function for a two-layer network. It is not widely used in practice, but it has been studied extensively [12, 13, 14, 15], proving, in particular, that networks with quadratic activation are as expressive as networks with threshold activation and can be learned in polynomial time [12]. A network with this activation is not a universal approximator; however, the quadratic activation presents the same convexity as the rectified linear unit (ReLU) with a wider activating zone, so the loss curves of networks using these two activations converge very similarly [15]. In addition to the choice of a quadratic activation, let us consider the cosine similarity instead of the dot product as pre-activation:

$$\cos(\mathbf{x},\mathbf{w}):=\frac{\mathbf{x}\cdot\mathbf{w}}{\|\mathbf{x}\|\,\|\mathbf{w}\|}.\qquad(2)$$

The cosine pre-activation is a normalization procedure adopted to decrease the variance of the neuron, which can be large since the dot product is unbounded. A large variance can make the model sensitive to changes of the input distribution, resulting in poor generalization. Experiments show that cosine pre-activation achieves better performance, in terms of test error in classification, compared to batch, weight, and layer normalization [16]. The aim of the paper is to show that a simple quantum process, combined in multiple copies, can be used to implement a varying-size two-layer network with cosine pre-activation and quadratic activation.
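For concreteness, the following minimal classical sketch (ours, in Python/NumPy, not from the paper) implements such a network, combining the cosine pre-activation (2) with the quadratic activation $\varphi(z)=\frac{1}{2}(1-z^2)$ that will emerge from the swap test later on:

```python
import numpy as np

def two_layer_cosine_quadratic(x, W, p):
    """Two-layer network of Eq. (1) with cosine pre-activation, Eq. (2),
    and quadratic activation phi(z) = (1 - z**2) / 2.
    W has shape (h, n), one weight vector w_i per hidden node; p has shape (h,)."""
    cos = (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x))
    return p @ ((1.0 - cos**2) / 2.0)

# Tiny usage example with random data (h = 3 hidden nodes, n = 8 inputs).
rng = np.random.default_rng(0)
x, W, p = rng.normal(size=8), rng.normal(size=(3, 8)), rng.uniform(size=3)
print(two_layer_cosine_quadratic(x, W, p))
```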

The main assumption of this work is that few quantum resources are available, in the sense that we consider quantum hardware characterized by a small number of qubits on which we can act with few quantum gates, far from the availability of a large-scale, fault-tolerant, universal quantum computer. The central object is a well-known tool of QML, the swap test, which we assume is implemented in a special-purpose quantum hardware, characterized by an arbitrarily small number of qubits, that we call a module. The questions we address are: What can we usefully do if we are able to build such a module with a small number of qubits? Can we construct a neural network with interesting properties by stacking such modules?

As a postulate of quantum mechanics, any quantum system is described in a Hilbert space. We consider only the finite-dimensional case, where a Hilbert space $\mathsf{H}$ is nothing but a complex vector space equipped with an inner product $\langle\,\cdot\,|\,\cdot\,\rangle:\mathsf{H}\times\mathsf{H}\rightarrow\mathbb{C}$. A quantum system is described in the Hilbert space $\mathsf{H}$ in the sense that its physical states are in bijective correspondence with the projective rays in $\mathsf{H}$. Then a (pure) quantum state can be identified with a normalized vector $|\psi\rangle\in\mathsf{H}$ up to a multiplicative phase factor. In other words, quantum states are described by one-dimensional orthogonal projectors of the form $\rho_\psi=|\psi\rangle\langle\psi|$, where $\langle\psi|$ is an element of the dual of $\mathsf{H}$.

Definition 2

Let $\mathsf{H}$ be a (finite-dimensional) Hilbert space and $I$ a finite set. A measurement process is a collection of linear operators $\mathcal{M}=\{E_\alpha\}_{\alpha\in I}$ such that $E_\alpha\geq 0$ for each $\alpha\in I$ and $\sum_\alpha E_\alpha=\mathbb{I}$, where $\mathbb{I}$ is the identity operator on $\mathsf{H}$.

Given an orthonormal basis $\{|i\rangle\}_{i=1,\dots,n}$ of $\mathsf{H}$, the measurement in the basis $\{|i\rangle\}_{i=1,\dots,n}$ is the measurement process defined by $\mathcal{M}=\{|i\rangle\langle i|\}_{i=1,\dots,n}$.

The probability of obtaining the outcome $\alpha\in I$ by the measurement $\mathcal{M}=\{E_\alpha\}_{\alpha\in I}$ on a system in the state $|\psi\rangle$ is $\mathbb{P}_\psi(\alpha)=\langle\psi|E_\alpha|\psi\rangle$. If $\mathcal{M}$ is a measurement in the basis $\{|i\rangle\}_{i=1,\dots,n}$ then the probability reduces to $\mathbb{P}_\psi(i)=|\langle\psi|i\rangle|^2$. A qubit is any quantum system described in the Hilbert space $\mathbb{C}^2$; denoting the standard basis of $\mathbb{C}^2$ by $\{|0\rangle,|1\rangle\}$, we have the computational measurement on a single qubit $\mathcal{M}=\{|0\rangle\langle 0|,|1\rangle\langle 1|\}$.

Another postulate of quantum mechanics states that, given two quantum systems $S_1$ and $S_2$ described in the Hilbert spaces $\mathsf{H}_1$ and $\mathsf{H}_2$ respectively, the Hilbert space of the composite system $S_1+S_2$ is given by the tensor product $\mathsf{H}_1\otimes\mathsf{H}_2$. Therefore there are states of the product form $|\psi_1\rangle\otimes|\psi_2\rangle\in\mathsf{H}_1\otimes\mathsf{H}_2$ (in contracted notation: $|\psi_1\psi_2\rangle$), called separable states, and states that cannot be factorized, called entangled states. The latter describe a kind of non-local correlation without a counterpart in classical physics, and they are the key point of the most relevant advantages in quantum information processing.

Encoding classical data into quantum states is a crucial issue in quantum computing and specifically in quantum machine learning. One of the most popular quantum encodings is the so-called amplitude encoding. Assume we have a data instance represented by a complex vector $\mathbf{x}\in\mathbb{C}^n$; then it can be encoded into the amplitudes of a quantum state:

$$|\mathbf{x}\rangle=\frac{1}{\|\mathbf{x}\|}\sum_{i=1}^{n}x_i|i\rangle,\qquad(3)$$

where $\{|i\rangle\}_{i=1,\dots,n}$ is a fixed orthonormal basis, called the computational basis, of the Hilbert space of the considered quantum physical system. If the quantum system is made up of $k$ qubits, say a $k$-qubit register, with Hilbert space $\mathsf{H}=(\mathbb{C}^2)^{\otimes k}$, then storing $\mathbf{x}\in\mathbb{C}^n$ requires $k=\lceil\log n\rceil$ qubits, and the commonly used computational basis is $\{|d\rangle : d\in\{0,1\}^k\}$. Therefore, amplitude encoding exploits the exponential storing capacity of a quantum memory, but it does not allow the direct retrieval of the stored data. Indeed, the amplitudes cannot be observed, and only the probabilities $|x_i|^2/\|\mathbf{x}\|^2$ can be estimated by measuring the state $|\mathbf{x}\rangle$ in the computational basis. An immediate consequence of the definition is that the inner product of two encoding states corresponds to the cosine similarity of the encoded vectors:

$$\langle\mathbf{x}|\mathbf{y}\rangle=\cos(\mathbf{x},\mathbf{y})\qquad\forall\,\mathbf{x},\mathbf{y}\in\mathbb{C}^n.\qquad(4)$$
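As a quick sanity check of (3)-(4), here is a short NumPy sketch (ours) that amplitude-encodes two real vectors, padding them to a power-of-two dimension, and verifies that the inner product of the encoded states equals the cosine similarity of the original vectors:

```python
import numpy as np

def amplitude_encode(x):
    """Amplitude encoding of Eq. (3): pad x to the next power-of-two
    dimension and normalize, so its entries become the amplitudes
    of a k-qubit state with k = ceil(log2(len(x)))."""
    k = int(np.ceil(np.log2(len(x))))
    padded = np.zeros(2**k)
    padded[:len(x)] = x
    return padded / np.linalg.norm(padded)

rng = np.random.default_rng(1)
x, y = rng.normal(size=6), rng.normal(size=6)
lhs = np.dot(amplitude_encode(x), amplitude_encode(y))        # <x|y>
rhs = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))  # cos(x, y)
assert np.isclose(lhs, rhs)                                   # Eq. (4)
```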

Within the quantum circuit model, quantum states are processed by quantum gates. Let us consider the Hadamard gate, a unitary operator on $\mathbb{C}^2$ defined in terms of the computational basis by:

$$H|a\rangle=\frac{|0\rangle+(-1)^a|1\rangle}{\sqrt{2}}\qquad a\in\{0,1\}.\qquad(5)$$

The controlled NOT (CNOT) gate is a 2-qubit gate, i.e. a unitary operator on $\mathbb{C}^2\otimes\mathbb{C}^2$, defined by:

$$U_{CNOT}|ab\rangle:=|a\,(a\oplus b)\rangle\qquad a,b\in\{0,1\},\qquad(6)$$

where $\oplus$ is the sum modulo 2. [Circuital symbol of the CNOT gate not reproduced here.]

The swap gate is a 2-qubit gate which exchanges the input qubits. [Circuital definition of the swap gate in terms of CNOT gates not reproduced here.]

The controlled swap gate is called the Fredkin gate; its action on the computational basis of three qubits, where the first qubit is the control, can be expressed in the following form:

$$\mathsf{F}|abc\rangle=(1-a)|abc\rangle+a|acb\rangle\qquad a,b,c\in\{0,1\}.\qquad(7)$$
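The permutation action (7) fully determines the Fredkin gate as an $8\times 8$ matrix; the following NumPy check (ours) builds it and confirms that it coincides with the swap gate controlled by the first qubit:

```python
import numpy as np

# Fredkin gate from Eq. (7): |abc> -> |abc> if a = 0, and |acb> if a = 1.
F = np.zeros((8, 8))
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            src = 4*a + 2*b + c                       # index of |abc>
            dst = 4*a + 2*(c if a else b) + (b if a else c)
            F[dst, src] = 1.0

SWAP = np.eye(4)[[0, 2, 1, 3]]                        # 2-qubit swap gate
CSWAP = np.block([[np.eye(4), np.zeros((4, 4))],      # identity if control = 0
                  [np.zeros((4, 4)), SWAP]])          # swap if control = 1
assert np.allclose(F, CSWAP)
```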

The swap test is a very simple procedure to estimate the probability $|\langle\psi|\varphi\rangle|^2$, given two unknown pure states $|\psi\rangle$ and $|\varphi\rangle$ [4]. The procedure requires two applications of the Hadamard gate and a controlled swap operation between two multi-qubit registers, generalizing the Fredkin gate. It is straightforward to show that the controlled swap of two $k$-qubit registers can be implemented with $k$ Fredkin gates controlled by the same qubit. Consider two $k$-qubit registers initialized in $|\psi\rangle$ and $|\varphi\rangle$ and a control qubit prepared in $|0\rangle$. Assume to act with the following circuit on the $2k+1$ qubits:

[Circuit (14): a Hadamard gate on the control qubit, the controlled swap of the two $k$-qubit registers, a second Hadamard gate on the control, and a computational measurement of the control qubit.]

After a simple and well-known calculation, the probability of obtaining the outcome 0, measuring the control qubit in the computational basis, is:

$$\mathbb{P}(0)=\frac{1}{2}\left(1-|\langle\psi|\varphi\rangle|^2\right).\qquad(15)$$

Since the probability is estimated as a relative frequency in a Bernoulli trial, the number of runs of the circuit (14) required to estimate $\mathbb{P}(0)$ within an error $\epsilon$ is $\mathcal{O}(\epsilon^{-2})$.
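This sampling cost is easy to visualize classically; in the following Monte Carlo sketch (ours), the control-qubit outcome is drawn as a Bernoulli variable with the probability (15), and the estimation error shrinks as $\mathcal{O}(1/\sqrt{\text{runs}})$:

```python
import numpy as np

def swap_test_estimate(psi, phi, runs, rng):
    """Simulate `runs` repetitions of the swap test on states psi and phi,
    returning the relative frequency of outcome 0 (an estimate of Eq. (15))."""
    p0 = 0.5 * (1.0 - abs(np.vdot(psi, phi))**2)   # exact P(0), Eq. (15)
    return rng.binomial(runs, p0) / runs

rng = np.random.default_rng(2)
psi = np.array([1.0, 1.0]) / np.sqrt(2)            # |+>
phi = np.array([1.0, 0.0])                         # |0>, so P(0) = 0.25
for runs in (100, 10_000, 1_000_000):
    print(runs, abs(swap_test_estimate(psi, phi, runs, rng) - 0.25))
```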

From the viewpoint of practical realization, several methods have been proposed to implement a linear-optical swap test [17, 18, 19], and techniques for implementing a swap test based on nonlinear optics have been proposed as well [20, 21]. In this work we realistically assume the availability of a swap test over few qubits, depicted by the following process diagram, which we introduce for notational convenience:

[Process diagram (16): a module taking the $k$-qubit states $|\psi\rangle$ and $|\varphi\rangle$ as inputs and returning the estimate of $\mathbb{P}(0)$ from the measurement of its control qubit.]

where $|\psi\rangle$ and $|\varphi\rangle$ are $k$-qubit states and $\mathbb{P}(0)$ is given by (15). In the following, we argue that independent swap tests between $k$-qubit registers, together with a suitable measurement protocol over the control qubits, can be used for implementing a two-layer feedforward neural network.

An alternative way to obtain the same estimate of the squared inner product is worth considering. Given the quantum states $|\mathbf{x}\rangle,|\mathbf{w}\rangle\in(\mathbb{C}^2)^{\otimes k}$ that encode the input and weight vectors $\mathbf{x},\mathbf{w}$ into the amplitudes w.r.t. the computational basis, the swap test returns the probability $\frac{1}{2}(1-|\langle\mathbf{x}|\mathbf{w}\rangle|^2)=\frac{1}{2}(1-\cos^2(\mathbf{x},\mathbf{w}))$, by (15), which is the output of a perceptron with quadratic activation and cosine pre-activation. Alternatively, an analogous parametric model can be defined considering other encodings, in general given by the action of parametric circuits $U_\mathbf{x}$ and $V_\mathbf{w}$ such that $|\mathbf{x}\rangle=U_\mathbf{x}|0\rangle$ and $|\mathbf{w}\rangle=V_\mathbf{w}|0\rangle$. For the amplitude encoding, the number of elementary gates required for the implementation of $U_\mathbf{x}$ and $V_\mathbf{w}$ is $\mathcal{O}(2^k)$. The output $|\langle\mathbf{x}|\mathbf{w}\rangle|^2$ corresponds to the probability of obtaining the bit string $0\in\{0,1\}^k$ by the computational measurement over the $k$-qubit state $|\Psi_{\mathbf{x},\mathbf{w}}\rangle=V_\mathbf{w}^\dagger U_\mathbf{x}|0\rangle$.
In fact: $\mathbb{P}_{\Psi_{\mathbf{x},\mathbf{w}}}(0)=|\langle 0|\Psi_{\mathbf{x},\mathbf{w}}\rangle|^2=|\langle 0|V_\mathbf{w}^\dagger U_\mathbf{x}|0\rangle|^2=|\langle\mathbf{w}|\mathbf{x}\rangle|^2$. Therefore, the parametric model can be simply implemented by the circuit $U=V_\mathbf{w}^\dagger U_\mathbf{x}$ over $k$ qubits initialized in $|0\rangle$, instead of the $2k+1$ qubits required by the swap test. However, the overall number of measurements required by the circuit $U=V_\mathbf{w}^\dagger U_\mathbf{x}$ is $\mathcal{O}(k)$. The requirement of measuring only a single qubit in the swap test is particularly convenient in the case of multiple perceptrons (each implemented via a swap test) whose outputs are combined probabilistically, as discussed in the next section. Since each module requires only a single-qubit measurement, the overall computation is simplified. This enables the construction of a weighted sum of the outputs without needing simultaneous measurements on all qubits of the system, making the process more efficient and scalable, especially in photonic architectures where both amplitude encoding and the swap test can be efficiently implemented [17, 18].
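The compute-uncompute alternative can be checked numerically as well. In the sketch below (ours), the encoding unitary is any unitary whose first column is the encoded state, built here by completing the state to an orthonormal basis via a QR decomposition (an arbitrary choice for illustration):

```python
import numpy as np

def encoding_unitary(state, rng):
    """Return a unitary U with U|0> = |state> (first column = state),
    obtained by completing `state` to an orthonormal basis via QR."""
    n = len(state)
    M = rng.normal(size=(n, n)).astype(complex)
    M[:, 0] = state
    Q, R = np.linalg.qr(M)
    return Q * R[0, 0]        # |R[0, 0]| = 1 because `state` is normalized

rng = np.random.default_rng(3)
x = rng.normal(size=4); x /= np.linalg.norm(x)     # k = 2 qubits, 4 amplitudes
w = rng.normal(size=4); w /= np.linalg.norm(w)
Ux, Vw = encoding_unitary(x, rng), encoding_unitary(w, rng)

e0 = np.eye(4)[0]                                  # |0> on the 2-qubit register
Psi = Vw.conj().T @ Ux @ e0                        # V_w^dagger U_x |0>
assert np.isclose(abs(Psi[0])**2,                  # P(all-zeros outcome)
                  abs(np.dot(w, x))**2)            # = |<w|x>|^2
```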

3 The proposed model

Let us consider $m$ copies of the process diagram (16), which we call modules; each module presents two $k$-qubit registers and one auxiliary qubit to perform the swap test. Then the total number of qubits, including the controls, is $m(2k+1)$. Let $\mathbf{x}\in\mathbb{R}^N$ be an input vector and $\mathbf{w}\in\mathbb{R}^N$ a vector of parameters. If $k\geq\log N$ then a single module is sufficient to store $\mathbf{x}$ and $\mathbf{w}$ in its registers, and the action of the swap test, which returns the probability $\frac{1}{2}(1-\cos^2(\mathbf{x},\mathbf{w}))$ as the value of a non-linear function of $\cos(\mathbf{x},\mathbf{w})$, realizes an elementary perceptron with quadratic activation and cosine pre-activation. However, we are not interested in implementing a single perceptron; in line with the general assumption of few available quantum resources, we consider the case $k<\log N$, so each module can encode the input and weight vectors only partially.

Proposition 3

Let $\mathcal{R}$ be a $k$-qubit register and let $N>2^k$. The number $m$ of non-interacting copies of $\mathcal{R}$ (non-interacting meaning that any operation on the registers, including those required for state preparation, must be performed locally on each register, so that no entanglement can be produced among different $k$-qubit registers) needed for storing $\mathbf{x}\in\mathbb{C}^N$ into a state $\bigotimes_{i=1}^{m}|\psi_i\rangle$ by amplitude encoding is:

$$m=\mathcal{O}(N2^{-k}).\qquad(17)$$

Proof.

In each register, we can encode at most $2^k$ entries of $\mathbf{x}$ into the amplitudes of a $k$-qubit state. Therefore the minimum number $m$ of registers for storing $\mathbf{x}$ into a pure state, where the registers are not entangled with each other, satisfies $(m-1)2^k<N\leq m2^k$.

Considering $m$ independent modules, we can encode at most $m2^k$ input entries and $m2^k$ weights into the available qubits (divided into the input register and the weight register of each module). Therefore, by Proposition 3, the total number of qubits required to encode and process $\mathbf{x}\in\mathbb{R}^N$ is:

$$n=\mathcal{O}\left(N(2k+1)2^{-k}\right).\qquad(18)$$

Since a single module does not have an input register large enough to encode all the entries of the input $\mathbf{x}$ (i.e. $k<\log N$), we have to settle for a piecewise amplitude encoding that parallelizes the modules; as a consequence, the total number of qubits scales linearly in $N$. However, there is an exponential decay in the size of a single module, which can be relevant in specific cases. For instance, if $N=128$ and each module has a 5-qubit input register and a 5-qubit weight register, then 4 modules and 44 qubits are required to store an input vector and a weight vector. Note that in this context 44 qubits is not considered a large number, because coherence must be maintained only inside a single module (that is, 11 qubits: two 5-qubit registers and 1 control qubit). Thus, the parallelization of modules with few qubits cannot of course reach an exponential storing capacity, but there are cases with realistic values of $k$ and $N$ that can be addressed efficiently by modularity.
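The counting behind (17)-(18) is elementary to reproduce; a tiny helper (ours) for the $N=128$, $k=5$ example:

```python
from math import ceil

def resources(N, k):
    """Modules and total qubits for piecewise amplitude encoding:
    m = ceil(N / 2**k) modules, each using 2k + 1 qubits, Eqs. (17)-(18)."""
    m = ceil(N / 2**k)
    return m, m * (2*k + 1)

print(resources(128, 5))   # -> (4, 44): 4 modules, 44 qubits in total
```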

In order to introduce the parametric model that we can implement by executing swap tests in parallel, let us partition the input vector $\mathbf{x}\in\mathbb{R}^N$ and the weight vector $\mathbf{w}\in\mathbb{R}^N$ into pieces to feed the $m$ modules (each with $2k+1$ qubits), assuming $N=m2^k$ without loss of generality:

$$\mathbf{x}=(\mathbf{x}^{(1)},\dots,\mathbf{x}^{(m)})\quad\text{where}\quad\mathbf{x}^{(l)}=(x_{(l-1)2^k+1},\dots,x_{l2^k}),\qquad l=1,\dots,m\qquad(19)$$

$$\mathbf{w}=(\mathbf{w}^{(1)},\dots,\mathbf{w}^{(m)})\quad\text{where}\quad\mathbf{w}^{(l)}=(w_{(l-1)2^k+1},\dots,w_{l2^k}),\qquad l=1,\dots,m$$

$\mathbf{x}^{(l)}$ can be stored into the $k$-qubit input register of the $l$th module and $\mathbf{w}^{(l)}$ into the $k$-qubit weight register of the $l$th module; the computation of the norms $\|\mathbf{x}^{(l)}\|$ and $\|\mathbf{w}^{(l)}\|$, for any $l=1,\dots,m$, is required for the encoding. The parallelization of the modules to perform $m$ swap tests can be easily visualized as follows:

[Diagram (20): $m$ modules in parallel; the $l$th module performs a swap test between $|\mathbf{x}^{(l)}\rangle$ and $|\mathbf{w}^{(l)}\rangle$, and its control qubit is measured to estimate $\mathbb{P}_l(0)$.]

where $\mathbb{P}_l(0)$ is the probability of measuring 0 on the $l$th control qubit, which can be estimated within an error $\epsilon$ with $\mathcal{O}(\epsilon^{-2})$ runs. The multiple swap tests depicted in (20) can be seen as a layer of perceptrons with connectivity limited by the mutual independence of the modules. On this first elementary structure, we construct the second layer by applying a specific local measurement protocol to the $m$ control qubits.
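The hidden layer realized by (20) is easy to emulate classically. A sketch (ours, assuming $N=m2^k$): partition $\mathbf{x}$ and $\mathbf{w}$ into chunks of length $2^k$ and compute each module's probability $\mathbb{P}_l(0)$ from (15):

```python
import numpy as np

def module_probs(x, w, k):
    """Exact probabilities P_l(0) of the m parallel swap tests in (20), one per
    pair of chunks (x^(l), w^(l)) of length 2**k (assumes N = m * 2**k)."""
    xs, ws = x.reshape(-1, 2**k), w.reshape(-1, 2**k)
    cos = np.sum(xs * ws, axis=1) / (
        np.linalg.norm(xs, axis=1) * np.linalg.norm(ws, axis=1))
    return 0.5 * (1.0 - cos**2)

rng = np.random.default_rng(4)
x, w = rng.normal(size=16), rng.normal(size=16)    # N = 16, k = 2
print(module_probs(x, w, k=2))                     # m = 4 module probabilities
```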

By Definition 2, a measurement process provides a classical probability distribution over the set of possible outcomes. In operational terms, a measurement process is repeated $\mathcal{N}$ times and a string of $\mathcal{N}$ outcomes is stored, from which the probability distribution is estimated in terms of relative frequencies. In the following definition, let us assume that only $\widetilde{\mathcal{N}}<\mathcal{N}$ outcomes are stored after $\mathcal{N}$ repetitions of a considered measurement process $\mathcal{M}$.

Definition 4

Let $\mathcal{M}$ be a measurement process. Assume the measurement is repeated $\mathcal{N}$ times, storing $\widetilde{\mathcal{N}}<\mathcal{N}$ outcomes; then $\mathcal{M}$ is called a lossy measurement with efficiency $p=\widetilde{\mathcal{N}}/\mathcal{N}$.

The measurement protocol that we need for constructing the proposed model consists of local measurements over single qubits with given efficiencies.

Definition 5

Given a system composed of $m$ qubits, a local measurement protocol is a collection $\{(\mathcal{M}_l,p_l)\}_{l=1,\dots,m}$ where $\mathcal{M}_l$ is a lossy measurement process on the $l$th qubit and $p_l$ its efficiency.

If $\mathcal{M}_l=\{|0\rangle\langle 0|,|1\rangle\langle 1|\}$ for any $l=1,\dots,m$, then $\{(\mathcal{M}_l,p_l)\}_{l=1,\dots,m}$ is called a computational measurement protocol.

In the case of the computational measurement protocol over $m$ qubits, the outcomes provided by a single shot of the protocol form a string $b\in\{\emptyset,0,1\}^m$, where the symbol $\emptyset$ denotes a missing outcome. In the case where the measurement processes $\mathcal{M}_l$ are lossless, the efficiency $p_l$ can be controlled by selecting the number $\widetilde{\mathcal{N}}$ of times the measurement $\mathcal{M}_l$ is actually performed when the overall protocol is repeated $\mathcal{N}$ times. From the physical viewpoint, one can also select the efficiency of the measurements by controlling the losses of the employed channels.
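In a classical simulation, a lossy computational measurement with efficiency $p$ can be emulated by discarding each shot independently with probability $1-p$ (our modeling choice: Definition 4 fixes the ratio of stored outcomes exactly, while independent discarding reproduces it on average):

```python
import numpy as np

def lossy_measurement(p0, efficiency, shots, rng):
    """Sample `shots` repetitions of a computational measurement where the
    outcome 0 has probability p0, keeping each outcome with prob. `efficiency`.
    Returns a string over {-1, 0, 1}, with -1 standing for a missing outcome."""
    outcomes = rng.binomial(1, 1.0 - p0, size=shots)   # outcome 0 w.p. p0
    kept = rng.random(shots) < efficiency
    return np.where(kept, outcomes, -1)

rng = np.random.default_rng(5)
b = lossy_measurement(p0=0.3, efficiency=0.5, shots=100_000, rng=rng)
print(np.mean(b == 0))   # ~ efficiency * p0 = 0.15, cf. Eq. (23) below
```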

Given the $m$ modules executing the multiple (independent) swap tests depicted in diagram (20), let us define a model with parameters given by the weights $\mathbf{w}\in\mathbb{R}^N$ and the efficiencies $\mathbf{p}=(p_1,\dots,p_m)$, such that the prediction associated with the input $\mathbf{x}\in\mathbb{R}^N$ is given by:

$$f(\mathbf{x};\mathbf{w},\mathbf{p})=\frac{\mathcal{N}_0}{\mathcal{N}},\qquad(21)$$

where $\mathcal{N}$ is the number of repetitions of the computational measurement protocol, with efficiencies $\mathbf{p}=(p_1,\dots,p_m)$, on the control qubits, and $\mathcal{N}_0$ is the total number of obtained zeros.
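Putting the pieces together, the following sketch (ours, reusing module_probs from the sketch above) estimates the model (21) by Monte Carlo: each control qubit yields a recorded 0 with probability $p_l\,\mathbb{P}_l(0)$, and the count of zeros over all qubits and repetitions converges to the network output of Proposition 6 below:

```python
import numpy as np

def model_21(x, w, p, k, N_shots, rng):
    """Monte Carlo estimate of f(x; w, p) = N_0 / N of Eq. (21): on each of the
    N_shots repetitions, the l-th control qubit records a 0 w.p. p_l * P_l(0)."""
    probs = p * module_probs(x, w, k)                # p_l * P_l(0), per module
    zeros = sum(rng.binomial(N_shots, pr) for pr in probs)
    return zeros / N_shots                           # total zeros / repetitions

rng = np.random.default_rng(6)
x, w = rng.normal(size=16), rng.normal(size=16)      # N = 16, k = 2, m = 4
p = rng.uniform(size=4)                              # measurement efficiencies
exact = np.sum(p * module_probs(x, w, k=2))          # network output of (22)
print(model_21(x, w, p, k=2, N_shots=1_000_000, rng=rng), exact)
```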

Proposition 6

The model (21) is equivalent, up to an error $\epsilon=\mathcal{O}(1/\sqrt{\mathcal{N}})$, to the following two-layer feedforward neural network with cosine pre-activation and activation $\varphi(z)=\frac{1}{2}(1-z^2)$ in the hidden neurons:

[Network (22): a two-layer feedforward network with $m$ hidden nodes whose output is $\sum_{l=1}^{m}\frac{p_l}{2}\left(1-\cos^2(\mathbf{x}^{(l)},\mathbf{w}^{(l)})\right)$.]

Proof.

By applying the computational measurement protocol with efficiencies $\{p_l\}_{l=1,\dots,m}$ to the scheme (20), the overall probability of obtaining the outcome 0 on the $l$th control qubit is:

$$\widehat{\mathbb{P}}_l(0)=p_l\,\mathbb{P}_l(0)=\frac{p_l}{2}\left(1-\cos^2(\mathbf{x}^{(l)},\mathbf{w}^{(l)})\right).\qquad(23)$$

The probability can be estimated, up to an error $\epsilon$, by the relative frequency $\mathcal{N}_0^{(l)}/\mathcal{N}$, where $\mathcal{N}_0^{(l)}$ is the number of zeros obtained on the $l$th qubit over $\mathcal{N}=\mathcal{O}(\epsilon^{-2})$ repetitions of the computational measurement protocol. By construction, the output of the neural network (22) corresponds to the value $\sum_l\widehat{\mathbb{P}}_l(0)$, which can be estimated by:

$$\sum_{l=1}^{m}\frac{\mathcal{N}_0^{(l)}}{\mathcal{N}}=\frac{\mathcal{N}_0}{\mathcal{N}}.$$

Therefore, for any input $\mathbf{x}$ and any choice of the parameters $\mathbf{w}$ and $\mathbf{p}$, the output $\mathcal{N}_0/\mathcal{N}$ of the model (21) corresponds to the output of the neural network (22) up to an error $\epsilon=\mathcal{O}(1/\sqrt{\mathcal{N}})$.

Proposition 6 states that we can construct feedforward neural networks by combining swap tests as modules of fixed size, i.e. defined over qubit registers with a given number of qubits. Note that the number of hidden and output neurons can be arbitrarily increased. In fact, with additional runs of the swap tests, by permutation of the input entries and re-initialization of the weights $\mathbf{w}$, one can realize the connections between the input neurons and an arbitrary number of hidden neurons, under the topological constraint given by the size of the modules. Collecting the outcomes obtained by the computational measurement protocol with varying efficiencies, we can define the connections from the hidden neurons to an arbitrary number of output nodes. In this respect, the following proposition enables the implementation of a complete feedforward neural network by a suitable measurement protocol.

Proposition 7

Given a single module of size $k$ and positive integers $R$ and $Q$, let $\mathbf{w}_1,\dots,\mathbf{w}_R\in\mathbb{R}^{2^k}$ and $p_{rq}\in[0,1]$ for any $r=1,\dots,R$ and $q=1,\dots,Q$. Let us consider the following algorithm:

```
Input:  input vector x ∈ ℝ^(2^k)
Result: output vector y ∈ ℝ^Q

foreach r ← 1,…,R do
    foreach q ← 1,…,Q do
        run the swap test 𝒩 times on |x⟩ and |w_r⟩, measuring the
        control qubit with efficiency p_rq;
        collect the outcomes b_rq ∈ {∅,0,1}^𝒩;
    end foreach
end foreach
foreach q ← 1,…,Q do
    construct the concatenation b_q = b_1q b_2q ⋯ b_Rq ∈ {∅,0,1}^(R×𝒩);
    y_q ← 𝒩_{0,q} / (R𝒩)   [𝒩_{0,q} is the number of zeros in the string b_q]
end foreach
return y = (y_1,…,y_Q)
```

The output $\mathbf{y}\in\mathbb{R}^Q$ corresponds, up to an error $\mathcal{O}(1/\sqrt{R\mathcal{N}})$, to the output of a fully connected feedforward neural network, with cosine pre-activation and activation $\varphi(z)=\frac{1}{2}(1-z^2)$, with $2^k$ input nodes, $R$ hidden nodes, and $Q$ output nodes.
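A direct classical simulation of the algorithm (ours, sampling the swap-test outcomes as Bernoulli variables) illustrates the construction:

```python
import numpy as np

def prop7_network(x, weights, P, N_shots, rng):
    """Simulate the algorithm of Proposition 7: `weights` is a list of R
    vectors w_r, P[r, q] are the measurement efficiencies. Each y_q is the
    fraction of zeros in the concatenated string b_q, i.e. N_{0,q} / (R * N)."""
    R, Q = P.shape
    y = np.zeros(Q)
    for r in range(R):
        w = weights[r]
        cos = np.dot(x, w) / (np.linalg.norm(x) * np.linalg.norm(w))
        p0 = 0.5 * (1.0 - cos**2)                 # swap-test P(0), Eq. (15)
        for q in range(Q):
            zeros = rng.binomial(N_shots, P[r, q] * p0)   # zeros in b_rq
            y[q] += zeros / (R * N_shots)
    return y

rng = np.random.default_rng(7)
x = rng.normal(size=4)                            # 2**k = 4 input nodes
weights = [rng.normal(size=4) for _ in range(2)]  # R = 2 hidden nodes
P = rng.uniform(size=(2, 2))                      # Q = 2 output nodes
print(prop7_network(x, weights, P, N_shots=500_000, rng=rng))
```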

Proof.

Let us prove the claim simply for $k=R=Q=2$, since the generalization is straightforward. Consider a module with $k=2$ and the execution of the following steps ($Q=2$):

1. run the swap test $\mathcal{N}$ times with weights $\mathbf{w}_1\in\mathbb{R}^4$ and measure the control qubit with efficiency $p_{11}$; collect the outcomes $b_{11}\in\{\emptyset,0,1\}^{\mathcal{N}}$;
2. run the swap test $\mathcal{N}$ times, with the same weights, and measure the control qubit with efficiency $p_{12}$; collect the outcomes $b_{12}\in\{\emptyset,0,1\}^{\mathcal{N}}$.

By Proposition 6, computing the relative frequency of 0 over $b_{11}$ and $b_{12}$ respectively, we obtain estimates of the probabilities $\frac{p_{11}}{2}(1-\cos^2(\mathbf{w}_1,\mathbf{x}))$ and $\frac{p_{12}}{2}(1-\cos^2(\mathbf{w}_1,\mathbf{x}))$, implementing the following network up to an error $\mathcal{O}(1/\sqrt{\mathcal{N}})$: [network diagram not reproduced]

Repeating steps 1 and 2 with weights $\mathbf{w}_2\in\mathbb{R}^4$ and efficiencies $p_{21}$ and $p_{22}$, we can also collect the outcomes $b_{21},b_{22}\in\{\emptyset,0,1\}^{\mathcal{N}}$. Computing the relative frequency of 0 in the strings $b_{21}$ and $b_{22}$, we can implement another network as well: [network diagram not reproduced]

In order to obtain a complete network, we can combine the two networks above ($R=2$), providing the outputs:

$$y_1=\frac{p_{11}}{2}\big(1-\cos^2(\textbf{w}_1,\textbf{x})\big)+\frac{p_{21}}{2}\big(1-\cos^2(\textbf{w}_2,\textbf{x})\big),$$

$$y_2=\frac{p_{12}}{2}\big(1-\cos^2(\textbf{w}_1,\textbf{x})\big)+\frac{p_{22}}{2}\big(1-\cos^2(\textbf{w}_2,\textbf{x})\big).$$

For estimating the sum $y_1$ of the two probabilities, we take the sum of the relative frequencies of the outcome 0 in the string $b_{11}$ and in the string $b_{21}$, which corresponds to the total number of zeros in the concatenation $b_{11}b_{21}$ divided by $\mathcal{N}$. In the same way, we estimate $y_2$ from the number of zeros in the string $b_{12}b_{22}$. In other words, given an input $\textbf{x}\in{\mathbb{R}}^{4}$ and the parameters $\textbf{w}_1,\textbf{w}_2,p_{11},p_{12},p_{21},p_{22}$, we have a model that outputs:

$$y_1=\frac{\mathcal{N}_{0,1}}{\mathcal{N}}\qquad\mbox{and}\qquad y_2=\frac{\mathcal{N}_{0,2}}{\mathcal{N}},$$

where $\mathcal{N}_{0,i}$ is the number of zeros in the concatenation $b_i=b_{1i}b_{2i}$. Therefore, $y_1$ and $y_2$ correspond, up to an error $\mathcal{O}(1/\sqrt{2\mathcal{N}})$, to the outputs of a fully connected network with four input nodes, two hidden nodes, and two output nodes, where the 8 connections in the first layer are weighted by the components of $\textbf{w}_1$ and $\textbf{w}_2$, and the 4 connections in the second layer are weighted by $p_{11},p_{12},p_{21},p_{22}$.

By direct generalization of the construction above, the number of steps $Q$ determines the number of output neurons, and the number of repetitions $R$ corresponds to the number of hidden neurons.
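To make the counting explicit, here is a sketch (again ours) of the full $k=R=Q=2$ protocol, reusing `swap_test_string`, `cos_sim`, $\textbf{x}$, and $\mathcal{N}$ from the previous snippet: the zeros of the concatenated outcome strings reproduce the analytic outputs $y_1$ and $y_2$ of the equivalent network.

```python
w2 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
W = [w1, w2]                          # hidden-layer weight vectors
P = np.array([[0.9, 0.7],             # P[r, q] = efficiency p_{r+1, q+1}
              [0.6, 0.8]])
phi = lambda z: 0.5 * (1.0 - z ** 2)  # activation of the equivalent network

# steps 1-2 repeated for both weight vectors: four outcome strings b_{rq}
b = {(r, q): swap_test_string(W[r], x, P[r, q], N)
     for r in range(2) for q in range(2)}

for q in range(2):
    # number of zeros in the concatenation b_{1q} b_{2q}, divided by N
    y_est = (np.sum(b[0, q] == 0) + np.sum(b[1, q] == 0)) / N
    y_net = sum(P[r, q] * phi(cos_sim(W[r], x)) for r in range(2))
    print(f"y_{q+1}: estimated {y_est:.4f}, network {y_net:.4f}")
```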

Let us remark that the complete connectivity of the first layer of the network provided by Proposition 7 follows from the fact that we consider a single module for the computation of the swap test. In general, the connectivity of the first layer depends on the number of modules considered and on their size $k$. In contrast, the local measurement protocol described in the algorithm of Proposition 7 always yields all-to-all connections in the second layer.
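A toy illustration of this remark (our sketch, reusing NumPy from the snippets above): with several modules the first layer becomes block-sparse, since each hidden neuron is connected only to the $2^k$ inputs handled by its own module.

```python
def first_layer_mask(n_features, k, hidden_per_module=1):
    """Boolean connectivity mask of the first layer: the input is split
    into chunks of 2**k amplitudes, one module per chunk, and a hidden
    neuron connects only to the inputs of its own module.
    Assumes n_features is a multiple of 2**k."""
    chunk = 2 ** k
    n_modules = n_features // chunk
    mask = np.zeros((n_modules * hidden_per_module, n_features), dtype=bool)
    for m in range(n_modules):
        rows = slice(m * hidden_per_module, (m + 1) * hidden_per_module)
        mask[rows, m * chunk:(m + 1) * chunk] = True
    return mask

print(first_layer_mask(8, 1).astype(int))  # k = 1: maximally sparse first layer
print(first_layer_mask(8, 3).astype(int))  # single module: complete first layer
```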

4 Discussion

The statements of Propositions 6 and 7 provide a general procedure for constructing measurement protocols that implement feedforward neural networks using the outcomes of multiple swap tests between quantum states encoding the input vectors and the weight vectors. In particular, Proposition 6 implies that if we are able to perform independent swap tests over $k$-qubit registers, then a suitable sample of measurement outcomes is sufficient to obtain a neural structure for processing data encoded into the amplitudes of input states. Moreover, Proposition 7 establishes that, by suitable repetitions of the swap test, one can choose the number of hidden and output neurons. In particular, if the number of features in the considered dataset is sufficiently low, complete neural networks can be implemented as a direct consequence of Propositions 6 and 7.
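Putting the two propositions together, the classical network realized by the protocol fits in a few lines; this is our summary implementation (shapes and names are ours), reusing `cos_sim` and the variables from the snippets above.

```python
def equivalent_network(x, W, P):
    """Classical equivalent of the measurement protocol.
    W: array of shape (R, 2**k), the weight vectors of the hidden neurons.
    P: array of shape (R, Q), the efficiencies acting as second-layer weights.
    Returns y in R^Q, with cosine pre-activation and phi(z) = (1 - z^2)/2."""
    z = np.array([cos_sim(w, x) for w in W])  # cosine pre-activations
    return P.T @ (0.5 * (1.0 - z ** 2))       # y_q = sum_r p_rq phi(z_r)

print(equivalent_network(x, np.stack([w1, w2]), P))  # matches y_net above
```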

The modular structure of the model naturally enables ensemble learning. In fact, an input vector can be trivially partitioned as described in Section 3, but random partitions can also be used; in this case, a random sampling is performed over the training set, as done in bootstrap aggregating, for instance (see the sketch below). Moreover, different $k$-sized modules initialized with the same weights $\textbf{w}\in{\mathbb{R}}^{2^k}$ can be interpreted as the application of a convolutional layer followed by a local pooling, as done in convolutional neural networks.
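As a sketch of this ensemble reading (ours; the partitioning and resampling scheme are assumptions consistent with bootstrap aggregating, reusing `rng` from above): each ensemble member receives its own random feature partition and a bootstrap resample of the training set.

```python
def random_partitions(n_features, k, n_estimators):
    """One random feature partition per ensemble member: a random
    permutation of the feature indices, read off in chunks of 2**k per
    module. Assumes n_features is a multiple of 2**k."""
    return [rng.permutation(n_features).reshape(-1, 2 ** k)
            for _ in range(n_estimators)]

def bootstrap_samples(n_samples, n_estimators):
    # bootstrap aggregating: resample the training set with replacement
    return [rng.integers(0, n_samples, size=n_samples)
            for _ in range(n_estimators)]

parts = random_partitions(n_features=16, k=2, n_estimators=3)
print(parts[0])   # 4 modules x 4 features assigned to the first estimator
```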

From the viewpoint of time complexity, the performance of the proposed model is bounded by the state preparation needed to encode the input $\textbf{x}\in{\mathbb{R}}^{N}$ and the weights $\textbf{w}\in{\mathbb{R}}^{N}$ into quantum states, since the time taken by the measurement protocols is independent of the input size and related only to the accuracy and the hyperparameters. The general issue of efficient state preparation is a common bottleneck of current quantum machine learning algorithms, in particular of quantum neural networks. Indeed, the encoding of a data vector $\textbf{x}\in{\mathbb{R}}^{N}$ into the amplitudes of a $\log N$-qubit state $|\textbf{x}\rangle$ can be performed in time $\mathcal{O}(\log N)$ only if a quantum RAM (QRAM) is available; otherwise the retrieval of $|\textbf{x}\rangle$ and the initialization of $|\textbf{w}\rangle$ take time $\mathcal{O}(N)$.

From the viewpoint of space complexity, there is no exponential storage capacity, because we assume the availability of a collection of small quantum registers into which the input is encoded piecewise. However, the number of required qubits is tamed by an exponential decay in the size $k$ of a single module (roughly $(2k+1)N/2^k$ qubits for $N/2^k$ modules of two $k$-qubit registers plus a control qubit each); then, for suitable values of $N$ and $k$, data can be represented efficiently. Moreover, even if the total number of qubits is high, we do not require coherence over all the qubits but only within each module where a swap test is performed.

In addition to the advantage in storage capacity (bounded by $k$), the quantum implementation of the proposed scheme offers the possibility of submitting superpositions of data to the network. Moreover, even if the modules are a priori uncorrelated, an input can be represented by quantum data with entanglement distributed over a high number of qubits; in this case, the quantum processing of a widely entangled input creates quantum correlations among the modules. In general, the input of the model can be a quantum state encoding a feature vector, as discussed, but also non-artificial quantum data entangling the modules, giving rise to a non-trivial execution of the neural network.
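A quick back-of-the-envelope check of this trade-off (our arithmetic, under the assumption of $2k+1$ qubits per module: two $k$-qubit registers plus one control qubit for the swap test):

```python
def qubit_count(n_features, k):
    """Qubits needed to process x in R^n piecewise: n / 2**k modules,
    each with two k-qubit registers and one control qubit."""
    return (n_features // 2 ** k) * (2 * k + 1)

for k in (1, 2, 5, 8):
    print(k, qubit_count(1024, k))
# k = 1: 1536, k = 2: 1280, k = 5: 352, k = 8: 68 qubits, against the
# 10 qubits of a full (coherent) amplitude encoding of x in R^1024.
```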

Let us remark that the global structure of the neural network is preserved also when the modules have minimum size ($k=1$); in this case we have the lowest storage capacity and the first layer turns out to be sparse. Nevertheless, the model is still quantum, and therefore capable of processing quantum data, as a two-layer feedforward network.

5 Conclusions

In this paper we have presented a scheme for implementing a two-layer feedforward neural network using repeated swap tests over quantum registers and suitable measurement protocols. The main goal of the work is to show how to create a non-trivial quantum machine learning model when few quantum resources are available. In particular, we assume the availability of special-purpose quantum hardware, which we call a module, capable of implementing the swap test (an assumption motivated by several recent proposals of photonic platforms for this task). By combining an arbitrary number of these modules, a scalable neural network can be constructed in which the first layer is implemented by the quantum processing itself and the second one is obtained through a measurement protocol characterized by varying efficiencies in registering the outcomes. Even if the composition of independent modules does not create entanglement, an entangled input state introduces quantum correlations in the network.

The empirical evaluation of the proposed model is not addressed in the present paper, where only a theoretical construction is provided. From the experimental viewpoint, in the case of amplitude encoding of classical data, a simulation of the multiple swap tests is not particularly useful, because it mathematically corresponds to the execution of a two-layer network with quadratic activation, a well-known object that has already been investigated in depth. Experiments on real quantum hardware would therefore be far more significant for revealing actual quantum advantages of the discussed proposal. By contrast, in the case of quantum data, where the input can be represented by highly entangled quantum states, a simulation could be interesting, although restricted to a small total number of qubits for computational reasons. Indeed, in our opinion, the crucial experimental step will be the combination of real photonic hardware, as small modules implementing the swap test over few qubits, in order to quantify the performance of the scalable quantum neural networks constructed by modularity. In this perspective, a realization with photonic platforms could lead to an on-chip embedding of the modules, increasing the technological impact of the proposed QML architecture.

Acknowledgements

This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU. The authors are grateful to Sebastian Nagies and Emiliano Tolotti whose ongoing work on the implementation provided useful insights.

References

  • [1] P. Wittek, Quantum Machine Learning, Academic Press, 2014.
  • [2] M. Schuld, F. Petruccione, Machine Learning with Quantum Computers, Springer, Cham, 2021.
  • [3] D. Pastorello, Concise Guide to Quantum Machine Learning, Springer, Singapore, 2023.
  • [4] H. Buhrman, R. Cleve, J. Watrous, R. de Wolf, Quantum fingerprinting, Phys. Rev. Lett. 87, 167902 (2001).
  • [5] M. Fiorentino, T. Kim, F.N.C. Wong, Single-photon two-qubit SWAP gate for entanglement manipulation, Phys. Rev. A 72, 012318 (2005).
  • [6] M.S. Kang, J. Heo, S.G. Choi et al., Implementation of SWAP test for two unknown states in photons via cross-Kerr nonlinearities under decoherence effect, Sci. Rep. 9, 6167 (2019).
  • [7] J. Zhao, Y.-H. Zhang, C.-P. Shao, Y.-C. Wu, G.-C. Guo, G.-P. Guo, Building quantum neural networks based on a swap test, Phys. Rev. A 100, 012334 (2019).
  • [8] P. Li, B. Wang, Quantum neural networks model based on swap test and phase estimation, Neural Networks 130, 152-164 (2020).
  • [9] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2(5), 359-366 (1989).
  • [10] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. 2(4), 303-314 (1989).
  • [11] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numer. 8, 143-195 (1999).
  • [12] R. Livni, S. Shalev-Shwartz, O. Shamir, On the computational efficiency of training neural networks, in Adv. Neural Inf. Proc. Sys. (NIPS), pp. 855-863, 2014.
  • [13] S.S. Du, J.D. Lee, On the power of over-parametrization in neural networks with quadratic activation, Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018.
  • [14] M. Soltani, C. Hedge, Towards provable learning of polynomial neural networks using low-rank matrix estimation, Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), Lanzarote, Spain, PMLR 84, 2018.
  • [15] Z. Tan, S. Chen, On the learning dynamics of two-layer quadratic neural networks for understanding deep learning, Front. Comput. Sci. 16(3), 163313 (2022).
  • [16] C. Luo, J. Zhan, X. Xue, L. Wang, R. Ren, Q. Yang, Cosine normalization: using cosine similarity instead of dot product in neural networks, in Artificial Neural Networks and Machine Learning - ICANN 2018, Lecture Notes in Computer Science, vol. 11139, Springer, Cham, 2018.
  • [17] A. Baldazzi, N. Leone, M. Sanna, S. Azzini, L. Pavesi, A linear photonic swap test circuit for quantum kernel estimation, Quantum Sci. Technol. 9, 045053 (2024).
  • [18] T. Ono, R. Okamoto, M. Tanida, H.F. Hofmann, S. Takeuchi, Implementation of a quantum controlled-SWAP gate with photonic circuits, Sci. Rep. 7, 45353 (2017).
  • [19] J.C. Garcia-Escartin, P. Chamorro-Posada, Swap test and Hong-Ou-Mandel effect are equivalent, Phys. Rev. A 87, 052330 (2013).
  • [20] L. Dong et al., Nearly deterministic Fredkin gate based on weak cross-Kerr nonlinearities, J. Opt. Soc. Am. B 33, 253 (2016).
  • [21] B.C. Ren, A.H. Wang, A. Alsaedi, T. Hayat, F.G. Deng, Three-photon polarization-spatial hyperparallel quantum Fredkin gate assisted by diamond nitrogen vacancy center in optical cavity, Ann. Phys. 530, 1800043 (2018).
