Paper Content:
Page 1:
Data-Dependent Generalization Bounds for
Parameterized Quantum Models Under Noise
Bikram Khanal and Pablo Rivas
School of Engineering & Computer Science, Baylor University, One
Bear Place, Waco, 76798, TX, USA.
*Corresponding author(s). E-mail(s): pablo rivas@baylor.edu;
Contributing authors: bikram khanal1@baylor.edu;
Abstract
Quantum machine learning offers a transformative approach to solving com-
plex problems, but the inherent noise hinders its practical implementation in
near-term quantum devices. This obstacle makes it difficult to understand the
generalizability of quantum circuit models. Designing robust quantum machine
learning models under noise requires a principled understanding of complexity
and generalization, extending beyond classical capacity measures. This study
investigates the generalization properties of parameterized quantum machine
learning models under the influence of noise. We present a data-dependent gener-
alization bound grounded in the quantum Fisher information matrix. We leverage
statistical learning theory to relate the parameter space volumes and training
sizes to estimate the generalization capability of the trained model. We provide
a structured characterization of complexity in quantum models by integrating
local parameter neighborhoods and effective dimensions defined through quan-
tum Fisher information matrix eigenvalues. We also analyze the tightness of the
bound and discuss the tradeoff between model expressiveness and generalization
performance.
Keywords: Quantum Machine Learning, Generalization Bound, Noise Channel, NISQ
1 Introduction
Quantum machine learning (QML) holds the promise of revolutionizing data analy-
sis by leveraging quantum mechanical principles to enhance computational efficiency
and model expressivity [1–3]. Although still in its early stages [4–6], QML has shown
1arXiv:2412.11451v3 [cs.LG] 4 Feb 2025
Page 2:
potential advantages over classical machine learning in certain tasks, such as classifica-
tion [7–9], regression [10], and clustering [11]. A crucial aspect of any machine learning
model, including QML models, is its generalization ability. Generalization refers to a
model’s ability to perform well on unseen data [12–14]. Generalization bounds provide
theoretical guarantees on the performance of a model in new data and play a vital
role in ensuring the reliability and robustness of machine learning models [15–17].
One of the prominent questions in quantum machine learning is understanding
how noise affects the generalizability of models [18–20]. Generalization bounds are
typically derived by bounding a complexity measure of the hypothesis class used for
learning [20]; such measures—like Rademacher complexity, VC dimension, and metric
entropy—quantify the expressiveness or capacity of the model class. These bounds are
especially crucial in the NISQ era [6, 19–23], where devices are characterized by limited
qubit coherence times, suboptimal gate fidelities, and a high susceptibility to noise,
all of which can significantly impact the performance and reliability of QML models.
Several studies have investigated QML generalization bounds—exploring approaches
such as uniform bounds, margin-based bounds, and stability analysis [6, 16, 18–20, 22–
29]. Recent works by Bu et al.[29] have further refined this framework by rigorously
linking the complexity of QML models to their quantum resource content through tai-
lored Rademacher and Gaussian analyses. In particular, they introduce a ( p, q)-norm
to quantify the non-stabilizer (or “magic”) resource, demonstrating that the incor-
poration of non-Clifford gates—characterized via free robustness measures—directly
influences the generalization capacity of quantum circuits[28]. These contributions
offer valuable insights into how circuit architecture, data encoding strategies, and
optimization choices affect QML generalization. Concurrently, research has also exam-
ined the broader impact of noise on QML [4, 30–32], emphasizing that noise from
sources such as decoherence [33], gate errors, and other quantum phenomena poses
formidable challenges to developing models that can reliably learn and generalize.
Although several strategies for mitigating noise have been proposed [34, 35, 35, 36],
much of the existing research on QML generalization bounds focuses on idealized,
noise-free quantum computers. Moreover, while the generalization capabilities of quan-
tum neural networks and quantum kernel methods have been explored [37–39], these
bounds are often tailored to specific algorithmic techniques or problems. Some stud-
ies [4, 31, 32, 39–41] further suggest that quantum noisy channels can significantly
hinder QML capabilities. We note that the generalization abilities of quantum neural
network models may also be compromised by the barren plateau phenomenon [31, 42–
47] and that quantum kernel methods can suffer from exponential concentration [40]. It
should be noted that training-induced noise differs from the inherent noise in quantum
devices and is not considered in this work. Consequently, a comprehensive under-
standing of how noise affects the generalization performance of QML models remains
lacking—a gap that is particularly critical in the NISQ era, where noise is unavoidable
and can severely limit the practical applicability of QML algorithms.
This work addresses this gap by investigating the generalization bound for QML
under a noisy channel. Specifically, we consider a scenario where a supervised QML
model is trained on a noisy quantum system, and the goal is to establish a bound
2
Page 3:
on its generalization error in the presence of noise. We achieve this goal by leverag-
ing the quantum Fisher information-derived metrics, parameter space volume, and
training sample size into a quantifiable measure of complexity in noisy QML models
and derive the generalization bound for near-term QML models. Our approach inter-
prets complexity as a controllable quantity rather than an abstract or unbounded
factor. Specifically, we consider depolarization noise channels for an experiment to
investigate how this noise affects both the theoretical generalization bound and the
empirical performance of the QML models. The work refines the global bound by
considering local parameter neighborhoods, effective dimensions defined through quan-
tum Fisher information matrix eigenvalues, and conditions that stabilize complexity
growth under noise. The statistical results verify that the proposed local refinements
for the depolarization noise channel provide a more accurate and efficient analysis of
the generalization performance. Consequently, our work bridges abstract theoretical
concepts and the operational needs of quantum learning architectures. The contribu-
tion lies not in proposing new computational primitives but in producing a rigorous
theoretical structure that uses geometric principles to inform model design. Rather
than treating all parameters equally, our approach clarifies that parameter directions
differ in their contribution to predictive performance, making geometric considerations
not just aesthetic choices but practical necessities.
The remainder of this paper proceeds as follows. The next section presents a pre-
liminary background in machine learning and the quantum machine with the formal
setup of binary classification learning problems. We briefly discuss classical and quan-
tum Fisher information in section 3, and the effective dimension in section 4. Section 5
proposes the generalization bound and corollaries followed by a local refinement. A
subsequent section 6 discusses empirical validation through numerical experiments,
confirming the agreement with theoretical predictions. We discuss these results in a
broader context in section 7 and summarize key contributions in section 8. Appendix 8
provides detail in the mathematical derivation of the proposed bound with the
necessary background.
2 Preliminary Definitions
We begin by discussing the fundamental concepts of learning from data and extending
them to the QML framework within a supervised learning setting. Readers already
familiar with the field may skip the following two subsections. For those seeking a
deeper understanding, we recommend the books by [13, 14] for classical learning theory
and the books by [1, 48] for QML and quantum information background.
2.1 Supervised Learning
In supervised learning, we are given the input space Xand the output space Y. Let P
be the joint probability distribution over X ×Y andD={(xi, yi)}N
i=1be a dataset of
Ni.i.d samples drawn from P. Supervised learning aims to learn a function f:X → Y
that maps input to output. Given a training dataset D, the learning algorithm A
selects the best function hfrom a hypothesis class H. Usually, the elements of Hare
parameterized by a vector of parameters θ∈Θ, where Θ ⊂Rdis a parameter space
3
Page 4:
ofd∈Ndimension. We may also refer to hashθto emphasize the dependence on the
parameters and define the hypothesis class as H={hθ:θ∈Rd}. Please note that
handfare used interchangeably as hypotheses in the context of learning, and this
might be exhibited in our manuscript, too.
Given two approximations h1, h2∈ H, how do we decide which is better? One way
to do that is to define a loss function l:Y × Y → R. The loss function measures
the difference between the predicted output ˆy∈Rand the true output y∈ Y. The
learning algorithm aims to find hthat minimizes the expected loss (or expected risk)
defined as:
R(h) =E(x,y)∼P[l(y, h(x))]. (1)
The expected risk is the average loss over all possible inputs and outputs. Therefore,
the goal is to find a function hsuch that l(h) is as small as possible. In practice, we
cannot access the true distribution P; hence, it is often challenging to compute the
expected risk. Instead, we compute the empirical risk and try to minimize it, also
referred to as the empirical risk minimization (ERM) principle. The empirical risk is
defined as:
ˆR(h) =1
NNX
i=1l(yi, h(xi)). (2)
A central goal of machine learning is to understand how well the empirical risk
approximates the expected risk [12–14, 49]. The difference between the empirical and
expected risks is the generalization error. The generalization error quantifies how well
the model generalizes from the training data to unseen data. A model with low gener-
alization error is said to generalize well, while a model with high generalization error
suggests poor generalization. Generalization error is typically of the form:
GenError = R(h)−ˆR(h). (3)
One of the important uses of generalization error is to provide a bound on the expected
risk of the model. We can use the probability to bound the generalization error as:
Pr
sup
h∈HˆR(h)−R(h)≤ϵ
≥1−δ, (4)
where Pr[ ·] denotes the probability measure, supis the supremum, ϵis the error toler-
ance, and δis the confidence level. The uniform convergence bound given by Eq. (4)
implies that with probability at least 1 −δ, the empirical risk of the best hypothesis in
the hypothesis class is close to the expected risk. We can use the uniform convergence
bound approach to provide a bound on the generalization error, i.e., generalization
bound. The generalization bound guarantees how well the model generalizes to unseen
data [50, 51]. Generalization bound is typically of the form:
R(h)≤ˆR(h) + complexity term , (5)
where the complexity term is often a function of the HandN. Complexity measure
depends on PoverX ×Y , and if Pis easy to learn, the complexity term will be small.
4
Page 5:
In other words, if we are to find the model with the low complexity term, we can
expect the model to generalize well [52, 53].
Alternatively, instead of getting a bound on the probability of the generalization
error, we can obtain an upper bound in generalization error expectations instead. i.e.,
we can bound the expected generalization error as:
ED
sup
h∈HˆR(h)−R(h)
≤complexity term , (6)
Where the expectation is taken over the training dataset D. In this work, we focus on
the expected generalization error bound and provide a bound on the expected gen-
eralization error for quantum machine learning models in noisy environments. Before
proceeding, we briefly discuss the quantum machine learning framework.
2.2 Quantum Machine Learning
Unlike classical ML, which relies on deterministic computations, QML leverages the
inherently probabilistic nature of quantum mechanics, enabling it to explore multiple
states simultaneously and potentially solve problems that are intractable classically.
This property allows QML algorithms to operate in high-dimensional Hilbert spaces
with notable efficiency [7, 54]. The central idea in QML for the near-term era is to
encode classical data into quantum states, process them through quantum circuits, and
extract predictions through measurements. The quantum circuit is typically parame-
terized by a set of angles, which are optimized to minimize a chosen cost function. The
optimization process is often performed using gradient-based methods, where the gra-
dients are computed using the parameter-shift rule [55, 56]. Like classical ML models,
the quantum circuit model is trained to minimize the empirical risk. The key differ-
ence is that the quantum circuit model operates on quantum states and measurements,
which can lead to quantum advantages in certain tasks [7, 8]. The generalization prop-
erties of QML models are particularly interesting, as they can provide insights into
the model’s performance on unseen data.
We consider a binary classification setting with a classical input space X ⊂Rm
and a label space Y={−1,+1}. LetHsbe a Hilbert space associated with an n-qubit
quantum system, so dim( Hs) = 2n. A central goal is to encode each classical data
point x∈ X into a quantum state and process it through a parameterized quantum
circuit (PQC) to obtain measurement outcomes that can be interpreted as predictions.
2.2.1 Data Encoding
A key step in quantum machine learning is encoding the classical data xinto a quantum
state. Define a feature map ϕ:X → H sthat maps each x∈ X to a normalized
quantum state |ϕ(x)⟩ ∈ H s. This encoding can be achieved in various ways, depending
on the hardware constraints and the nature of the data [2, 54]. Ref. [57] is an excellent
resource for understanding data encoding strategies in quantum machine learning. A
common strategy is to apply parameterized single-qubit rotations whose angles depend
5
Page 6:
on the components of x:
Rx(xi) = exp( −ixiσx/2), R z(xi) = exp( −ixiσz/2), R y(xi) = exp( −ixiσy/2),
where σx, σyandσzare Pauli matrices. These operations can be arranged so that the
mfeatures of xare distributed across nϕ≤nqubits, resulting in the encoded state
|ϕ(x)⟩defined as:
|ψ(x)⟩=|ϕ(x)⟩ ⊗ | 0⟩⊗(n−nϕ). (7)
2.2.2 Parameterized Quantum Circuit
Following data encoding, we apply a PQC with trainable parameters
θ=
θ1, θ2, . . . , θ L
.
Each θkcan represent the rotation angles in single or multi-qubit gates. Mathe-
matically, the PQC is a unitary U(θ) that is typically decomposed into layers of
parameterized gates and entangling (non-parameterized) gates:
U(θ) =LY
k=1Uk(θk),
Where Uk(θk) often acts on one or more qubits. Acting on the initial state |ψ(x)⟩, the
PQC produces
|ψθ(x)⟩=U(θ)|ψ(x)⟩, (8)
whose density matrix representation is
ρθ(x) =|ψθ(x)⟩⟨ψθ(x)|. (9)
In quantum machine learning, without loss of generalizability, the quantum neu-
ral networks [8, 58, 59] can also be considered PQC. The analogy to classical neural
networks arises because the PQC parameters θare iteratively optimized (trained) to
minimize a chosen cost function. While backpropagation is central to classical net-
works, the gradient can be computed via specialized quantum techniques, such as the
parameter-shift rule (detailed in Section 2.2.5).
2.2.3 Measurement and Classification
To extract a classical prediction, we measure a designated subset of qubits. In a simple
binary classification scenario, we measure only the first qubit with a two-outcome
measurement described by a positive operator-valued measure (POVM) {M+1, M−1}:
M+1=|0⟩⟨0| ⊗I⊗(n−1), M −1=|1⟩⟨1| ⊗I⊗(n−1).
6
Page 7:
where |0⟩=
1
0
,⟨0|=|0⟩†andIis an identity matrix. The probability of obtaining
label y∈ {+1,−1}for input xis then:
ˆy=p(y|x;θ) = Tr
Myρθ(x)
. (10)
A simple classification rule predicts y= +1 if ˆy≥0.5 and y=−1 otherwise, mirroring
classical threshold-based decision rules.
2.2.4 Training and Optimization
In supervised quantum machine learning, we define a loss function (cost function) L(θ)
that quantifies prediction error across a training set {(xi, yi)}. A typical example is
the cross-entropy loss:
L(θ) =−1
NNX
i=1h
yilog(ˆyi) + (1 −yi) log(1 −ˆy)i
. (11)
We then seek optimal parameters θ∗by minimizing L(θ) via gradient-based methods:
θ←θ−α∇θL(θ), (12)
Where αis a learning rate. Since the entire computation of p(y|x;θ) is performed by
a quantum circuit, we need a way to obtain ∇θL(θ) from measurement statistics.
2.2.5 Parameter-Shift Rule for Gradient Computation
A popular approach for gradient computation in near-term quantum devices is the
parameter-shift rule [55, 56, 60]. Consider a circuit with a parameterized gate of the
form:
U(θj) = exp
−i θjG/2
,
where G2=I(common for single-qubit Pauli rotations). For an observable M
measured at the end of the circuit, the parameter-shift rule states that:
∂
∂θj⟨M⟩(θ) =1
2h
⟨M⟩
θj+π
2
− ⟨M⟩
θj−π
2i
, (13)
where ⟨M⟩(θ) = Tr[ M ρθ(x)] is the expectation value of M(or an effective estimator
forp(y|x;θ)). This formula allows one to compute partial derivatives by running
the circuit twice, at shifted parameter values θj±π
2. In practice, one repeats these
measurements multiple times to estimate the required probabilities, then aggregates
these partial derivatives across all parameters to form ∇θL(θ).
2.2.6 Practical Considerations and Noise
Current quantum hardware (NISQ devices) are subject to noise and decoherence,
which can degrade performance. In the presence of a noisy quantum channel N, the
7
Page 8:
noiseless density operator ρθ(x) transforms into
˜ρθ(x) =N
ρθ(x)
. (14)
As a result, the measured probabilities become
˜p(y|x;θ) = Tr
My˜ρθ(x)
= Trh
MyN(ρθ(x))i
. (15)
For certain noise channels and measurement bases, this noisy probability ˜p(y|x;θ)
may be well-approximated by a differentiable function η: [0,1]→(0,1] of the noiseless
probability η
p(y|x;θ)
as:
˜p(y|x;θ) =η(p(y|x;θ)), (16)
with ηcontinuous in the noise rate and η(0) = 1 [61, 62]. This simplification holds, for
example, when the noise operators commute with the measurement operators, as is the
case with a depolarizing channel that acts on states measured on the computational
basis [48, 63–65].
Noise modifies the sensitivity of the output probability distribution to changes in
θ, influencing both trainability (e.g., risk of barren plateaus [8, 46]) and generaliza-
tion [66]. Despite challenges related to data encoding overhead, hardware noise, and
the limitations of near-term devices, PQC-based models remain promising candidates
for quantum advantage in specific domains [59, 67, 68]. Because noise modifies the
sensitivity of output probabilities with respect to variations in θ, analyzing ˜p(y|x;θ)
under realistic noise channels is crucial for understanding trainability and general-
ization. This sensitivity can be quantified through the Fisher information matrices,
both classical and quantum, which serve as fundamental tools for analyzing parameter
distinguishability and generalization in the presence of noise [66, 69–71].
3 Fisher Information
From a machine learning perspective, the Fisher information measures the sensitivity
of the output label distribution with respect to changes in model parameters [72–75].
Higher Fisher information indicates that the parameters θcarry more information
about the output label y, enabling more precise parameter estimation and potentially
improving learnability. In a quantum machine learning setting, we must consider clas-
sical and quantum notions of Fisher information to capture the role of quantum states
and measurements fully.
3.1 Classical Fisher Information
Consider a parameterized probability distribution p(y|x;θ), where y∈ Ydenotes the
label and θ= (θ1, . . . , θ d) are the model parameters. The classical Fisher information
8
Page 9:
matrix (FIM) is defined as:
F(θ)ij=X
y∈Yp(y|x;θ)∂lnp(y|x;θ)
∂θi∂lnp(y|x;θ)
∂θj
. (17)
Equivalently, this can be written as:
F(θ)ij=X
y∈Y1
p(y|x;θ)∂p(y|x;θ)
∂θi∂p(y|x;θ)
∂θj
. (18)
The classical FIM quantifies how sensitive the observed label distribution is to small
perturbations of the parameters. PQC measurement outcomes form the probability
distribution. However, in the near-term there is the exponential growth of potential
measurement outcomes as the number of quantum bits increases [76]. This means
many measurement outcomes might never appear, although with minimal probability,
causing instability in classical Fisher information matrix computation [71, 73].
3.1.1 Fisher Information Under a Noisy Channel
When a noisy quantum channel Nacts on the underlying quantum state, the probabil-
ity distribution p(y|x;θ) is replaced by ˜p(y|x;θ) defined in Eq. (15). The corresponding
noisy classical FIM becomes:
˜F(θ)ij=X
y∈Y1
˜p(y|x;θ)∂˜p(y|x;θ)
∂θi∂˜p(y|x;θ)
∂θj
. (19)
Using the relationship in Eq.(16) and the chain rule, we have:
∂˜p(y|x;θ)
∂θi=η′(p(y|x;θ))∂p(y|x;θ)
∂θi, (20)
where ηandη′characterize the noise-induced transformation of the probabilities.
Since ηis differentiable, η′can be approximated by the derivative of ηatp(y|x;θ).
The exact form of ηdepends on the noise model and the measurement basis, but it
is often chosen to be a smooth function that preserves the ordering of probabilities.
Now, substituting Eq.(20) into Eq. (19), we get:
˜F(θ)ij=X
y∈Y[η′(p(y|x;θ))]2
η(p(y|x;θ))∂p(y|x;θ)
∂θi∂p(y|x;θ)
∂θj
. (21)
This expression shows that the noisy classical FIM is the noiseless FIM scaled by
factors depending on ηandη′. Ifη′≈1, noise minimally affects gradients. If η′>
1, the model is more sensitive to parameter changes, possibly improving short-term
learnability but risking overfitting. If η′<1, the parameter sensitivity is suppressed,
9
Page 10:
potentially resulting in slower training and higher bias. These dynamics reflect the
interplay between noise and trainability in quantum machine learning models.
The classical FIM considered so far treats measurement outcomes as classical data.
While this is sufficient for fixed measurements, it does not exploit the full quantum
nature of the underlying states. To fully characterize parameter sensitivity at the
quantum level, one must turn to the quantum Fisher information, which provides
a more fundamental and measurement-independent measure of distinguishability of
quantum states [77–79].
3.2 Quantum Fisher Information
The quantum Fisher information (QFI) [71, 73, 78, 80] generalizes the concept of
Fisher information to the quantum domain. Instead of fixing a particular measure-
ment, the QFI considers the most informative measurement possible. This makes the
QFI a fundamental limit on parameter precision achievable with an optimal choice of
measurement.
3.2.1 Ideal (Pure) States
For a single-parameter family of pure states |ψ(θ)⟩, the QFI is given by [81, 82]:
FQ(θ) = 4
⟨∂θψ(θ)|∂θψ(θ)⟩ − |⟨ ψ(θ)|∂θψ(θ)⟩|2
. (22)
If|ψ(θ)⟩=e−iGθ|ψ(0)⟩for some Hermitian generator G, this simplifies to:
FQ(θ) = 4⟨ψ(0)|(∆G)2|ψ(0)⟩, (23)
where (∆ G)2=G2− ⟨G⟩2.
3.2.2 Noisy (Mixed) States
For a mixed state ρ(θ), the QFI is defined using the symmetric logarithmic derivative
(SLD) Lθ, which satisfies:
∂ρ(θ)
∂θ=1
2(Lθρ(θ) +ρ(θ)Lθ). (24)
The QFI is then:
FQ(θ) = Tr[ ρ(θ)L2
θ]. (25)
Ifρ(θ) =P
iλi(θ)|ψi(θ)⟩⟨ψi(θ)|is the spectral decomposition, following [82] one
can write:
Lθ=X
i,j2
λi(θ) +λj(θ)⟨ψi(θ)|∂θρ(θ)|ψj(θ)⟩|ψi(θ)⟩⟨ψj(θ)|. (26)
For multiple parameters θ= (θ1, . . . , θ d), the QFI generalizes to a matrix known as
the quantum Fisher information matrix (QFIM). The QFIM’s elements can be derived
10
Page 11:
from the SLDs corresponding to each parameter, and it encodes the distinguishability
ofρ(θ) in all parameter directions. This multi-parameter QFIM reduces to the single-
parameter QFI if we focus on one parameter at a time.
3.3 Relationship between Classical and Quantum Fisher
Information
Consider a measurement described by a POVM {Mx}, where the probability of obtain-
ing outcome x is p(x|θ) = Tr[ ρ(θ)Mx]. The corresponding classical Fisher information
(CFI) is:
FC(θ) =X
xp(x|θ)∂
∂θlnp(x|θ)2
. (27)
The Quantum Cram´ er-Rao bound states:
Var(ˆθ)≥1
MFQ(θ), (28)
where ˆθis an unbiased estimator, and Mis the number of experiments. Since the
QFI represents the maximal information obtainable over all possible measurements,
we have:
FC(θ)≤FQ(θ), (29)
with equality when the chosen POVM attains the quantum limit.
This relationship shows that the QFI provides a fundamental upper bound on the
precision achievable by any measurement scheme, making it a key tool in quantum
parameter estimation [71]. Classical Fisher information depends on the chosen mea-
surement, whereas quantum Fisher information is measurement-independent and sets
the ultimate precision limit.
Classical and quantum Fisher information concepts and their interplay form the
foundation for defining complexity measures such as the effective dimension of a
parameter space. The following section introduces the effective dimension and its role
in understanding generalization in quantum machine learning models.
4 Effective Dimension
The notion of effective dimension [57, 66, 83] offers a way to measure the intrinsic com-
plexity of a parameterized quantum model beyond the raw parameter count. While
dis the total number of parameters, many directions in this d-dimensional parame-
ter space may contribute minimally to changes in the model’s outputs. The effective
dimension systematically determines which parameter directions substantially alter
observable outcomes and which do not. This perspective is particularly beneficial for
quantum models, where certain parameter variations may be ”inert” due to noise,
hardware constraints, or limited sensitivity of the state to those parameters.
Mathematically, a large parameter space can still exhibit an effectively smaller
”active” subspace if specific parameters fail to induce meaningful changes in mea-
surement statistics. Informally, one may imagine each parameter as contributing a
11
Page 12:
”direction” along which the quantum state could change. If moving along some direc-
tions hardly affects the measurable quantities, the system effectively behaves as if those
parameters do not exist. In practice, this phenomenon leads to an effective dimension
deff≤d. Such a reduction in the dimension of relevant variability can mitigate overfit-
ting in machine-learning contexts but also limit representational capacity if too many
directions are inactive.
A central tool for quantifying parameter influence is the quantum Fisher informa-
tion matrix. For a pure state |ψ(θ)⟩, the QFIM is defined by,
Fij(θ) = 4Re [⟨∂iψ(θ)|∂jψ(θ)⟩ − ⟨∂iψ(θ)|ψ(θ)⟩⟨ψ(θ)|∂jψ(θ)⟩], (30)
where ∂i|ψ(θ)⟩denotes the partial derivative of |ψ(θ)⟩with respect to the i-th param-
eter. Analogous formulations exist for mixed states (detailed in Appendix 3.2.2, see
also [71]). The QFIM captures how sensitively the state responds to small varia-
tions in each parameter. Directions in parameter space that do not appreciably shift
the state’s geometry yield small or negligible eigenvalues in the QFIM, effectively
indicating ”inactive” parameters.
One way to formalize this idea is via the rank of the QFIM:
deff= max
θ∈Θrank
F(θ)
. (31)
If rank
F(θ)
is less than d, it signifies that some parameters do not induce linearly
independent changes in the quantum state or its measurement outcomes, leading to
deff< d. Consequently, the manifold of states effectively spans fewer directions than
the raw parameter count might suggest.
Rather than only checking the rank, one may study the QFIM’s eigenvalues
{λi}d
i=1. A common measure is:
deff≈Pd
i=1λi2
Pd
i=1λ2
i, (32)
an expression closely related to the inverse participation ratio. Here, if the QFIM spec-
trum is sharply concentrated in a handful of large eigenvalues, deffremains well below
d. Conversely, if many eigenvalues are comparable, the system effectively leverages a
broader portion of parameter space.
Effective dimension has multifaceted importance in QML. For learning models,
a reduced deffcan provide greater robustness against overfitting since only a subset
of parameters drives significant changes in the loss landscape, effectively limiting the
complexity of the model class. A high deffimplies that multiple parameters can be
measured with high precision simultaneously. In contrast, a low deffsuggests that
sensitivity is concentrated along only a few principal directions. In both settings,
awareness of the effective dimension can guide model selection, training strategies, or
experimental design by highlighting how many ”truly operative” degrees of freedom
the quantum system has.
12
Page 13:
Thus, the effective dimension concept refines the naive view of dimensionality by
honing in on the directions in which the model or system genuinely exhibits variability.
By treating parameters that do not measurably alter outcomes as effectively inactive,
deffoffers a more precise view of capacity and complexity in quantum models. Com-
bined with the QFIM framework, this facilitates theoretical analyses of generalization,
performance, and practical insights into designing and training near-term quantum
architectures for learning.
5 Generalization bound
Fig. 1 : Rademacher Complexity f(d, n) as a function of the number of samples nand
the dimensionality d. We have applied the log scaling on both axes for the third figure
(second row) for better visualization.
In this section, we describe the main contribution of our work by providing a
generalization bound for quantum machine learning models in noisy environments.
Building upon statistical learning theory [12, 84] and quantum information theory [48],
we prove our main result in Appendix [B.2]:
Theorem 5.1. [Generalization Bound for Parameterized Quantum Models] Let
d, N∈Nandδ∈[0,1). Consider a d-dimensional parameter space Θ⊂Rda
class of quantum machine learning model functions FΘ={fθ,p:θ∈Θ}where
each model output under a noisy channel with noise parameter p∈[0,1) is given
13
Page 14:
d k(d) NThird Term ,δ= 0.005
1 2.72 8 0.99
10 3.49 13 0.60
100 10.10 102 0.07
1000 31.65 1002 0.01
10000 100.01 10002 0.00
50000 223.61 50002 0.00
100000 316.23 100002 0.00
Table 1 : Estimated generalization error scaling for increasing model dimensional-
ity. Each column shows the parameter dimension d, a complexity measure k(d), the
approximate sample size Nrequired to reduce complexity-driven overhead, and the
resulting complexity term of the bound. Higher dimensionality increases complexity,
yet the growth is modest; with sufficient samples, even very high-dimensional models
can achieve negligible complexity contributions.
by:fθ,p(x) =η(p)fθ(x), with η: [0,1)→(0,∞)a perturbation function satisfying
η(0) = 1 . Assume the following:
•Loss Function : The loss l :Y ×R→[0,1]is Lipschitz continuous in its second
argument with Lipschitz constant L >0.
•Perturbed model bounded gradient : The gradient of the perturbed model fθ,p(x)
w.r.t θis bounded by the Lipschitz constant Lp
f:||∇θfθ,p(x)|| ≤Lp
f,∀x∈ X, θ∈Θ.
•FIM Lower Bound : LetF(θ)denote the quantum Fisher Information Matrix
associated with the model. Suppose there exists m > 0such that:
p
det(F(θ))≥m > 0,∀θ∈Θ.
LetVΘbe the volume of the parameter space Θand define:
C′= log VΘ−logVd−logm+dlogLp
f,
where Vd=πd
2
Γ(d
2+1)is the volume of a unit ball in Rd, and Γ(·)is the gamma function.
Then, for any δ >0, with probability at least 1−δover the random draw of an i.i.d.
training set D={(xi, yi)}N
i=1∼P, the following generalization bound holds uniformly
for all θ∈Θ:
R(θ)≤ˆRN(θ) +12√
πdexp(C′
d)√
N+ 3r
log(2/δ)
2N. (33)
where R(θ) = E(x,y)∼P[l(y, fθ,p(x))]is the expected risk and ˆRN(θ) =
1
NNP
i=1l(yi, fθ,p(xi))is the empirical risk.
Theorem 5.1 provides a data-dependent generalization bound for parameterized
quantum models under noise. It reveals how sample size N, parameter dimension d,
14
Page 15:
and model complexity interplay. For large N, the third term ∼q
log(2/δ)
Nbecomes neg-
ligible. Consequently, the dominant complexity term12√
πdexp(C′/d)√
Nand the empirical
riskˆRN(θ) determine the model’s capacity to generalize.
Asdgrows large, exp( C′/d)→1, so the complexity scales approximately as√
d.
Intuitively, having more parameters can increase complexity, yet the1√
Nfactor ensures
that as N grows, the model complexity effectively diminishes, improving generalization.
Nevertheless, increasing the dimension dcan make generalization more challenging if
the dataset is small.
This bound also emphasizes that, for any fixed d, the complexity term decreases
asNincreases, suggesting that, in the limit N→ ∞ , the model’s generalization error
can be made arbitrarily small. However, as shown in Fig. 1 for any finite N, there
is always a non-zero complexity term. Thus, even though we may achieve empirical
risk close to zero, the interplay of dimension and sample size ensures that perfect
generalization is not guaranteed.
To gain further insight, consider:
k(d) =√
dexpC′
d
.
For large d, k(d) grows slower than linearly but increases with d. Roughly, one requires
on the order of k(d)2data points to push the complexity-driven error below a thresh-
old of 12√π. From table 1, it is apparent that by appropriately increasing the number
of training samples N, one can significantly push down the complexity term, mak-
ing it nearly minimal for large data sets. The practical challenge of generalizing with
high-dimensional parameter spaces can be tackled by ensuring adequate sample sizes.
This aligns with the well-established notion in machine learning that more data leads
towards better generalization. While high-dimensional models might start more com-
plex, the capacity to temper that complexity with larger datasets allows such models
to remain viable, reinforcing that sufficient data can offset the dimensional burden.
The global bound in Theorem 5.1 considers the worst-case scenario over the entire
parameter space Θ. In practice, the learned parameter ˆθmay reside in a much smaller,
effectively low-complexity region after training converges. This motivates local bounds
that yield tighter guarantees.
Corollary 5.2. [Local Generalization Bound] Let the conditions of Theorem 5.1 hold.
Suppose after training, ˆθis a solution, and consider a local neighborhood Θloc⊂Θ
around ˆθasΘloc:={θ∈Θ :∥θ−ˆθ∥ ≤δ}where δ >0but up to global limits. Assume
within Θloc:
•The Fisher Information Matrix remains well-conditioned:p
det(F(θ))≥mloc>0.
•The gradients of the perturbed model are bounded locally by Lp
floc.
15
Page 16:
Define: Cloc= log( VΘloc)−log(Vd)−log(mloc)+dlog(Lp
f,loc), where VΘlocis the volume
of the local region. Then, with probability at least 1−δ, for all θ∈Θloc:
R(θ)≤ˆRN(θ) +12√
πd,exp(Cloc/d)√
N+ 3r
log(2/δ)
2N. (34)
Proof. This follows from Theorem 5.1 by restricting the parameter space to the
smaller, better-conditioned subset Θ loc. Replacing global parameters mandLp
fwith
their local counterparts mlocandLp
floc, and using VΘlocin place of VΘ, leads to the
stated local bound. The same Rademacher complexity and covering number arguments
apply over a reduced search space, potentially tightening the bound.
Corollary 5.2 shows that if training converges to a region where parameters vary
less and the Fisher information remains stable, the complexity term decreases. This
improved local conditioning can yield sharper generalization guarantees.
Corollary 5.3. [Effective Dimension and QFIM] Let F(θ)denote the QFIM and
assume:p
det(F(θ))≥m > 0,∀θ∈Θ. Suppose λ1(θ)≥ ··· ≥ λd(θ)>0are the
eigenvalues of F(θ). For a threshold α >0, define the effective dimension: deff(α) =
max{r≤d:λr(θ)≥α,∀θ∈Θ}. Then, applying the reasoning of Corollary 5.2 to an
effective dimension setting, we obtain with probability at least 1−δ:
R(θ)≤ˆRN(θ) +12p
πdeff(α) exp( Cloc/deff(α))√
N+ 3r
log(2/δ)
2N. (35)
Proof. By focusing on the subspace where the QFIM eigenvalues exceed α, we effec-
tively reduce the dimension to deff(α). The argument parallels that of Corollary 5.2,
simply substituting the effective dimension deff(α) and corresponding constants into
the main theorem’s complexity estimates.
Corollary 5.3 underscores that not all parameters are equally relevant for gener-
alization. The effective dimension deff(α) filters out parameter directions that do not
significantly affect the state’s distinguishability. This refinement can yield meaning-
ful, dimension-reduced generalization bounds, bridging the gap between theoretical
complexity measures and practical model performance.
Overall, Theorem 5.1 and its corollaries highlight how the interplay of sample size
N, parameter dimension d, and the geometry of the QFIM govern the generalization
capabilities of quantum machine learning models. The effective dimension perspective
further refines these insights, guiding the design and training of models that achieve
favorable tradeoffs between expressiveness and generalization.
6 Numerical Analysis
Next, we discuss the numerical analysis to validate the proposed theoretical general-
ization bound. Our primary goal is to investigate how the global and local bounds
behave under different noise levels, dataset complexities, and sample sizes. To that
16
Page 17:
end, we designed experiments using a quantum neural network with two qubits, evalu-
ated under depolarizing noise with rates p∈ {0.05,0.1,0.5}. By varying p, we examine
mild, moderate, and strong noise conditions, thus assessing the robustness of the
bounds across a spectrum of realistic NISQ-era scenarios. These experiments were per-
formed on two datasets of different complexities: (1) the Iris dataset, restricted to the
first two classes for simplicity, and (2) the Digits (MNIST) dataset, focusing on digits
{0,1}with features reduced to 8 dimensions via principal component analysis. Ref. [6]
informs that these datasets are real-world dataset representatives of quantum machine
learning tasks for generalization analysis. These datasets are particularly relevant for
the NISQ era as the Iris dataset is small and simple for low-dimensional analysis, while
the MNIST dataset has higher inherent complexity but can be reduced to a manage-
able size for near-term quantum devices. We repeated each experiment three times
to assess the variability of results across multiple runs. This approach allowed us to
examine the bound’s tightness, consistency, and applicability under noise conditions
relevant to the NISQ era.
6.1 Methodology
RX(x1)··· Rx(xm−1)Noise
(p)Rot
(θi)•Rot
(θk)•Noise
(p)M
RY(x2)··· Ry(xm)Noise
(p)Rot
(θj)•Rot
(θl)•Noise
(p)
Fig. 2 : Two-qubit circuit with depolarizing noise channels for noise rate p∈
{0.05,0.1,0.5}, parameterized single-qubit rotations, and controlled operations, pre-
liminary CNOT gate. The Rot ( ·) is a single-qubit rotation gate with three Euler
angles. θi, θj, theta kandθlare the parameters with distinct three Euler angles. The
measurement is performed on the first qubit on a computation basis.
We employ a 2-qubit parameterized quantum circuit illustrated in Fig. 2 to model
the quantum machine learning task. While the same circuit structure can be applied
to both datasets, more complex datasets might eventually require deeper or more
sophisticated circuits for optimum performance. For a circuit with nlayers layers and
nqubits qubits, d=nlayers·nqubits·c, where cis a constant reflecting the number
of parameters introduced per qubit per layer. For example, with nqubits = 2 and
nlayers = 2, and assuming c= 3 (e.g., for parameterized single-qubit rotations), we
have d= 12. To ensure uniform bounds and consistency, we define the global parameter
space as Θ = [ −2π,2π]d, resulting in a global volume of (4 π)d.
The model architecture consists of an embedding layer and a variational quantum
circuit. Classical inputs x∈Rmare mapped into quantum states via parameter-
ized single-qubit rotations (e.g., Rx(xi) orRy(xi)). Noise is incorporated by inserting
depolarizing channels with noise rate p∈[0,1). The model then becomes fθ,p(x) =
η(p)fθ(x), where η(p) scales the expectation value due to noise. This noise has a
17
Page 18:
smoothing effect on the loss landscape, reducing gradient magnitudes and enhancing
Fisher Information stability locally.
We train the model by minimizing mean square error loss using a natural gradient
descent method. Gradient estimates are obtained using the parameter-shift rule. We
trained the model for 20 epochs and a fixed number of runs ( nruns= 3) to average
the random initialization effects. After training, we defined a local neighborhood Θ loc
around ˆθto ensure the stability of the quantum FIM. We choose a hypercube for
computational simplicity, defined as {θ:∥θ−ˆθ∥∞≤δ}, with a local volume of (2 δ)d.
The radius δis determined by a continuity-based procedure, ensuring that the minimal
eigenvalue or determinant-based quantity of the quantum FIM within Θ locremains
above a fraction (e.g., α= 0.5) of its value at ˆθ. This ensures a stable neighborhood
size and minimal conditioning related to the quantum FIM. The value of δis also
constrained to be within the global limits, for example, α < δ ≤2π.
Finally, we compute a local Lipschitz constant Lp
f,locby sampling gradients within
Θlocand taking the maximum norm. By restricting complexity measures to these
stable neighborhoods, the derived local bound becomes tighter and more representative
of the model’s actual generalization behavior.
6.2 Result
For each dataset, we plot the generalization gap, global, and local bound across varying
sample sizes, layers, and noise rates in Fig. [3,4]. Each plot highlights the generaliza-
tion gap, global bound, and local bound as a function of training size for different
combinations of layers and noise rates. Error bars represent standard deviation over 3
runs. The local bound’s accuracy holds across different model depths, noise levels, and
datasets. The result suggests that focusing on the local geometry around the trained
parameters could get a more realistic and informative assessment of generalization
performance. The generalization gap remains low across all configurations, and we
observe that as the training size increases, the generalization gap tends to decrease or
stay close to minimal values.
The global bound starts at a high level for the Iris dataset but decreases as
the sample size increases. Nevertheless, it remains significantly above the observed
generalization gap, implying a conservative nature typical of worst-case complexity
arguments. In contrast, the local bound is consistently closer to the generalization gap,
though not perfectly aligned. This improved proximity of the local bound suggests
that refining complexity estimates to a stable neighborhood around ˆθand yields more
noteworthy upper estimates of the risk. The local region’s constraints, influenced by
stable Fisher Information conditions and locally computed Lipschitz constants, appear
to capture the effective complexity more accurately.
On the Digits dataset, the global bound again stands high above the actual gap,
but the local bound offers smaller and more stable estimates. Although not perfectly
matching the gap, the local bound’s relative closeness and a steadier downward trend
asNgrows demonstrate that local complexity assessments adapt more naturally to
increasing sample sizes. This adaptability supports the argument that local bounds
are not merely a heuristic refinement but a necessary step toward a more faithful
characterization of generalization in quantum models.
18
Page 19:
Fig. 3 : Global and local bounds versus the generalization gap for the first two classes
of the Iris dataset. The local bound more closely tracks the observed gap, indicating
a tighter and more accurate estimate of generalization.
With the Iris dataset, where dimensionality and complexity are relatively small,
global and local bounds descend as training grows, and the local bound maintains a
more moderate scale. The Digits dataset experiments, involving larger input spaces
and potentially more complex variations, show that the global bound still inflates
above the actual gap. Yet, the local bound remains stable and moves closer to capturing
the effective complexity of the learned model. This stability and reduced scale in the
local bound, observed across multiple runs and conditions (varying noise rates and
layer counts), supports our theorem 5.1 that local analysis provides a more accurate
and data-dependent complexity assessment.
Despite differences in dimension and complexity, the stable performance of the
same PQC on both Iris and Digits suggests that even a small QNN can learn binary
classification tasks with varying complexity when suitably optimized. However, deeper
or more sophisticated circuits might be needed for more complex versions of the Digits
dataset or other larger-scale problems. Additionally, increasing the depolarizing noise
ratepleads to a narrower range of parameter-space fluctuations, tightening the local
bound but also limiting model performance. This tradeoff between noise robustness
19
Page 20:
Fig. 4 : Global and local bounds compared to the observed generalization gap for
MNIST digits {0,1}. The local bound shows occasional spikes, possibly due to random
parameter initializations, but remains closer to the empirical gap than the global
bound.
and model performance is a common theme in quantum machine learning, and our
results suggest that local complexity-based bounds can help navigate this tradeoff
more effectively.
Thus, the local bound is conceptually appealing and empirically supported. The
differences between global and local complexity trends hint that complexity should not
be viewed uniformly across the entire parameter space. Instead, effective complexity is
sensitive to the region around the trained parameters. Noise, model architecture, and
available data samples interact to create a local landscape where complexity-based
bounds align more with observed performance. A key finding is that global uniform
bounds do not tightly reflect real-world performance when noise is present, creating a
larger theoretical gap. By incorporating noise effects via the effective dimension, our
analysis offers a more accurate account of how noise constrains the model’s capacity.
20
Page 21:
7 Discussion
Our approach to quantifying generalization in quantum machine learning places the
geometry of parameterized quantum states at the center of understanding how noise
and finite data affect predictive reliability. The parameter space volume, training
sample size, and the QFIM emerge as central elements governing complexity. The
QFIM, a measure of parameter distinguishability [77, 85, 86], provides insights into
how parameter variations relate to model sensitivity, which can inform stability under
realistic noise conditions. Classical learning theory often relies on measures such as
the VC dimension or the Rademacher complexity [84, 87], yet those concepts lack
a direct connection to the geometric and operator-theoretic properties of quantum
states. Incorporating the QFIM into the analysis integrates geometric and statistical
principles, producing a perspective that relates parameter scaling, data volume, and
controlled complexity growth.
Our framework applies across various parameter dimensions and noise levels, and
the complexity terms depend on volume measures and gradients within parameter
space. The local neighborhood analysis uses the QFIM to focus on parameter regions
where complexity remains stable. The effective dimension, derived from QFIM eigen-
values, provides a mechanism for identifying parameter directions that carry predictive
significance. This approach extends beyond generic notions of large parameter counts
or deep circuits, isolating meaningful subspaces that maintain manageable complexity.
Such reasoning relates to efforts linking trainability to structural properties of param-
eterized circuits [44, 46], introducing a route for shaping model architectures guided
by geometric criteria rather than expansions of parameter space.
The statistics collected from numerical experiments are consistent with the the-
oretical complexity scaling. Larger parameter dimension increases complexity in a
predictable manner related to√
d, and larger training sizes correspond to lower
complexity-driven errors. These observations connect the theoretical bound directly
to practical scenarios. The observed controlled complexity growth suggests that scal-
ing parameter dimension in quantum models can lead to predictable generalization
behavior, provided the interplay between dimension and data volume is managed. The
results on effective dimension and local parameter neighborhoods yield tools that may
influence the design of parameterized quantum circuits by directing attention toward
stable parameter directions and conditions that maintain balanced complexity.
This line of reasoning extends beyond raw measures of expressiveness, linking the
predictive reliability of trained quantum models to underlying geometric structures.
Conventional arguments about quantum models often revolve around their large state
spaces, yet without quantifying how parameter variations interact with data availabil-
ity and noise. Well-defined complexity bounds clarify that quantum models can retain
consistent behavior as problem sizes grow. Parameters no longer appear as an unstruc-
tured collection but as a network of directions differentiated by QFIM eigenvalues.
Identifying and controlling these directions through appropriate training and archi-
tectural choices may inform research on QML capacity, optimization strategies, and
representational limits [8, 88]. Complexity control is not an abstract ideal; it becomes
tangible through geometric metrics and data-driven constraints.
21
Page 22:
By connecting the QFIM to generalization properties, our analysis relates the struc-
tural features of quantum states to the statistical aspects of learning. Instead of relying
on heuristic claims or incomplete reasoning, the results rely on rigorously derived
bound and direct empirical support. The outcome is a systematic view of complexity
emergence and reduction, guiding theoretical analysis and practical implementations
of quantum learning models.
8 Conclusions
Our work utilizes geometric and statistical principles to relate quantum Fisher
information, parameter space volume, and training data to the complexity of learn-
ing in noisy quantum models. The proposed bound and numerical results suggest
that structured parameter subsets, identified through QFIM eigenvalues, provide a
means to ensure stable generalization performance without uncontrolled complexity
growth. This work provides a more nuanced understanding of complexity in quan-
tum machine learning by explicitly linking it to the geometry of quantum states and
the QFIM, complementing previous studies on expressiveness and capacity. While
previous work has explored model capacity and expressiveness in quantum machine
learning [2, 7, 10, 54, 89], this work introduces a framework that directly connects
parameter-dependent complexity to the tangible geometry of quantum states via the
QFIM.
From the perspective of the work proposed here, complexity emerges as a control-
lable quantity, motivating future research that refines parameter selection or training
protocols to preserve stability in large-scale or resource-constrained learning tasks.
Work on adaptive circuit structures, informed by local QFIM-based criteria, may help
identify directions that maintain manageable complexity while improving represen-
tational fidelity. The proposed theorem incorporates noise effects through a general
perturbation function, ensuring that performance remains stable under various con-
ditions. Investigating individual or combined noise channels, diverse parametrization,
or dynamic processes offers avenues for refining model architectures and optimizing
strategies further to enhance adaptability and scalability within the QML context.
The insights presented here may stimulate investigations into hybrid quantum-classical
designs and application-tailored model architectures, paving the way for more robust,
scalable, and interpretable quantum machine learning methods.
Declarations
Funding. Part of this work was performed while P.R. was funded by the National
Science Foundation under Grant Nos. 2136961, and 2210091, and 1905043. The views
expressed herein are solely those of the author(s) and do not necessarily reflect those
of the National Science Foundation.
Conflict of interest. The authors have no relevant financial or non-financial
interests to disclose.
Availability of data and materials. Data sharing does not apply to this article
as no datasets were generated or analyzed during the current study. This article is
22
Page 23:
a systematic literature review, and as such, it synthesizes and analyzes information
from previously published literature.
Appendix A
A.1 Parameter Space Geometry
In the main text, we introduced the Fisher information matrix (FIM) F(θ) as a Rie-
mannian metric on the parameter space Θ ⊂Rd. This induces a natural geometric
structure on Θ: the geodesic distance between θandθ′measures how different these
parameter points are in terms of their influence on the model’s predictions rather than
just their Euclidean distance.
For a parameterized model and its associated FIM, we consider the volume element
induced by F(θ):
dV(θ) =p
det(F(θ))dθ1···dθd. (A1)
A geodesic ball B(θ, ϵ) of radius ϵcentered at θhas volume
V(θ, ϵ) =Z
B(θ,ϵ)dV(θ′) =Z
B(θ,ϵ)p
det(F(θ′))dθ′. (A2)
Ifp
det(F(θ)) is bounded by a positive constant m > 0, then for small ϵ, we can
approximate:
V(θ, ϵ)≈Vdϵdp
det(F(θ))≥Vdϵdm, (A3)
where
Vd=πd/2
Γ d
2+ 1
is the volume of the unit ball in Rd. Intuitively, a lower bound onp
det(F(θ)) ensures
that geodesic balls are not ”too small,” implying fewer truly distinct parameter config-
urations at ϵ. This geometric insight underpins the covering number bound we discuss
next.
A.2 Covering Numbers of a Parameter Space
The covering number of a parameter space Θ measures how many balls of radius
ϵare required to cover the entire space. Since the FIM provides a natural measure
of distinguishability, a lower bound onp
det(F(θ)) relates volumes of small balls to
parameter distinguishability.
Lemma A.1 (Covering Number and Volume) .LetΘ⊂Rdbe a compact parameter
space with volume VΘ. Assume that the determinant of the Fisher Information Matrix
F(θ)satisfies:p
det(F(θ))≥m > 0,∀θ∈Θ.
Then, for any ϵ > 0, the covering number N(ϵ,Θ,|| · ||)ofΘwith respect to the
Euclidean norm || · || satisfies:
logN(ϵ,Θ,|| · ||)≤C−dlogϵ, (A4)
23
Page 24:
where C= log VΘ−logVd−logmandVd=πd
2
Γ(d
2+1)is the volume of a unit ball in Rd.
Proof. Consider an ϵ-ball around θ:
V(θ, ϵ)≥Vdϵdm.
To cover Θ, the total volume of Nsuch balls must be at least VΘ. So we have:
N(ϵ,Θ,|| · ||)≤VΘ
Vdϵdm. (A5)
Taking the logarithm:
logN(ϵ,Θ,| · |)≤logVΘ−log
Vdϵdm
= log VΘ−logVd−logm−dlogϵ
=C−dlogϵ.
This completes the proof of lemma (A.1).
The inequality (A4) shows how covering numbers scale with dimension d and
resolution ϵ. Smaller ϵor larger d requires more balls, thus more complex model classes.
Conversely, a larger m reduces complexity by ensuring a certain ”geometric rigidity”
in the parameter space.
Appendix B
B.1 Bounding Empirical Rademacher Complexity
We now relate covering numbers to the empirical Rademacher complexity ˆRN(FΘ)
of the model class FΘ={fθ,p:θ∈Θ}. The Rademacher complexity quanti-
fies the model’s capacity to fit random noise and thus provides an upper bound on
generalization error.
Lemma B.1 (Empirical Rademacher Complexity Bound) .LetFΘ={fθ,p:θ∈Θ}
be a model function class. Let D={xi, yi}N
i=1be a dataset of Nsamples drawn from
the distribution P. Assume that fθ,pis bounded by the Lipschitz constant Lp
fand
the determinant of the quantum Fisher information matrix is at least m. Then the
empirical Rademacher complexity of the model class FΘis bounded by:
ˆRN(FΘ)≤6√
πdexp
C′
d
√
N(B6)
where C′= log VΘ−logVd−logm+dlogLp
f.
24
Page 25:
Proof. Given a dataset D={xi, yi}N
i=1the empirical Rademacher complexity of a
model class FΘis defined as:
ˆRN(FΘ) =Eσ"
sup
f∈FΘNX
i=1σif(xi)#
, (B7)
where σ={σi}N
i=1are i.i.d Rademacher random variables. taking values in {−1,+1}
with equal probabilities. Dudley’s entropy integral provides an upper bound on the
empirical Rademacher complexity in terms of the covering numbers. If we let ∥ · ∥ 2,D
be the empirical L2-norm over the sample D, we have:
ˆRN(FΘ)≤12√
NZϵmax
0q
logN(ϵ,FΘ,|| · || 2,D)dϵ. (B8)
where N(ϵ,FΘ,||·||2,D) is the covering number of FΘwith respect empirical L2metric:
||f−g||2,D=
1
NNP
i=1(f(xi)−g(xi))1
2
andϵmax= sup
f∈FΘ||f||2,D. Since fθ,pis bounded
by the Lipschitz constant Lp
f, we can write: ||fθ,p−fθ′,p||2,D≤Lp
f||θ−θ′||,∀θ, θ′∈Θ.
This implies that an ϵ-cover of Θ with respect to the Euclidean norm || · || is also an
ϵLp
f-cover of FΘwith respect to the empirical L2metric. Using Lemma A.1, we can
write:
logN(ϵ,FΘ,|| · || 2,D)≤logN(ϵ
Lp
f,Θ,|| · ||)
≤C−dlog
ϵ
Lp
f!
=C−d
logϵ−logLp
f
=⇒logN(ϵ,FΘ,|| · || 2,D)≤C′−dlogϵ, (B9)
Substituting this result into into Eq. (B8) gives:
ˆRN(FΘ)≤12√
NZϵmax
0p
C′−dlogϵ dϵ. (B10)
We perform the elementary calculus in the following derivations, and if the reader
trusts our calculations, they may skip the following steps.
Lett=−logϵ, soϵ=e−tanddϵ=−e−tdt. The limit changes as: when ϵ=
ϵmax, t= log ϵmaxand as ϵdecreases from ϵmaxto 0, tincreases from −logϵmaxto 0.
Thus:
ˆRN(FΘ)≤12√
NZ0
−logϵmaxp
C′−dloge−t−e−tdt
25
Page 26:
Assuming ϵmax= 1 (since fθ,p(x) is bounded in [0,1]), we have t= 0 at ϵ= 1, so:
ˆRN(FΘ)≤12√
NZ∞
0√
C′+dt e−tdt
To make the derivation easier, let us define:
I=Z∞
0√
C′+dte−tdt (B11)
LetS=C′+dt, then t=S−C′
danddt
dS=d
dS
S−C′
d
=⇒dt=dS
dande−t=
e−
S−C′
d
=eC′
de−S
d. Since, C′anddare constants, we can write:
I=Z∞
C′√
SeC′
de−S
ddS
d=eC′
d
dZ∞
C′√
Se−S
ddS
Letu=s
d=⇒s=duandds=ddu. Thus,
I=eC′
d
dZ∞
C′
d√
due−ud(du) =eC′
d√
dZ∞
C′
du1
2e−udu
This integral is the gamma function and can be written as: I=eC′
d√
dΓ
3
2,C′
d
. Since
Γ(s, a)≤Γ(s) for a >0, and Γ 3
2
frac√π2 we can write:
I≤eC′
d√
d√π
2(B12)
Substituting this back into Eq. (B8), we get:
ˆRN(FΘ)≤6√
πd e
C′
d
√n(B13)
This completes the proof of the lemma (B.1).
This lemma encapsulates how geometry (via quantum FIM) and parameter space
volume control Rademacher complexity. As Ngrows, the complexity term diminishes,
indicating improved generalization potential.
Next, we discuss how the Rademacher complexity bound can be used to derive a
generalization bound for quantum machine learning models.
26
Page 27:
B.2 Generalization Bound
We now derive the generalization bound proposed in Theorem 5.1 in the main text.
The bound follows from standard statistical learning theory and the Rademacher
complexity result obtained above.
Theorem B.2 (Restatement of Theorem 5.1) .Letd, N∈N, δ∈(0,1), and consider
a parameter space Θ⊂Rd. The quantum model class FΘ={fθ, p :θ∈Θ}with noise
parameter p∈[0,1)satisfies fθ,p(x) =η(p)fθ(x)withη(0) = 1 . Assume:
•The loss l:Y ×R→[0,1]is Lipschitz continuous in its second argument with
constant L≤1.
•The model gradients are bounded as: ∥∇θfθ,p(x)∥ ≤Lp
f.
•The quantum FIM satisfiesp
det(F(θ))≥m > 0.
Define C′= log VΘ−logVd−logm+dlogLp
f.
Then with probability at least 1- δover an i.i.d. sample D={xi, yi}N
i=1of size N,
R(θ)≤ˆRN(θ) +12√
πdexp(C′/d)√
N+ 3r
log(2/δ)
2N, (B14)
uniformly for all θ∈Θ.
Proof. A standard result in learning theory states that with probability at least 1 −δ:
R(θ)≤ˆRN(θ) + 2 ˆRN(L) + 3r
log(2/δ)
2N, (B15)
where L={(x, y)7→l(y, fθ,p(x)) :θ∈Θ}.
Since l(y,ˆy) is Lipschitz with constant Landfθ,p(x) is bounded by Lp
f, it follows
thatLhas Rademacher complexity at most L·Lp
f. Applying Lemma B.1:
ˆRN(L)≤LLp
f6√
πdexp(C′/d)√
N.
ForL≤1, this simplifies directly to:
ˆRN(L)≤6√
πdexp (C′/d)√
N.
Substitute back into the generalization inequality:
R(θ)≤ˆRN(θ) + 2·6√
πdexp(C′/d)√
N+ 3r
log(2/δ)
2N.
27
Page 28:
This gives:
R(θ)≤ˆRN(θ) +12√
πdexp(C′/d)√
N+ 3r
log(2/δ)
2N,
matching the statement of Theorem 5.1.
This completes the proof of the generalization bound for theorem 5.1. The interplay
between Fisher information geometry, covering numbers, and Rademacher complexity
provides a path from parameter space properties to explicit generalization guarantees.
In practice, after training, restricting to local regions or reducing dimension to the
effective dimension often yields sharper bounds.
References
[1] Schuld, M., Petruccione, F.: Machine Learning with Quantum Computers, 2nd
edn. Quantum Science and Technology. Springer, Cham, Switzerland (2021).
https://doi.org/10.1007/978-3-030-83098-4
[2] Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.:
Quantum machine learning. Nature 549(7671), 195–202 (2017)
[3] Wang, Y., Liu, J.: A comprehensive review of quantum machine learning: from
nisq to fault tolerance. arXiv preprint arXiv:2401.11351 (2024)
[4] Preskill, J.: Quantum computing in the nisq era and beyond. Quantum 2, 79
(2018)
[5] Torlai, G., Melko, R.G.: Machine-learning quantum states in the nisq era. Annual
Review of Condensed Matter Physics 11(1), 325–344 (2020)
[6] Khanal, B., Rivas, P., Sanjel, A., Sooksatra, K., Quevedo, E., Rodriguez, A.:
Generalization error bound for quantum machine learning in nisq era—a survey.
Quantum Machine Intelligence 6(2), 1–20 (2024)
[7] Havl´ ıˇ cek, V., C´ orcoles, A.D., Temme, K., Harrow, A.W., Kandala, A., Chow,
J.M., Gambetta, J.M.: Supervised learning with quantum-enhanced feature
spaces. Nature 567(7747), 209–212 (2019)
[8] Cerezo, M., Arrasmith, A., Babbush, R., Benjamin, S.C., Endo, S., Fujii, K.,
McClean, J.R., Mitarai, K., Yuan, X., Cincio, L., et al. : Variational quantum
algorithms. Nature Reviews Physics 3(9), 625–644 (2021)
[9] Rivas, P., Orduz, J., Jui, T.D., DeCusatis, C., Khanal, B.: Quantum-enhanced
representation learning: A quanvolutional autoencoder approach against ddos
threats. Machine Learning and Knowledge Extraction 6(2), 944–964 (2024)
28
Page 29:
[10] Lloyd, S., Rebentrost, P., Mohseni, M.: Quantum algorithms for supervised and
unsupervised machine learning. arXiv preprint arXiv:1307.0411 (2013)
[11] A¨ ımeur, E., Brassard, G., Gambs, S.: Quantum clustering algorithms. In: Pro-
ceedings of the 24th International Conference on Machine Learning, pp. 1–8
(2007)
[12] Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learn-
ing, (2018). Second Edition. https://mitpress.ublish.com/ebook/foundations-of-
machine-learning–2-preview/7093/Cover
[13] Abu-Mostafa, Y.S., Magdon-Ismail, M., Lin, H.T.: Learning from data: A short
course (2012)
[14] Shalev-Shwartz, S., Ben-David, S.: Understanding machine learning: From theory
to algorithms (2014)
[15] Emami, M., Sahraee-Ardakan, M., Pandit, P., Rangan, S., Fletcher, A.: Gener-
alization error of generalized linear models in high dimensions. In: International
Conference on Machine Learning, pp. 2892–2901 (2020). PMLR
[16] Jakubovitz, D., Giryes, R., Rodrigues, M.R.: Generalization error in deep
learning. In: Compressed Sensing and Its Applications: Third International
MATHEON Conference 2017, pp. 153–193 (2019). Springer
[17] Nadeau, C., Bengio, Y.: Inference for the generalization error. Advances in neural
information processing systems 12(1999)
[18] Banchi, L., Pereira, J., Pirandola, S.: Generalization in quantum machine
learning: A quantum information standpoint. PRX Quantum 2(4), 040321 (2021)
[19] Gil-Fuster, E., Eisert, J., Bravo-Prieto, C.: Understanding quantum machine
learning also requires rethinking generalization. arXiv preprint arXiv:2306.13461
(2023)
[20] Caro, M.C., Gil-Fuster, E., Meyer, J.J., Eisert, J., Sweke, R.: Encoding-dependent
generalization bounds for parametrized quantum circuits. Quantum 5, 582 (2021)
[21] Khanal, B., Rivas, P.: Evaluating the impact of noise on variational quantum
circuits in nisq era devices. In: 2023 Congress in Computer Science, Computer
Engineering, & Applied Computing (CSCE) (2023)
[22] Caro, M.C., Huang, H.-Y., Cerezo, M., Sharma, K., Sornborger, A., Cincio, L.,
Coles, P.J.: Generalization in quantum machine learning from few training data.
Nature communications 13(1), 4919 (2022)
[23] Caro, M.C., Gur, T., Rouz´ e, C., Franca, D.S., Subramanian, S.: Information-
theoretic generalization bounds for learning from quantum data. In: The Thirty
29
Page 30:
Seventh Annual Conference on Learning Theory, pp. 775–839 (2024). PMLR
[24] Haug, T., Kim, M.: Generalization with quantum geometry for learning unitaries.
arXiv preprint arXiv:2303.13462 (2023)
[25] Canatar, A., Peters, E., Pehlevan, C., Wild, S.M., Shaydulin, R.: Bandwidth
enables generalization in quantum kernel models. arXiv preprint arXiv:2206.06686
(2022)
[26] Caro, M.C., Huang, H.-Y., Ezzell, N., Gibbs, J., Sornborger, A.T., Cincio, L.,
Coles, P.J., Holmes, Z.: Out-of-distribution generalization for learning quantum
dynamics. Nature Communications 14(1), 3751 (2023)
[27] Abbas, A., Sutter, D., Zoufal, C., Lucchi, A., Figalli, A., Woerner, S.: The power
of quantum neural networks. Nature Computational Science 1(6), 403–409 (2021)
[28] Bu, K., Koh, D.E., Li, L., Luo, Q., Zhang, Y.: Effects of quantum resources and
noise on the statistical complexity of quantum circuits. Quantum Science and
Technology 8(2), 025013 (2023)
[29] Bu, K., Koh, D.E., Li, L., Luo, Q., Zhang, Y.: Statistical complexity of quantum
circuits. Physical Review A 105(6), 062431 (2022)
[30] Martinis, J.M., Nam, S., Aumentado, J., Lang, K., Urbina, C.: Decoherence of a
superconducting qubit due to bias noise. Physical Review B 67(9), 094510 (2003)
[31] Wang, S., Fontana, E., Cerezo, M., Sharma, K., Sone, A., Cincio, L., Coles,
P.J.: Noise-induced barren plateaus in variational quantum algorithms. Nature
communications 12(1), 6961 (2021)
[32] Heyraud, V., Li, Z., Denis, Z., Le Boit´ e, A., Ciuti, C.: Noisy quantum kernel
machines. Physical Review A 106(5), 052421 (2022)
[33] Shor, P.W.: Scheme for reducing decoherence in quantum computer memory.
Physical review A 52(4), 2493 (1995)
[34] Khanal, B., Rivas, P.: Learning robust observable to address noise in quantum
machine learning. arXiv preprint arXiv:2409.07632 (2024)
[35] Shaib, A., Naim, M.H., Fouda, M.E., Kanj, R., Kurdahi, F.: Efficient noise
mitigation technique for quantum computing. Scientific Reports 13(1), 3912
(2023)
[36] Ferracin, S., Hashim, A., Ville, J.-L., Naik, R., Carignan-Dugas, A., Qassim, H.,
Morvan, A., Santiago, D.I., Siddiqi, I., Wallman, J.J.: Efficiently improving the
performance of noisy quantum computers. Quantum 8, 1410 (2024)
[37] Gentinetta, G., Thomsen, A., Sutter, D., Woerner, S.: The complexity of quantum
30
Page 31:
support vector machines. Quantum 8, 1225 (2024)
[38] Liu, Y., Arunachalam, S., Temme, K.: A rigorous and robust quantum speed-up
in supervised machine learning. Nature Physics 17(9), 1013–1017 (2021)
[39] K¨ ubler, J., Buchholz, S., Sch¨ olkopf, B.: The inductive bias of quantum kernels.
Advances in Neural Information Processing Systems 34, 12661–12673 (2021)
[40] Thanasilp, S., Wang, S., Cerezo, M., Holmes, Z.: Exponential concentration in
quantum kernel methods. Nature Communications 15(1), 5200 (2024)
[41] Huang, H.-Y., Broughton, M., Mohseni, M., Babbush, R., Boixo, S., Neven, H.,
McClean, J.R.: Power of data in quantum machine learning. Nature communica-
tions 12(1), 2631 (2021)
[42] Czarnik, P., Arrasmith, A., Coles, P.J., Cincio, L.: Error mitigation with clifford
quantum-circuit data. Quantum 5, 592 (2021)
[43] Cerezo, M., Verdon, G., Huang, H.-Y., Cincio, L., Coles, P.J.: Challenges and
opportunities in quantum machine learning. Nature Computational Science 2(9),
567–576 (2022)
[44] Holmes, Z., Sharma, K., Cerezo, M., Coles, P.J.: Connecting ansatz expressibility
to gradient magnitudes and barren plateaus. PRX Quantum 3(1), 010313 (2022)
[45] Zhao, C., Gao, X.-S.: Analyzing the barren plateau phenomenon in training
quantum neural networks with the zx-calculus. Quantum 5, 466 (2021)
[46] McClean, J.R., Boixo, S., Smelyanskiy, V.N., Babbush, R., Neven, H.: Barren
plateaus in quantum neural network training landscapes. Nature communications
9(1), 4812 (2018)
[47] Arrasmith, A., Cerezo, M., Czarnik, P., Cincio, L., Coles, P.J.: Effect of barren
plateaus on gradient-free optimization. Quantum 5, 558 (2021)
[48] Nielsen, M.A., Chuang, I.: Quantum computation and quantum information.
American Association of Physics Teachers (2002)
[49] Shalev-Shwartz, S., Shamir, O., Srebro, N., Sridharan, K.: Learnability, stability
and uniform convergence. The Journal of Machine Learning Research 11, 2635–
2670 (2010)
[50] Valle-P´ erez, G., Louis, A.A.: Generalization bounds for deep learning. arXiv
preprint arXiv:2012.04115 (2020)
[51] Johansson, F.D., Shalit, U., Kallus, N., Sontag, D.: Generalization bounds and
representation learning for estimation of potential outcomes and causal effects.
Journal of Machine Learning Research 23(166), 1–50 (2022)
31
Page 32:
[52] Pape, A.D., Kurtz, K.J., Sayama, H.: Complexity measures and concept learning.
Journal of Mathematical Psychology 64, 66–75 (2015)
[53] Bialek, W., Nemenman, I., Tishby, N.: Predictability, complexity, and learning.
Neural computation 13(11), 2409–2463 (2001)
[54] Schuld, M., Killoran, N.: Quantum machine learning in feature hilbert spaces.
Physical review letters 122(4), 040504 (2019)
[55] Mitarai, K., Negoro, M., Kitagawa, M., Fujii, K.: Quantum circuit learning.
Physical Review A 98(3), 032309 (2018)
[56] Schuld, M., Bergholm, V., Gogolin, C., Izaac, J., Killoran, N.: Evaluating analytic
gradients on quantum hardware. Physical Review A 99(3), 032331 (2019)
[57] Schuld, M., Sweke, R., Meyer, J.J.: Effect of data encoding on the expres-
sive power of variational quantum-machine-learning models. Physical Review A
103(3), 032430 (2021)
[58] Benedetti, M., Lloyd, E., Sack, S., Fiorentini, M.: Parameterized quantum cir-
cuits as machine learning models. Quantum Science and Technology 4(4), 043001
(2019)
[59] Gyongyosi, L., Imre, S.: Training optimization for gate-model quantum neural
networks. Scientific Reports 9(1), 12679 (2019)
[60] Crooks, G.E.: Gradients of parameterized quantum gates using the parameter-
shift rule and gate decomposition. arXiv preprint arXiv:1905.13311 (2019)
[61] Holevo, A.S.: Quantum Systems, Channels, Information: a Mathematical Intro-
duction. Walter de Gruyter GmbH & Co KG, ??? (2019)
[62] Gut ¸˘ a, M., Kahn, J.: Local asymptotic normality for qubit states. Physical Review
A—Atomic, Molecular, and Optical Physics 73(5), 052108 (2006)
[63] Magesan, E., Gambetta, J.M., Emerson, J.: Scalable and robust randomized
benchmarking of quantum processes. Physical review letters 106(18), 180504
(2011)
[64] Preskill, J.: Lecture notes for physics 229: Quantum information and computation.
California institute of technology 16(1), 1–8 (1998)
[65] Khanal, B., Rivas, P.: A modified depolarization approach for efficient quantum
machine learning. Mathematics 12(9), 1385 (2024)
[66] Haug, T., Kim, M.: Generalization of quantum machine learning models using
quantum fisher information metric. Physical Review Letters 133(5), 050603
(2024)
32
Page 33:
[67] Gyongyosi, L., Imre, S.: Circuit depth reduction for gate-model quantum com-
puters. Scientific reports 10(1), 11229 (2020)
[68] Gyongyosi, L., Imre, S.: Scalable distributed gate-model quantum computers.
Scientific reports 11(1), 5172 (2021)
[69] Kasatkin, V., Mozgunov, E., Ezzell, N., Lidar, D.: Detecting quantum and classi-
cal phase transitions via unsupervised machine learning of the fisher information
metric. arXiv preprint arXiv:2408.03418 (2024)
[70] Bharti, K.: Fisher information: A crucial tool for nisq research. Quantum Views
5, 61 (2021)
[71] Meyer, J.J.: Fisher information in noisy intermediate-scale quantum applications.
Quantum 5, 539 (2021)
[72] Amari, S.-I.: Natural gradient works efficiently in learning. Neural computation
10(2), 251–276 (1998)
[73] Haug, T., Bharti, K., Kim, M.: Capacity and quantum geometry of parametrized
quantum circuits. PRX Quantum 2(4), 040309 (2021)
[74] Martens, J.: New insights and perspectives on the natural gradient method.
Journal of Machine Learning Research 21(146), 1–76 (2020)
[75] Stokes, J., Izaac, J., Killoran, N., Carleo, G.: Quantum natural gradient. Quantum
4, 269 (2020)
[76] Baumgratz, T., N¨ ußeler, A., Cramer, M., Plenio, M.B.: A scalable maximum
likelihood method for quantum state tomography. New Journal of Physics 15(12),
125004 (2013)
[77] Braunstein, S.L., Caves, C.M.: Statistical distance and the geometry of quantum
states. Physical Review Letters 72(22), 3439 (1994)
[78] Liu, J., Yuan, H., Lu, X.-M., Wang, X.: Quantum fisher information matrix and
multiparameter estimation. Journal of Physics A: Mathematical and Theoretical
53(2), 023001 (2020)
[79] Fujiwara, A.: Quantum channel identification problem. Physical Review A 63(4),
042304 (2001)
[80] Petz, D., Ghinea, C.: Introduction to quantum fisher information. In: Quantum
Probability and Related Topics, pp. 261–281. World Scientific, ??? (2011)
[81] Yamamoto, N.: On the natural gradient for variational quantum eigensolver.
arXiv preprint arXiv:1909.05074 (2019)
33
Page 34:
[82] Liu, J., Xiong, H.-N., Song, F., Wang, X.: Fidelity susceptibility and quan-
tum fisher information for density operators with arbitrary ranks. Physica A:
Statistical Mechanics and its Applications 410, 167–173 (2014)
[83] Abbas, A., Sutter, D., Figalli, A., Woerner, S.: Effective dimension of machine
learning models. arXiv preprint arXiv:2112.04807 (2021)
[84] Vapnik, V.: Statistical learning theory. John Wiley & Sons google schola 2, 831–
842 (1998)
[85] Helstrom, C.W.: Quantum detection and estimation theory. Journal of Statistical
Physics 1, 231–252 (1969)
[86] Paris, M.G.: Quantum estimation for quantum technology. International Journal
of Quantum Information 7(supp01), 125–137 (2009)
[87] Bartlett, P.L., Mendelson, S.: Rademacher and gaussian complexities: Risk
bounds and structural results. Journal of Machine Learning Research 3(Nov),
463–482 (2002)
[88] Larocca, M., Ju, N., Garc´ ıa-Mart´ ın, D., Coles, P.J., Cerezo, M.: Theory of over-
parametrization in quantum neural networks. Nature Computational Science
3(6), 542–551 (2023)
[89] Ciliberto, C., Herbster, M., Ialongo, A.D., Pontil, M., Rocchetto, A., Severini, S.,
Wossnig, L.: Quantum machine learning: a classical perspective. Proceedings of
the Royal Society A: Mathematical, Physical and Engineering Sciences 474(2209),
20170551 (2018)
34