Training Variational Quantum Algorithms (VQAs) relies heavily on classical optimization loops. As we discussed, a VQA involves a Parameterized Quantum Circuit (PQC) U(θ) and a cost function C(θ), often defined as the expectation value of an observable H measured on the output state prepared by the PQC:
C(θ) = ⟨ψ(θ)|H|ψ(θ)⟩ = ⟨0|U†(θ) H U(θ)|0⟩

To minimize this cost function using gradient-based methods (like gradient descent and its variants), we need to compute the gradient vector ∇_θ C(θ), whose components are the partial derivatives ∂C/∂θ_j with respect to each parameter θ_j in the circuit. Since the cost function evaluation involves running a quantum circuit and performing measurements, calculating these gradients requires specific techniques tailored to this hybrid quantum-classical setup. Let's examine the primary methods used in practice.
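As a concrete sketch, the cost C(θ) for a toy single-qubit PQC can be simulated classically with NumPy. The circuit (RZ·RY·RX applied to |0⟩ with H = Z) and all function names here are illustrative assumptions, not from any particular framework:

```python
import numpy as np

# Pauli matrices, the generators of the rotation gates.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(P, angle):
    """exp(-i * angle * P / 2) in closed form, valid because P^2 = I."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * P

def cost(theta, H=Z):
    """C(theta) = <0| U†(theta) H U(theta) |0> for U = RZ(t2) RY(t1) RX(t0)."""
    psi = np.array([1, 0], dtype=complex)       # start in |0>
    for P, t in zip([X, Y, Z], theta):          # apply RX, RY, RZ in turn
        psi = rot(P, t) @ psi
    return float(np.real(psi.conj() @ H @ psi))

theta = np.array([0.4, -0.7, 0.1])
print(cost(theta))   # for this circuit, equals cos(0.4) * cos(-0.7)
```

On real hardware this number would be estimated from repeated measurements rather than read off the statevector, but the classical optimizer sees the same interface: parameters in, scalar cost out.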
The most straightforward approach borrows directly from classical numerical methods: finite differences. To estimate the partial derivative with respect to a parameter θj, we can evaluate the cost function at slightly perturbed parameter values. The central difference formula is commonly used:
∂C/∂θ_j ≈ [C(θ + ϵ e_j) − C(θ − ϵ e_j)] / (2ϵ)

Here, e_j is a unit vector with a 1 in the j-th position and zeros elsewhere, and ϵ is a small step size. This method requires evaluating the quantum circuit (and estimating the expectation value) twice for each parameter θ_j to compute its partial derivative.
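A minimal central-difference gradient might look like the sketch below. It treats the cost as a black box; for illustration we substitute a toy two-parameter cost, C(θ) = cos θ₀ · cos θ₁, whose exact gradient is known (all names here are hypothetical):

```python
import numpy as np

def cost(theta):
    """Stand-in for a circuit evaluation: <Z> after RY(theta[1]) RX(theta[0])
    on |0>, which works out analytically to cos(theta[0]) * cos(theta[1])."""
    return np.cos(theta[0]) * np.cos(theta[1])

def fd_gradient(cost, theta, eps=1e-4):
    """Central-difference gradient estimate: two cost evaluations
    (i.e., two batches of circuit runs) per parameter."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e_j = np.zeros_like(theta)
        e_j[j] = 1.0
        grad[j] = (cost(theta + eps * e_j) - cost(theta - eps * e_j)) / (2 * eps)
    return grad

theta = np.array([0.4, -0.7])
print(fd_gradient(cost, theta))  # approx. [-sin(0.4)cos(-0.7), -cos(0.4)sin(-0.7)]
```

With a noise-free cost this works well; the trouble starts when each `cost` call carries statistical error, as discussed next.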
While simple to understand and implement, finite differences have significant drawbacks in the context of quantum computation:

- Shot noise amplification: each cost value is estimated from a finite number of measurements, and dividing the noisy difference C(θ + ϵe_j) − C(θ − ϵe_j) by the small quantity 2ϵ amplifies this statistical error.
- Step-size tuning: ϵ must balance truncation error (if chosen too large) against noise amplification (if chosen too small), and a good choice is problem-dependent.
- Approximate result: even with a well-chosen ϵ, the formula only approximates the true derivative rather than giving its exact value.
Due to these limitations, while finite differences can be useful for quick checks or simple problems, more specialized methods are generally preferred for training VQAs.
A more robust and widely adopted method for calculating gradients of PQCs is the parameter-shift rule. For a specific class of parameterized gates, this rule provides an analytical expression for the gradient, avoiding the numerical instability associated with choosing ϵ in finite differences.
Consider a gate in the PQC of the form G_j(θ_j) = e^(−i θ_j P_j / 2), where P_j is an operator satisfying P_j² = I. This includes common single-qubit rotation gates like RX(θ_j) = e^(−i θ_j X / 2), RY(θ_j) = e^(−i θ_j Y / 2), and RZ(θ_j) = e^(−i θ_j Z / 2), since X² = Y² = Z² = I.
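Because P_j² = I, the matrix exponential collapses to the closed form G_j(θ_j) = cos(θ_j/2) I − i sin(θ_j/2) P_j. A quick NumPy check (computing the exponential via an eigendecomposition of the Hermitian generator; a sketch, not framework code) confirms this for all three Pauli rotations:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

theta = 0.73  # arbitrary test angle

for P in (X, Y, Z):
    # Defining property of the generator: P^2 = I.
    assert np.allclose(P @ P, I2)
    # exp(-i*theta*P/2) via eigendecomposition of the Hermitian P.
    w, V = np.linalg.eigh(P)
    G = V @ np.diag(np.exp(-1j * theta * w / 2)) @ V.conj().T
    # Closed form implied by P^2 = I.
    closed = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * P
    assert np.allclose(G, closed)
print("all three Pauli rotations match the closed form")
```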
If the overall PQC U(θ) can be written as U(θ)=VGj(θj)W, where V and W are other quantum circuits independent of θj, then the derivative of the expectation value C(θ)=⟨0∣U†(θ)HU(θ)∣0⟩ with respect to θj can be computed exactly using the following formula:
∂C(θ)/∂θ_j = (1/2)[C(θ + (π/2) e_j) − C(θ − (π/2) e_j)]

This remarkable result states that the exact derivative is proportional to the difference between the cost function evaluated with the parameter θ_j shifted forward and backward by a specific amount, s = π/2.
Why does this work? Let's sketch the idea. The derivative involves differentiating the gate G_j(θ_j):

∂G_j(θ_j)/∂θ_j = −(i/2) P_j e^(−i θ_j P_j / 2) = −(i/2) P_j G_j(θ_j)

Inserting this into the derivative expression for C(θ) leads to terms involving P_j. The crucial insight is that for gates where P_j² = I, the operator P_j can be related to shifted versions of the original gate G_j(θ_j). Specifically, one can show that:

P_j G_j(θ_j) = (i/√2)[G_j(θ_j + π/2) − G_j(θ_j − π/2)]

Substituting this back into the derivative expression for C(θ) and simplifying eventually yields the parameter-shift rule formula.
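The rule itself is short to implement. The sketch below applies it to a toy two-parameter cost, C(θ) = cos θ₀ · cos θ₁, standing in for a black-box circuit evaluation (the function names are illustrative, not from any framework), and checks it against the known exact gradient:

```python
import numpy as np

def cost(theta):
    """Stand-in for a circuit evaluation of <Z> after
    RY(theta[1]) RX(theta[0]) on |0>: cos(theta[0]) * cos(theta[1])."""
    return np.cos(theta[0]) * np.cos(theta[1])

def parameter_shift_gradient(cost, theta, s=np.pi / 2):
    """Exact gradient via the parameter-shift rule: two evaluations per
    parameter, shifted by +/- pi/2, with no step size to tune."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e_j = np.zeros_like(theta)
        e_j[j] = 1.0
        grad[j] = 0.5 * (cost(theta + s * e_j) - cost(theta - s * e_j))
    return grad

theta = np.array([0.4, -0.7])
exact = np.array([-np.sin(0.4) * np.cos(-0.7), -np.cos(0.4) * np.sin(-0.7)])
print(np.allclose(parameter_shift_gradient(cost, theta), exact))  # True
```

Note that unlike finite differences, the shifts here are large (π/2), so no small-number division amplifies the noise in each cost estimate.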
Visual representation of the parameter-shift rule for calculating the gradient component ∂C/∂θj. The expectation value ⟨H⟩ is estimated by running the circuit with the parameter θj shifted by +π/2 and −π/2. The results are combined to yield the exact gradient.
Advantages:

- Exactness: the rule gives the exact analytical derivative (up to the statistical noise in estimating each expectation value), with no step size ϵ to tune and no truncation error.
- Cost: like central finite differences, the parameter-shift rule requires two circuit evaluations for each parameter θ_j.
- Applicability: the basic rule applies directly to gates generated by Pauli operators (X, Y, Z). Most common PQC ansätze used in QML are constructed primarily from such gates, making the parameter-shift rule widely applicable.
It's important to remember that in practice, whether using finite differences or parameter-shift rules, the cost function values C(θ±sej) are themselves estimated from a finite number of measurements (shots) on the quantum computer. Therefore, the calculated gradient ∇θC(θ) is also an estimate. The accuracy of this gradient estimate depends directly on the number of shots used for each expectation value calculation. This statistical uncertainty is inherent in VQAs and is a primary consideration when choosing optimization algorithms and interpreting training dynamics, which we will discuss next.
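This effect can be illustrated with a small simulation: each expectation value is replaced by an empirical average over a finite number of ±1 measurement outcomes, and the spread of the resulting parameter-shift gradient estimates shrinks as the shot count grows. The circuit and all names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def exact_cost(theta):
    """Noise-free <Z> for the toy circuit RY(theta[1]) RX(theta[0]) |0>."""
    return np.cos(theta[0]) * np.cos(theta[1])

def sampled_cost(theta, shots):
    """Estimate <Z> from a finite number of +/-1 measurement outcomes."""
    p_plus = (1 + exact_cost(theta)) / 2        # probability of outcome +1
    n_plus = rng.binomial(shots, p_plus)
    return (2 * n_plus - shots) / shots         # empirical <Z>

def shift_grad(cost, theta):
    """Parameter-shift gradient built on (possibly noisy) cost estimates."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = 1.0
        grad[j] = 0.5 * (cost(theta + np.pi / 2 * e) - cost(theta - np.pi / 2 * e))
    return grad

theta = np.array([0.4, -0.7])
for shots in (100, 10_000):
    estimates = [shift_grad(lambda t: sampled_cost(t, shots), theta)[0]
                 for _ in range(200)]
    print(shots, np.std(estimates))  # spread shrinks roughly as 1/sqrt(shots)
```

The standard deviation of the gradient estimate falls roughly as 1/√shots, which is why shot budgets figure so prominently in choosing and tuning VQA optimizers.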
These gradient calculation methods form a vital link between the quantum circuit evaluation and the classical optimization routine, enabling the training of VQAs for machine learning tasks. The parameter-shift rule, in particular, represents a standard and effective technique implemented in most quantum software frameworks like Qiskit, PennyLane, and TensorFlow Quantum.
© 2025 ApX Machine Learning