Once you have defined your Parameterized Quantum Circuit (PQC) and a suitable cost function C(θ) representing your machine learning objective, the core task in training a Variational Quantum Algorithm (VQA) becomes finding the optimal parameters θ∗ that minimize this cost function:
θ∗ = argmin_θ C(θ)

This optimization problem is typically handled by a classical computer that interacts iteratively with the quantum processor (or simulator). The classical optimizer proposes new parameter values θ, the quantum device estimates C(θ) (and potentially its gradient ∇θC(θ)), and this information is fed back to the optimizer to suggest the next set of parameters.
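The loop can be sketched in a few lines. Everything here is a classical stand-in: the `cost` function is a hypothetical surrogate for the quantum expectation value so the example runs end to end, and the shift-based `gradient` mimics how a parameter-shift gradient would be assembled from extra circuit evaluations.

```python
import numpy as np

def cost(theta):
    # Surrogate for estimating C(theta) on a quantum device or simulator.
    return np.sum(1.0 - np.cos(theta))

def gradient(theta, s=np.pi / 2):
    # Parameter-shift-style central difference: exact for costs built
    # from frequency-1 sinusoidal terms like the surrogate above.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = s
        grad[i] = (cost(theta + e) - cost(theta - e)) / (2.0 * np.sin(s))
    return grad

theta = np.array([0.4, -1.1, 2.3])
eta = 0.1  # learning rate
for _ in range(300):
    theta = theta - eta * gradient(theta)  # classical update from "quantum" estimates

print(cost(theta))  # close to zero for this surrogate
```

In a real VQA, only the body of `cost` (and hence `gradient`) changes: each call dispatches circuits to the device, while the update rule stays purely classical.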
While standard gradient descent can be used, the specific nature of VQAs often benefits from more sophisticated classical optimization techniques. The cost function landscape can be complex (non-convex), and estimating the cost function and its gradients via quantum measurements introduces inherent statistical noise (shot noise). Choosing an appropriate optimizer is therefore important for efficient and successful training. This section examines several advanced classical optimizers commonly employed for VQAs.
The hybrid quantum-classical loop of a VQA. The classical optimizer uses results from quantum measurements to iteratively update the parameters of the PQC.
Gradient-based optimizers rely on access to the gradient ∇θC(θ) of the cost function with respect to the circuit parameters. As discussed in the previous section ("Gradient Calculation Methods"), techniques like the parameter-shift rule allow for analytical gradient calculation on quantum hardware, albeit at the cost of additional circuit evaluations.
Stochastic Gradient Descent (SGD): Plain SGD updates parameters based on the gradient estimate from a small batch of data (or even a single data point in the context of minimizing expectation values from quantum circuits). While simple, it can suffer from slow convergence and oscillations. More advanced variants incorporate momentum or adaptive learning rates.
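A momentum-augmented SGD step can be sketched as follows. The quadratic `grad_fn` is a hypothetical stand-in for gradient estimates returned by a quantum device; the step size and momentum coefficient are illustrative choices.

```python
import numpy as np

def sgd_momentum_step(theta, grad, velocity, eta=0.05, beta=0.9):
    # Momentum keeps an exponentially decaying sum of past gradients,
    # damping oscillations and speeding travel along shallow directions.
    velocity = beta * velocity + grad
    return theta - eta * velocity, velocity

# Hypothetical quadratic surrogate for device gradient estimates.
def grad_fn(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)
for _ in range(200):
    theta, v = sgd_momentum_step(theta, grad_fn(theta), v)
print(theta)
```

Setting `beta=0.0` recovers plain SGD, which makes it easy to compare both variants on the same cost estimates.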
Adam (Adaptive Moment Estimation): Adam is a widely used optimizer in classical deep learning and often serves as a strong baseline for VQAs. It computes adaptive learning rates for each parameter by storing an exponentially decaying average of past squared gradients (like RMSProp) and an exponentially decaying average of past gradients (like momentum).
The update rule involves calculating biased first moment (mt) and second moment (vt) estimates:
mt = β1·mt−1 + (1 − β1)·gt
vt = β2·vt−1 + (1 − β2)·gt²

where gt is the gradient at step t, and β1, β2 are hyperparameters (typically close to 1, e.g., 0.9 and 0.999). Bias-corrected estimates m̂t and v̂t are computed, and the parameters are updated:
θt+1 = θt − η·m̂t / (√v̂t + ϵ)

Here, η is the learning rate and ϵ is a small constant for numerical stability.
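The Adam update translates almost line for line into NumPy. This is a minimal sketch: the sinusoidal surrogate gradient and the hyperparameter values are assumptions standing in for gradient estimates from a quantum device.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    # Biased first- and second-moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction counteracts the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Surrogate cost C(theta) = sum(1 - cos(theta)), so the gradient is sin(theta).
theta = np.array([0.8, -1.5])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 401):  # t starts at 1 so the bias correction is well-defined
    theta, m, v = adam_step(theta, np.sin(theta), m, v, t)
print(np.sum(1.0 - np.cos(theta)))
```

Because each parameter gets its own effective learning rate from v̂t, Adam is often less sensitive to the choice of η than plain SGD, which is one reason it is a common first choice for VQAs.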
Methods like BFGS (Broyden–Fletcher–Goldfarb–Shanno) belong to the family of quasi-Newton methods. They aim to approximate the inverse Hessian matrix (matrix of second derivatives) using only gradient information gathered over successive iterations. This allows them to estimate the curvature of the loss landscape and take more informed steps, potentially leading to faster convergence, especially near a local minimum.
The update direction is calculated as pk=−Bkgk, where Bk is the approximation of the inverse Hessian at iteration k. A line search is often performed along this direction to find an appropriate step size.
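In practice one rarely implements BFGS by hand; SciPy's `scipy.optimize.minimize` provides it, including the line search. The `cost` and `grad` functions below are classical surrogates: in a real VQA they would dispatch circuits and apply the parameter-shift rule.

```python
import numpy as np
from scipy.optimize import minimize

def cost(theta):
    # Surrogate for a VQA cost estimated from measurements.
    return np.sum(1.0 - np.cos(theta))

def grad(theta):
    # Surrogate for parameter-shift gradients.
    return np.sin(theta)

result = minimize(cost, x0=np.array([0.8, -1.5, 2.0]), jac=grad, method="BFGS")
print(result.x, result.fun)
```

Note that BFGS assumes reasonably accurate gradients; heavy shot noise can mislead its curvature estimate, which is one reason it tends to work better on simulators than on noisy hardware.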
Gradient-free optimizers do not require explicit calculation of the gradient ∇θC(θ). They typically work by evaluating the cost function C(θ) at different points in the parameter space and using this information to guide the search. They can be advantageous when gradients are difficult or expensive to compute, or when the cost function is non-differentiable or extremely noisy.
SPSA (Simultaneous Perturbation Stochastic Approximation) is particularly well-suited for optimizing functions where evaluations are noisy, a common scenario in VQAs due to finite measurement statistics. Its defining feature is that it estimates the gradient direction using only two cost function evaluations, regardless of the number of parameters N.
At each iteration k, SPSA proceeds as follows:

1. Generate a random perturbation vector Δk, typically with each component drawn independently from {−1, +1} with equal probability.
2. Evaluate the cost at two simultaneously perturbed points: C(θk + ck·Δk) and C(θk − ck·Δk).
3. Form the gradient estimate ĝk with components ĝk,i = (C(θk + ck·Δk) − C(θk − ck·Δk)) / (2·ck·Δk,i).
4. Update the parameters: θk+1 = θk − ak·ĝk.

The sequences ak (step size) and ck (perturbation size) are predefined decreasing sequences that must satisfy certain conditions for convergence.
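A minimal SPSA sketch is shown below. The noisy surrogate cost, the gain constants (a, c, A), and the decay exponents are illustrative assumptions, though the exponents 0.602 and 0.101 are commonly cited defaults for the decay schedules.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def noisy_cost(theta):
    # Surrogate for a shot-noise-limited estimate of C(theta).
    return np.sum(1.0 - np.cos(theta)) + rng.normal(scale=0.01)

def spsa_minimize(theta, n_iters=500, a=0.8, c=0.1, A=30, alpha=0.602, gamma=0.101):
    for k in range(1, n_iters + 1):
        ak = a / (k + A) ** alpha          # decaying step size
        ck = c / k ** gamma                # decaying perturbation size
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli +/-1
        # Only two cost evaluations, independent of the parameter count.
        diff = noisy_cost(theta + ck * delta) - noisy_cost(theta - ck * delta)
        g_hat = diff / (2.0 * ck * delta)  # elementwise gradient estimate
        theta = theta - ak * g_hat
    return theta

theta = spsa_minimize(np.array([0.8, -1.5, 2.0]))
print(np.sum(1.0 - np.cos(theta)))
```

Contrast this with the parameter-shift approach: there, the number of circuit evaluations per step grows linearly with N, while SPSA's two evaluations per step make it attractive for large, noisy circuits.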
Other methods like COBYLA (Constrained Optimization BY Linear Approximation) and Nelder-Mead are sometimes used. COBYLA builds linear approximations of the objective and constraints, while Nelder-Mead uses a simplex (a geometric figure) that adapts to the local landscape. These can be useful but might struggle with higher numbers of parameters or noisy evaluations compared to SPSA.
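Both methods are available through the same SciPy interface, so swapping them is a one-line change. The surrogate `cost` is an illustrative assumption; in a real VQA each call would trigger circuit executions.

```python
import numpy as np
from scipy.optimize import minimize

def cost(theta):
    # Surrogate for a VQA cost estimated from measurements.
    return np.sum(1.0 - np.cos(theta))

x0 = np.array([0.8, -1.5, 2.0])
for method in ("COBYLA", "Nelder-Mead"):
    result = minimize(cost, x0, method=method)  # no gradient function required
    print(method, result.fun)
```

Neither call passes a `jac` argument: both methods drive the search purely from cost evaluations, which is exactly what makes them usable when gradients are unavailable.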
The best choice of optimizer depends heavily on the specific VQA application, the characteristics of the PQC, the number of parameters, the available quantum resources (simulator vs. hardware, noise levels), and the cost of function/gradient evaluations.
Illustrative comparison of convergence behavior for different optimizers on a hypothetical VQA cost function. Adam might show rapid initial progress, while SPSA demonstrates steady, noise-resistant convergence. Plain SGD often converges much slower. Actual performance varies greatly depending on the problem.
Experimentation is often necessary. Standard quantum computing libraries like PennyLane and Qiskit provide implementations of many of these optimizers, making it relatively straightforward to switch between them and compare their performance on your specific VQA task. Remember that these classical optimizers operate purely classically; they simply use the cost function values (and possibly gradients) obtained from the quantum system as input for their classical optimization routines. The next section introduces the Quantum Natural Gradient, an optimization technique that explicitly considers the geometric structure of the quantum state space, offering a quantum-aware alternative.
© 2025 ApX Machine Learning