In the Variational Quantum Algorithm (VQA) framework, the parameterized quantum circuit (PQC) prepares a quantum state ∣ψ(θ)⟩, where θ represents the classical parameters we aim to optimize. However, to guide this optimization, we need a way to quantify how well the state ∣ψ(θ)⟩ achieves the target machine learning objective. This is the role of the cost function, C(θ). It acts as the essential bridge between the quantum processing unit (QPU) and the classical optimization routine, translating quantum measurement outcomes into a scalar value that indicates performance.
At the heart of a VQA's evaluation step lies quantum measurement. Typically, we measure one or more qubits in a specific basis (often the computational basis, Z-basis) after the state ∣ψ(θ)⟩ has been prepared. This measurement process is inherently probabilistic, yielding classical bitstrings as outcomes.
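As a small illustration (a simulator stand-in rather than hardware), the following NumPy snippet samples computational-basis bitstrings from a single-qubit state according to the Born rule:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Example single-qubit state (|0> + |1>) / sqrt(2), stored as amplitudes.
psi = np.array([1.0, 1.0]) / np.sqrt(2)

# Born rule: each outcome's probability is the squared magnitude of its amplitude.
probs = np.abs(psi) ** 2

# Simulate ten Z-basis measurements; each returns a classical bit.
outcomes = rng.choice([0, 1], size=10, p=probs)
print(outcomes)
```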
To form a differentiable cost function suitable for optimization, we usually don't work directly with the raw probabilities of these bitstrings. Instead, we compute the expectation value of a chosen observable, which is a Hermitian operator Ô. For a given state ∣ψ(θ)⟩ prepared by the PQC with parameters θ, the expectation value is calculated as:
⟨Ô⟩θ = ⟨ψ(θ)∣Ô∣ψ(θ)⟩

This expectation value ⟨Ô⟩θ provides a smooth, real-valued output that depends on the circuit parameters θ. It represents the average value we would obtain if we measured the observable Ô on the state ∣ψ(θ)⟩ many times. This expectation value, or a function derived from it and the target data labels, forms the basis of our cost function C(θ).
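As a concrete check of this formula, here is a minimal statevector sketch: the state RY(θ)∣0⟩ is written out by hand, and ⟨ψ(θ)∣Ẑ∣ψ(θ)⟩ is evaluated directly with NumPy.

```python
import numpy as np

# State prepared by a one-parameter circuit: RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
theta = 0.7
psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])

# Observable: the Pauli Z operator.
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# <O>_theta = <psi(theta)| O |psi(theta)>  (np.vdot conjugates its first argument)
expval = np.real(np.vdot(psi, Z @ psi))
print(expval, np.cos(theta))  # both print cos(theta)
```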
The overall process can be visualized as follows:
The VQA loop involves preparing a state with the PQC using current parameters θ, measuring an observable Ô to estimate its expectation value ⟨Ô⟩θ, calculating the cost C(θ) by comparing this value to the target, and feeding the cost back to a classical optimizer to propose updated parameters.
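The following is a minimal runnable sketch of this loop, using a toy one-qubit model where RY(θ)∣0⟩ gives ⟨Ẑ⟩ = cos θ in closed form, and a finite-difference gradient standing in for the more sophisticated optimizers discussed later:

```python
import numpy as np

def expectation_z(theta):
    # Toy PQC: for RY(theta)|0>, the exact expectation <Z> is cos(theta).
    # On hardware this number would be estimated from repeated measurements.
    return np.cos(theta)

def cost(theta, target=-1.0):
    # Compare the measured expectation to the target value.
    return (expectation_z(theta) - target) ** 2

# Classical optimizer: simple finite-difference gradient descent.
theta, lr, eps = 0.1, 0.4, 1e-4
for _ in range(100):
    grad = (cost(theta + eps) - cost(theta - eps)) / (2 * eps)
    theta -= lr * grad

print(theta, cost(theta))  # theta moves toward pi, where <Z> = -1
```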
The exact form of C(θ) depends significantly on the machine learning task at hand. Let's look at common scenarios:
In classification tasks, the goal is to assign input data xᵢ to one of several discrete categories.
Binary Classification: For problems with two classes (e.g., labeled +1 and −1), a common approach is to design the PQC and observable Ô such that the expectation value ⟨Ô⟩θ computed for input xᵢ correlates with the class label yᵢ. For instance, we might measure the Pauli Z operator Ẑ₀ on the first qubit, aiming for ⟨Ẑ₀⟩θ ≈ +1 for one class and ⟨Ẑ₀⟩θ ≈ −1 for the other. A mean squared error between these expectations and the labels can then serve as the cost (see the sketch after this list).
Multi-class Classification: Extending these ideas to more than two classes typically involves using multiple output qubits, different measurement strategies (e.g., measuring multiple Pauli operators), or combining binary classifiers. The cost function needs to be adapted accordingly, often using multi-class versions of MSE or cross-entropy.
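As a concrete sketch of the binary case, a mean squared error cost can compare per-input expectation values against the ±1 labels. The model_expectation function below is a hypothetical stand-in whose cosine form merely substitutes for running a real PQC:

```python
import numpy as np

def model_expectation(theta, x):
    # Stand-in for a PQC: encode input x, apply the parameterized circuit,
    # read out <Z_0>. Here we pretend <Z_0> = cos(theta + x).
    return np.cos(theta + x)

def mse_cost(theta, xs, ys):
    # C(theta) = (1/N) * sum_i (<Z_0>_i - y_i)^2, with labels y_i in {+1, -1}.
    preds = np.array([model_expectation(theta, x) for x in xs])
    return np.mean((preds - ys) ** 2)

xs = np.array([0.1, 2.9, 0.2, 3.1])
ys = np.array([+1, -1, +1, -1])
print(mse_cost(0.0, xs, ys))  # small cost: theta = 0 already separates these inputs
```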
In regression, the goal is to predict a continuous value yᵢ for an input xᵢ. A common construction rescales a bounded expectation value such as ⟨Ẑ₀⟩θ ∈ [−1, +1] to the range of the targets and trains with a mean squared error loss, as in the sketch below.
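Here is a sketch of such a regression cost, again with a hypothetical stand-in for the circuit readout and an assumed scale factor mapping [−1, +1] onto the target range:

```python
import numpy as np

def model_expectation(theta, x):
    # Stand-in for a PQC readout; a real circuit would estimate <Z_0> from shots.
    return np.cos(theta * x)

def regression_cost(theta, xs, ys, scale=5.0):
    # Rescale <Z_0> in [-1, +1] to the target range, then apply MSE.
    preds = scale * np.array([model_expectation(theta, x) for x in xs])
    return np.mean((preds - ys) ** 2)

xs = np.array([0.0, 0.5, 1.0, 1.5])
ys = np.array([5.0, 4.4, 2.7, 0.35])  # targets roughly following 5 * cos(x)
print(regression_cost(1.0, xs, ys))   # near zero: theta = 1 fits these targets
```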
Cost functions for generative tasks, such as learning probability distributions with Quantum Circuit Born Machines (QCBMs) or training Quantum Generative Adversarial Networks (QGANs), are distinct. They often involve metrics that compare the probability distribution produced by the quantum circuit to the target data distribution (e.g., Maximum Mean Discrepancy, Kullback-Leibler divergence estimates) or adversarial losses derived from a discriminator network. These will be explored in more detail in Chapter 6.
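These generative cost functions are developed properly in Chapter 6; as a preview, here is a minimal sketch of a Kullback-Leibler divergence estimate between a target distribution and a circuit's Born distribution, assuming both are available as explicit probability vectors (which is only feasible for small systems):

```python
import numpy as np

def kl_estimate(p_data, q_model, eps=1e-12):
    # D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)); eps guards against log(0).
    p = np.asarray(p_data, dtype=float)
    q = np.asarray(q_model, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Target distribution over 2-bit strings vs. a circuit's output distribution.
p_data = np.array([0.5, 0.0, 0.0, 0.5])
q_circuit = np.array([0.45, 0.05, 0.05, 0.45])
print(kl_estimate(p_data, q_circuit))
```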
The choice of the observable Ô is not arbitrary; it's an integral part of the VQA design. It determines precisely what property of the final quantum state ∣ψ(θ)⟩ is extracted to make the prediction. Common choices include single-qubit Pauli operators (such as Ẑ₀ on a designated readout qubit), tensor products of Pauli operators acting on several qubits, and projectors onto specific computational basis states.
The observable should be chosen based on how the PQC encodes information. If the final answer is expected to be encoded in the polarization of a specific qubit, measuring a Pauli operator on that qubit makes sense. If correlations are important, a multi-qubit observable might be necessary. The range of the expectation value (e.g., [−1,+1] for Pauli operators) should also be considered when relating it to target labels or values.
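To illustrate the difference between a local readout and a correlation measurement, the following sketch (using a hand-built entangled state rather than a full PQC) compares ⟨Ẑ₀⟩ with the two-qubit correlator ⟨Ẑ₀Ẑ₁⟩:

```python
import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I = np.eye(2)

# Entangled two-qubit state cos(t)|00> + sin(t)|11>.
t = np.pi / 5
psi = np.array([np.cos(t), 0.0, 0.0, np.sin(t)])

single = np.real(np.vdot(psi, np.kron(Z, I) @ psi))      # <Z_0>: local readout
correlator = np.real(np.vdot(psi, np.kron(Z, Z) @ psi))  # <Z_0 Z_1>: joint correlation
print(single, correlator)  # cos(2t) vs. 1.0
```

Here ⟨Ẑ₀⟩ varies with t, while the correlator stays at exactly 1 for every t: the two qubits are perfectly correlated, a property no single-qubit observable can reveal on its own.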
It's important to remember that on any real quantum computer (or simulator mimicking one), we cannot access the exact expectation value ⟨Ô⟩θ. Instead, we estimate it by preparing the state ∣ψ(θ)⟩ and measuring the observable Ô repeatedly, say N_shots times.
The accuracy of this estimation depends on the number of shots, with the standard deviation of the estimate typically scaling as 1/√N_shots. This inherent statistical noise, known as "shot noise," means the cost function evaluation itself is noisy, adding a layer of challenge to the optimization process compared to classical ML, where function evaluations are typically deterministic.
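The following sketch simulates this effect for the toy state RY(θ)∣0⟩, where each shot yields ±1 and the sample mean estimates ⟨Ẑ⟩; the error falls off roughly as 1/√N_shots:

```python
import numpy as np

rng = np.random.default_rng(42)

theta = 1.2
p0 = np.cos(theta / 2) ** 2   # probability of outcome 0 for RY(theta)|0>
exact = np.cos(theta)         # exact <Z> for this state

for n_shots in (100, 10_000, 1_000_000):
    # Each shot contributes +1 (outcome 0) or -1 (outcome 1); average the results.
    shots = rng.choice([+1.0, -1.0], size=n_shots, p=[p0, 1.0 - p0])
    error = abs(shots.mean() - exact)
    print(f"{n_shots:>9} shots: |estimate - exact| = {error:.4f}")
```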
The cost function C(θ) defines the optimization surface that the classical optimizer explores to find the best parameters θ. The structure of this surface, influenced by the PQC architecture, the data encoding, the choice of observable, and the specific cost function formula, determines the feasibility and efficiency of training the VQA. Issues like the presence of many local minima or the phenomenon of barren plateaus (regions where gradients vanish exponentially with the number of qubits) are directly related to the properties of C(θ) and its gradients. Understanding how to define effective cost functions is therefore fundamental to building successful VQAs for machine learning. The subsequent sections on gradient calculation and optimization techniques will build directly upon this foundation.