Having established how quantum kernels k(x,x′) arise from inner products in quantum feature spaces, let's examine their essential mathematical properties. Just as with classical kernels, understanding these characteristics is fundamental to using quantum kernels effectively in machine learning algorithms such as Support Vector Machines (SVMs). These properties dictate the behavior of the kernel function and shape the geometry of the induced feature space, which in turn affects model performance and trainability.
Figure: Relationship between classical data, quantum feature states, their inner product, and the resulting kernel value.
A core requirement for any function to be a valid kernel in the context of SVMs and other kernel methods is that it must be positive semi-definite (PSD). This property ensures that the matrix formed by evaluating the kernel on any set of data points, known as the Gram matrix $K$, has non-negative eigenvalues. Formally, for any dataset $\{x_1, \dots, x_N\}$ and any complex coefficients $\{c_1, \dots, c_N\}$, the following condition must hold:
$$\sum_{i=1}^{N} \sum_{j=1}^{N} \bar{c}_i K_{ij} c_j \ge 0, \quad \text{where } K_{ij} = k(x_i, x_j).$$
This condition is intrinsically linked to the existence of an underlying feature map $\phi$ such that $k(x, x') = \langle \phi(x) | \phi(x') \rangle$.
Let's verify this for quantum kernels. Suppose we define the kernel directly as the inner product $k(x, x') = \langle \phi(x) | \phi(x') \rangle$, where $|\phi(x)\rangle$ is the quantum state corresponding to data point $x$. Then the sum becomes:
$$\sum_{i,j} \bar{c}_i \langle \phi(x_i) | \phi(x_j) \rangle c_j = \Big\langle \sum_i c_i \phi(x_i) \,\Big|\, \sum_j c_j \phi(x_j) \Big\rangle = \Big\| \sum_i c_i |\phi(x_i)\rangle \Big\|^2 \ge 0.$$
Since the squared norm of any vector in a Hilbert space is non-negative, the condition is satisfied. Thus, kernels defined directly as inner products in a Hilbert space are always PSD.
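As a quick numerical sanity check, the sketch below uses random normalized complex vectors as stand-ins for quantum feature states (the dimension, dataset size, and seed are arbitrary choices), builds the Gram matrix of inner products, and confirms both the eigenvalue and quadratic-form versions of the PSD condition:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-ins for quantum feature states: random normalized complex vectors
# in a d-dimensional Hilbert space (d, N, and the seed are arbitrary).
d, N = 8, 5
states = rng.normal(size=(N, d)) + 1j * rng.normal(size=(N, d))
states /= np.linalg.norm(states, axis=1, keepdims=True)

# Gram matrix of inner products G_ij = <phi(x_i)|phi(x_j)>
G = states.conj() @ states.T

# PSD check 1: all eigenvalues of the Hermitian matrix G are non-negative
eigvals = np.linalg.eigvalsh(G)
assert eigvals.min() > -1e-10

# PSD check 2: the quadratic form sum_ij conj(c_i) G_ij c_j is >= 0 for
# arbitrary complex c, and equals || sum_i c_i |phi(x_i)> ||^2
c = rng.normal(size=N) + 1j * rng.normal(size=N)
quad = (c.conj() @ G @ c).real
assert quad >= 0
assert np.isclose(quad, np.linalg.norm(c @ states) ** 2)
```

The second assertion reproduces the identity in the derivation above: the quadratic form is exactly the squared norm of the superposition $\sum_i c_i |\phi(x_i)\rangle$.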
Often in quantum machine learning, the kernel is defined via the measurement probability or fidelity between the feature states, commonly taking the form $k(x, x') = |\langle \phi(x) | \phi(x') \rangle|^2$.
Is this function still PSD? Yes. Consider the Gram matrix $G$ with $G_{ij} = \langle \phi(x_i) | \phi(x_j) \rangle$, which we have just shown to be PSD. The kernel matrix $K$ of interest has elements $K_{ij} = |\langle \phi(x_i) | \phi(x_j) \rangle|^2 = G_{ij}\bar{G}_{ij}$; in other words, $K$ is the element-wise (Hadamard) product of $G$ and its complex conjugate $\bar{G}$. Since $G$ is PSD, $\bar{G}$ is also PSD, and the Schur product theorem states that the Hadamard product of two PSD matrices is again PSD. Therefore, quantum kernels of the form $k(x, x') = |\langle \phi(x) | \phi(x') \rangle|^2$ are guaranteed to be valid, PSD kernels.
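The Schur-product argument can also be checked numerically. This sketch (again using random normalized vectors as stand-in feature states; sizes and seed are arbitrary) forms $K_{ij} = |G_{ij}|^2$ as the Hadamard product of $G$ and its conjugate and verifies that $K$ remains PSD:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a PSD Gram matrix G from random normalized complex "feature states".
d, N = 6, 4
phi = rng.normal(size=(N, d)) + 1j * rng.normal(size=(N, d))
phi /= np.linalg.norm(phi, axis=1, keepdims=True)
G = phi.conj() @ phi.T

# Fidelity kernel matrix: K_ij = |G_ij|^2 = G_ij * conj(G_ij), i.e. the
# Hadamard (element-wise) product of G and its complex conjugate.
K = G * G.conj()
assert np.allclose(K.imag, 0)  # |z|^2 is real
K = K.real

# Schur product theorem in action: K is again PSD.
assert np.linalg.eigvalsh(K).min() > -1e-10
```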
This PSD property is significant because it guarantees that the optimization problem underlying SVMs remains convex, so any solution found is a global optimum.
Quantum kernels, particularly those defined as $k(x, x') = |\langle \phi(x) | \phi(x') \rangle|^2$, often exhibit natural normalization properties stemming from the nature of quantum states. Assuming the quantum feature states $|\phi(x)\rangle$ are normalized (i.e., $\langle \phi(x) | \phi(x) \rangle = 1$), we have:
$$k(x, x) = |\langle \phi(x) | \phi(x) \rangle|^2 = 1, \qquad 0 \le k(x, x') \le 1,$$
where the upper bound follows from the Cauchy-Schwarz inequality.
So this common form of the quantum kernel naturally produces values between 0 and 1, where 1 indicates identical feature states (up to a global phase) and 0 indicates orthogonal, maximally dissimilar states. This boundedness is helpful in practice, preventing numerical issues and providing an intuitive scale for similarity. It does require the input states $|\phi(x)\rangle$ to be normalized, which is standard practice in quantum computation.
In classical machine learning, a kernel is called universal if the corresponding Reproducing Kernel Hilbert Space (RKHS) is dense in the space of continuous functions on the input domain. Practically, this means an SVM with a universal kernel can approximate any arbitrary decision boundary, given enough data. The Gaussian (RBF) kernel is a well-known example of a universal classical kernel.
Can quantum kernels be universal? The answer depends entirely on the expressivity of the quantum feature map ϕ.
Research suggests that certain quantum feature maps, especially those involving sufficient entanglement and data re-uploading structures (as discussed in Chapter 2), can indeed lead to universal kernels. Proving universality often involves analyzing the richness of the functions x↦∣ϕ(x)⟩ and relating it to Fourier analysis or function approximation theory. The ability to generate high-frequency components in the kernel function is often linked to universality. Achieving universality might require feature maps with a number of parameters growing with the size of the dataset or the complexity of the target function.
It cannot be stressed enough: the properties and performance of a quantum kernel are entirely determined by the choice of the quantum feature map $\phi$. Different circuits used for encoding $x$ into $|\phi(x)\rangle$ will lead to different kernels. Among the factors influenced by the feature map design are:
- Expressivity: how rich the family of states $\{|\phi(x)\rangle\}$ is, and hence how complex the decision boundaries the kernel can represent.
- Inductive bias: which pairs of inputs the kernel treats as similar, i.e., the geometry it imposes on the data.
- Trainability: overly expressive feature maps can cause kernel values to concentrate, making off-diagonal Gram matrix entries vanishingly small and learning difficult.
- Hardware cost: circuit depth and qubit count determine how expensive and noise-prone kernel evaluation is.
Therefore, designing or choosing an appropriate feature map is a central task in quantum kernel methods. A feature map that works well for one dataset might perform poorly on another. This connects back strongly to the topics covered in Chapter 2 on encoding strategies and their expressibility.
While mathematically elegant, practical estimation of quantum kernels $k(x, x') = |\langle \phi(x) | \phi(x') \rangle|^2$ on quantum hardware introduces complications:
- Shot noise: each kernel entry is estimated from a finite number of measurement repetitions, so the Gram matrix carries statistical error.
- Hardware noise: decoherence and gate errors bias the estimates and can cause the measured Gram matrix to lose its PSD property, requiring regularization or projection back onto the PSD cone.
- Cost: evaluating the full Gram matrix for $N$ data points requires on the order of $N^2$ circuit evaluations.
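The effect of shot noise in particular can be sketched with a simple Bernoulli model: if the true kernel value is $k$, a shot-based estimate has standard error $\sqrt{k(1-k)/\text{shots}}$. The value `k_exact` and the shot counts below are illustrative assumptions, not measured quantities:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical exact kernel value between two feature states.
k_exact = 0.73

# On hardware, k(x, x') = |<phi(x)|phi(x')>|^2 is typically estimated as
# the frequency of a particular measurement outcome over a finite number
# of shots, so the estimate is a Bernoulli mean with variance k(1-k)/shots.
estimates = {}
for shots in (100, 10_000):
    estimates[shots] = rng.binomial(shots, k_exact) / shots
    std_err = np.sqrt(k_exact * (1 - k_exact) / shots)
    print(f"shots={shots:>6}: estimate={estimates[shots]:.3f} "
          f"(std err ~ {std_err:.3f})")
```

Increasing the shot count shrinks the statistical error only as $1/\sqrt{\text{shots}}$, which is why accurate Gram matrices can be expensive to obtain.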
These practical aspects related to hardware noise and estimation will be discussed in greater detail in Chapter 7.
In summary, quantum kernels inherit the essential PSD property from their definition via inner products in Hilbert space. Their normalization depends on the normalization of feature states, and their potential universality hinges on the expressivity of the chosen quantum feature map. The tight coupling between the feature map design and the kernel's properties makes feature map engineering a critical aspect of applying these methods successfully.
© 2025 ApX Machine Learning