Recommendation engines are not a single technology. They are a collection of algorithms that fall into distinct families based on the data they use and the methods they apply. Understanding this taxonomy is essential for selecting the right approach for a given problem. The three primary categories of recommendation engines are Content-Based Filtering, Collaborative Filtering, and Hybrid Systems, which combine aspects of the first two.
A high-level classification of recommendation system algorithms.
Content-based filtering operates on a straightforward principle: if you like an item, you will probably like other items that are similar to it. This approach focuses on the properties, or "content," of the items themselves. For example, if you watch a science fiction movie directed by James Cameron, a content-based system would recommend other science fiction movies, or perhaps other movies directed by James Cameron.
To make this work, the system must first understand the items. This involves creating a profile for each item that details its features. For a movie, these features could include genre, director, actors, and plot keywords. For an article, they might be its topic, author, and the words used in its text. A user's profile is then built based on the features of the items they have previously rated or shown interest in. Recommendations are generated by matching the user's profile against the profiles of other items.
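The profile-matching idea above can be sketched in a few lines. This is a minimal illustration, not a production design: the feature set, the hand-assigned feature values, and the function names (`build_user_profile`, `recommend`) are all assumptions made for the example. Each movie is a binary feature vector, the user profile is the average of the vectors of liked movies, and candidates are ranked by cosine similarity to that profile.

```python
import math

# Hypothetical item profiles over the features
# [sci-fi, action, romance, directed_by_james_cameron].
# The feature values are hand-assigned for illustration only.
ITEM_PROFILES = {
    "Avatar":         [1, 1, 0, 1],
    "The Terminator": [1, 1, 0, 1],
    "Titanic":        [0, 0, 1, 1],
    "Blade Runner":   [1, 0, 0, 0],
    "Notting Hill":   [0, 0, 1, 0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_user_profile(liked_titles):
    """Average the feature vectors of the items the user liked."""
    vectors = [ITEM_PROFILES[title] for title in liked_titles]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(liked_titles, top_n=2):
    """Rank unseen items by similarity to the user's profile."""
    profile = build_user_profile(liked_titles)
    candidates = [t for t in ITEM_PROFILES if t not in liked_titles]
    scored = [(t, cosine(profile, ITEM_PROFILES[t])) for t in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


recommendations = recommend(["Avatar"])
for title, score in recommendations:
    print(f"{title}: {score:.2f}")
```

With this toy data, a user who liked only Avatar is pointed first to the other science fiction film by the same director, then to another science fiction film. Real systems usually derive features from metadata or text (for example, TF-IDF weights) rather than hand-coded flags.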
Strengths:
Weaknesses:
Collaborative filtering takes a different approach. It works by collecting preference or behavior information from many users, whose combined activity is the "collaboration" in the name. The underlying assumption is that if two users agreed on certain items in the past, they are likely to agree on other items in the future. The system does not need to know anything about the items themselves, only how users have interacted with them.
For instance, if User A and User B have both given high ratings to The Matrix and Blade Runner, and User A also liked Inception, the system might recommend Inception to User B. This approach uses the "wisdom of the crowd" to find new items.
Collaborative filtering algorithms are typically divided into two sub-categories.
Neighborhood-based methods, also known as memory-based methods, work directly with the user-item interaction data. They find "neighborhoods" of similar users or items and use those neighbors' interactions to make predictions.
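As a rough sketch of a user-based neighborhood method, the example below predicts one missing rating from the ratings of similar users. The ratings, user names, and function names are hypothetical. Ratings are centered on the midpoint of an assumed 1-5 scale, which is a simplification chosen to keep the arithmetic easy to follow; centering on each user's own mean rating is the more common choice in practice.

```python
import math

SCALE_MIDPOINT = 3.0  # assumed 1-5 rating scale

# Hypothetical ratings; a missing entry means the user has not rated the item.
RATINGS = {
    "user_a": {"The Matrix": 5, "Blade Runner": 5, "Inception": 5},
    "user_b": {"The Matrix": 5, "Blade Runner": 4},
    "user_c": {"The Matrix": 1, "Blade Runner": 2, "Inception": 1},
}


def centered(user):
    """Shift a user's ratings so that like/dislike becomes positive/negative."""
    return {item: r - SCALE_MIDPOINT for item, r in RATINGS[user].items()}


def similarity(u, v):
    """Cosine similarity of centered ratings over items both users rated."""
    cu, cv = centered(u), centered(v)
    shared = set(cu) & set(cv)
    dot = sum(cu[i] * cv[i] for i in shared)
    norm_u = math.sqrt(sum(cu[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(cv[i] ** 2 for i in shared))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of the neighbors' centered ratings."""
    numerator, denominator = 0.0, 0.0
    for other, ratings in RATINGS.items():
        if other == user or item not in ratings:
            continue
        sim = similarity(user, other)
        numerator += sim * (ratings[item] - SCALE_MIDPOINT)
        denominator += abs(sim)
    if denominator == 0:
        return SCALE_MIDPOINT  # no usable neighbors: fall back to the midpoint
    return SCALE_MIDPOINT + numerator / denominator


predicted = predict("user_b", "Inception")
print(f"Predicted rating for user_b on Inception: {predicted:.2f}")
```

Here `user_b` resembles `user_a` and is the opposite of `user_c`, so both neighbors push the prediction for Inception upward: one by agreeing and liking it, the other by disagreeing and disliking it.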
Model-based methods use machine learning techniques to find patterns in the user-item interaction data. The goal is to build a model that can predict a user's rating for an item they have not yet seen. A prominent technique in this category is matrix factorization, which decomposes the large user-item interaction matrix into two smaller, lower-dimensional matrices holding latent factors for users and for items. These latent factors may correspond to recognizable properties, such as a movie's genre or a user's preference for a certain style of film, but they are learned automatically from the data and are not guaranteed to be interpretable.
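To make matrix factorization concrete, here is a minimal sketch that learns user and item factors with stochastic gradient descent on a tiny, made-up set of ratings. The data, the hyperparameters (`epochs`, `lr`, `reg`), and the function names are assumptions for illustration; libraries built for this task use more refined variants, often with bias terms and tuned hyperparameters.

```python
import random

# Hypothetical observed ratings as (user_index, item_index, rating) triples.
# Users 0-1 tend to like items 0-1; users 2-3 tend to like items 2-3.
OBSERVED = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 2, 1), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 4), (3, 3, 5),
]
N_USERS, N_ITEMS, N_FACTORS = 4, 4, 2


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(P, Q):
    """Root mean squared error over the observed ratings only."""
    squared = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in OBSERVED)
    return (squared / len(OBSERVED)) ** 0.5


def factorize(epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn factor matrices P (users) and Q (items) with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(N_FACTORS)] for _ in range(N_USERS)]
    Q = [[rng.uniform(0, 0.5) for _ in range(N_FACTORS)] for _ in range(N_ITEMS)]
    initial_error = rmse(P, Q)
    for _ in range(epochs):
        for u, i, r in OBSERVED:
            err = r - predict(P, Q, u, i)
            for f in range(N_FACTORS):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q, initial_error


P, Q, initial_error = factorize()
final_error = rmse(P, Q)
print(f"RMSE before training: {initial_error:.3f}, after: {final_error:.3f}")
print(f"Predicted rating for user 0 on unseen item 2: {predict(P, Q, 0, 2):.2f}")
```

Only the observed ratings contribute to the error, which is what lets the learned factors fill in the missing cells of the matrix. The two learned factors are not labeled; any interpretation of them comes from inspecting the result afterward.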
Strengths:
Weaknesses:
As you might expect, hybrid systems are built by combining content-based and collaborative filtering methods. The goal is to leverage the strengths of each approach while mitigating their respective weaknesses. For example, a hybrid system could use a content-based model to handle new items (addressing the cold-start problem) and a collaborative model for users and items with sufficient data.
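The cold-start example above amounts to a switching rule, sketched below under several assumptions: the scores are precomputed placeholders standing in for the output of real content-based and collaborative models, the item names are invented, and `MIN_INTERACTIONS` is an arbitrary threshold.

```python
# Hypothetical precomputed scores in [0, 1] for a single user.
CONTENT_SCORES = {"new_release": 0.9, "classic": 0.4, "cult_hit": 0.6}
COLLAB_SCORES = {"classic": 0.8, "cult_hit": 0.3}  # nothing yet for new_release

# Hypothetical number of user interactions recorded for each item.
INTERACTION_COUNTS = {"new_release": 0, "classic": 250, "cult_hit": 40}

MIN_INTERACTIONS = 20  # assumed threshold for trusting the collaborative model


def hybrid_score(item):
    """Return (score, source), preferring collaborative scores when data allows."""
    enough_data = INTERACTION_COUNTS.get(item, 0) >= MIN_INTERACTIONS
    if enough_data and item in COLLAB_SCORES:
        return COLLAB_SCORES[item], "collaborative"
    return CONTENT_SCORES[item], "content-based"


ranked = sorted(CONTENT_SCORES, key=lambda item: hybrid_score(item)[0], reverse=True)
for item in ranked:
    score, source = hybrid_score(item)
    print(f"{item}: {score:.2f} ({source})")
```

Ranking items by scores from two different models only makes sense if those scores are on a comparable scale, so a real system would typically calibrate or normalize them first.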
There are many ways to combine these techniques. Some common strategies include:
Throughout this course, we will build systems using each of these major approaches. We will start with content-based filtering in the next chapter, move on to neighborhood-based and model-based collaborative filtering, and finally construct a hybrid system that brings these methods together.