Taxonomy of Recommender Engines

Recommendation engines are not a single type of technology. Instead, they are a collection of algorithms that can be categorized into distinct families based on the data they use and the methods they apply. Understanding this taxonomy is essential for selecting the right approach for a given problem. The three primary categories of recommender engines are Content-Based Filtering, Collaborative Filtering, and Hybrid Systems, which combine aspects of the first two.

Figure: A high-level classification of recommendation system algorithms.

Content-Based Filtering

Content-based filtering operates on a straightforward principle: if you like an item, you will probably like other items that are similar to it. This approach focuses on the properties, or "content," of the items themselves. For example, if you watch a science fiction movie directed by James Cameron, a content-based system would recommend other science fiction movies, or perhaps other movies directed by James Cameron.

To make this work, the system must first understand the items. This involves creating a profile for each item that details its features. For a movie, these features could include genre, director, actors, and plot keywords. For an article, they might be its topic, author, and the words used in its text. A user's profile is then built based on the features of the items they have previously rated or shown interest in. Recommendations are generated by matching the user's profile against the profiles of other items.
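
The profile-matching steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the four movies, the binary feature encoding, the averaging of liked-item vectors into a user profile, and the choice of cosine similarity are all assumptions made for this example.

```python
import math

# Hypothetical item profiles: binary feature vectors over
# [science fiction, romance, directed by James Cameron, comedy].
ITEM_PROFILES = {
    "The Terminator": [1, 0, 1, 0],
    "Avatar":         [1, 0, 1, 0],
    "Titanic":        [0, 1, 1, 0],
    "Notting Hill":   [0, 1, 0, 1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_user_profile(liked_items):
    """Average the feature vectors of the items the user liked."""
    vectors = [ITEM_PROFILES[name] for name in liked_items]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(liked_items, top_n=2):
    """Rank unseen items by similarity between the user profile and each item profile."""
    profile = build_user_profile(liked_items)
    scores = {
        name: cosine(profile, features)
        for name, features in ITEM_PROFILES.items()
        if name not in liked_items
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend(["The Terminator"]))
```

With only The Terminator in the history, this toy data ranks Avatar first (identical feature vector) and Titanic second (shared director only). No other user's behavior is consulted at any point.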

Strengths:
- User Independence: Recommendations for one user do not depend on the actions of other users. This makes the system easier to scale to a large number of users.
- Solves the New Item Problem: A new movie can be recommended as soon as its features are available, even if no one has rated it yet. This is a significant advantage over collaborative filtering, which needs interaction data before it can recommend an item.
- Transparency: The recommendations are easy to explain. For instance, "We are recommending this movie because you liked other movies in the science fiction genre."

Weaknesses:
- Limited Serendipity: The system tends to recommend items similar to what a user has already seen, making it difficult to discover new interests. This is often called overspecialization, and it is closely related to the "filter bubble" effect.
- Requires Feature Engineering: The effectiveness of a content-based system depends heavily on the quality of the item features. Extracting and selecting these features can be a difficult and time-consuming process.

Collaborative Filtering

Collaborative filtering takes a different approach. It works by collecting preferences or behavior information from many users, who in effect "collaborate" to produce each recommendation. The underlying assumption is that if two users agreed on certain items in the past, they are likely to agree on other items in the future. The system does not need to know anything about the items themselves, only how users have interacted with them.

For instance, if User A and User B have both given high ratings to The Matrix and Blade Runner, and User A also liked Inception, the system might recommend Inception to User B. This approach uses the "wisdom of the crowd" to find new items.
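
The User A and User B scenario can be made concrete with a small sketch. Everything here is hypothetical: the ratings, the extra User C, and the simplifying choice to center ratings on the midpoint of a 1-5 scale before computing cosine similarity over co-rated movies. Real systems usually center on each user's own mean and use far more data.

```python
import math

MIDPOINT = 3  # assumed midpoint of a 1-5 rating scale

# Hypothetical ratings; a missing key means "not rated".
RATINGS = {
    "User A": {"The Matrix": 5, "Blade Runner": 5, "Inception": 5},
    "User B": {"The Matrix": 5, "Blade Runner": 4},
    "User C": {"The Matrix": 1, "Blade Runner": 2, "Notting Hill": 5},
}


def user_similarity(a, b):
    """Cosine similarity of midpoint-centered ratings over co-rated items."""
    common = set(RATINGS[a]) & set(RATINGS[b])
    va = [RATINGS[a][item] - MIDPOINT for item in common]
    vb = [RATINGS[b][item] - MIDPOINT for item in common]
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(va, vb)) / norm


def recommend(target, top_n=1):
    """Score unseen items using only positively similar users."""
    scores = {}
    for other in RATINGS:
        if other == target:
            continue
        sim = user_similarity(target, other)
        if sim <= 0:
            continue  # skip users whose tastes disagree with the target's
        for item, rating in RATINGS[other].items():
            if item not in RATINGS[target]:
                scores[item] = scores.get(item, 0.0) + sim * (rating - MIDPOINT)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("User B"))
```

In this toy data User C disagrees with User B on both shared movies, so its similarity is negative and its ratings are ignored. Inception is suggested because User A, the only positively similar neighbor, rated it highly.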

Collaborative filtering algorithms are typically divided into two sub-categories.

Neighborhood-Based Methods

Also known as memory-based methods, these algorithms work directly with the user-item interaction data. They find "neighborhoods" of similar users or items to make predictions.

- User-Based Collaborative Filtering: This method finds users who are similar to the target user based on their rating history. It then recommends items that these similar users liked but the target user has not yet seen.
- Item-Based Collaborative Filtering: Instead of finding similar users, this method finds items that are similar based on how the same users rated them. If a user liked a particular item, the system recommends other items that are similar to it. This approach is often preferred in practice because item-to-item similarities tend to change more slowly than user-to-user similarities, so they can be precomputed and reused, which helps with both stability and scalability.
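
As a rough sketch of the item-based variant, the snippet below computes an adjusted cosine similarity between two items, meaning each rating is first centered on the mean of the user who gave it, using only the users who rated both items. The user names and ratings are invented for illustration, and the helper names are hypothetical.

```python
import math

# Hypothetical 1-5 star ratings; a missing key means "not rated".
RATINGS = {
    "u1": {"The Matrix": 5, "Blade Runner": 4, "Notting Hill": 1},
    "u2": {"The Matrix": 4, "Blade Runner": 5},
    "u3": {"The Matrix": 1, "Notting Hill": 5},
    "u4": {"Blade Runner": 2, "Notting Hill": 4},
}

# Each user's mean rating, used to remove individual rating bias.
USER_MEANS = {u: sum(r.values()) / len(r) for u, r in RATINGS.items()}
ITEMS = sorted({item for r in RATINGS.values() for item in r})


def item_similarity(i, j):
    """Adjusted cosine similarity between items i and j over their co-raters."""
    co_raters = [u for u, r in RATINGS.items() if i in r and j in r]
    di = [RATINGS[u][i] - USER_MEANS[u] for u in co_raters]
    dj = [RATINGS[u][j] - USER_MEANS[u] for u in co_raters]
    norm = math.sqrt(sum(a * a for a in di)) * math.sqrt(sum(b * b for b in dj))
    if norm == 0:
        return 0.0
    return sum(a * b for a, b in zip(di, dj)) / norm


def similar_items(item, top_n=1):
    """Return the items most similar to `item`, best first."""
    scores = {other: item_similarity(item, other) for other in ITEMS if other != item}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(similar_items("The Matrix"))
```

On this toy data Blade Runner comes out as the nearest neighbor of The Matrix, while Notting Hill gets a negative similarity. Because the similarities depend only on the rating table, they could be computed offline and looked up at recommendation time.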

Model-Based Methods

These methods use machine learning techniques to find patterns in the user-item interaction data. The goal is to build a model that can predict a user's rating for an item they have not yet seen. A prominent technique in this category is matrix factorization, which approximates the large user-item interaction matrix as the product of two smaller, lower-dimensional matrices: one holding latent factors for users and one holding latent factors for items. These latent factors may line up with recognizable properties, such as a movie's genre or a user's preference for a certain style of film, but they are learned automatically from the data and are not guaranteed to be interpretable.
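
To make the idea of learned latent factors more tangible, here is a minimal matrix factorization sketch trained with stochastic gradient descent on squared error with L2 regularization. The rating triples, the number of factors `k`, the learning rate, and the regularization strength are all illustrative assumptions, not recommended settings.

```python
import math
import random

# Hypothetical observed ratings as (user index, item index, rating) triples.
OBSERVED = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 2, 1),
    (2, 0, 1), (2, 1, 2), (2, 2, 5),
]


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root mean squared error on the given rating triples."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


P, Q = factorize(OBSERVED, n_users=3, n_items=3)
print("training RMSE:", round(rmse(OBSERVED, P, Q), 3))
print("predicted rating for user 1, item 1:", round(predict(P, Q, 1, 1), 2))
```

User 1 never rated item 1, so the last line shows the model filling in a missing cell of the interaction matrix. The error reported here is measured on the training ratings only; a real evaluation would hold out data.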

Strengths:
- Finds Unexpected Items: Because it relies on user behavior rather than item features, collaborative filtering can recommend items that are serendipitous and expand a user's tastes.
- No Domain Knowledge Required: The system does not need any information about the items themselves, just the interaction data.

Weaknesses:
- The Cold-Start Problem: The system cannot make reliable recommendations for new users or new items because there is little or no interaction data available for them.
- Data Sparsity: In most scenarios, the user-item interaction matrix is very sparse, meaning each user has rated only a small fraction of the available items. This can make it difficult to find users or items with enough overlapping ratings to make reliable predictions.
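
Sparsity is easy to quantify. The short sketch below, using invented ratings and hypothetical helper names, computes the fraction of empty cells in the interaction matrix and shows how two users can end up with no co-rated items at all.

```python
# Hypothetical ratings; a missing key means "not rated".
RATINGS = {
    "u1": {"The Matrix": 5, "Inception": 4},
    "u2": {"Blade Runner": 5},
    "u3": {"Notting Hill": 4, "Inception": 2},
}


def sparsity(ratings, n_items):
    """Fraction of user-item cells that have no rating."""
    observed = sum(len(items) for items in ratings.values())
    return 1.0 - observed / (len(ratings) * n_items)


def co_rated(ratings, a, b):
    """Items rated by both users; similarity can only be estimated from these."""
    return set(ratings[a]) & set(ratings[b])


all_items = {item for items in RATINGS.values() for item in items}
print(sparsity(RATINGS, len(all_items)))
print(co_rated(RATINGS, "u1", "u2"))
print(co_rated(RATINGS, "u1", "u3"))
```

Here 7 of the 12 cells are empty, and u1 and u2 share no rated items, so a neighborhood-based method has nothing to compare them on.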

Hybrid Systems

Hybrid systems combine content-based and collaborative filtering methods. The goal is to leverage the strengths of each approach while mitigating their respective weaknesses. For example, a hybrid system could use a content-based model to handle new items (addressing the item cold-start problem) and a collaborative model for users and items with sufficient interaction data.

There are many ways to combine these techniques. Some common strategies include:

- Weighted Hybrids: The prediction scores from different models are combined using a weighted average.
- Switching Hybrids: The system switches between different recommender models based on certain criteria, such as the amount of data available for a user or item.
- Feature Combination: Content-based features are incorporated directly into a collaborative filtering model, for instance, by using them to augment the user or item profiles.
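
The first two strategies are simple enough to sketch directly. The functions below assume that a content-based score and a collaborative score for the same user-item pair are already available on a common scale. The function names, the weight `alpha`, and the `min_interactions` threshold are hypothetical choices for illustration.

```python
def weighted_hybrid(content_score, collab_score, alpha=0.5):
    """Weighted average of the two scores; alpha is the weight on the content-based score."""
    return alpha * content_score + (1 - alpha) * collab_score


def switching_hybrid(content_score, collab_score, n_interactions, min_interactions=5):
    """Use the content-based score until enough interaction data exists, then switch."""
    if n_interactions < min_interactions:
        return content_score
    return collab_score


# A brand-new item with 2 interactions falls back to the content-based score.
print(switching_hybrid(0.8, 0.4, n_interactions=2))
# An established item with 10 interactions uses the collaborative score.
print(switching_hybrid(0.8, 0.4, n_interactions=10))
# A blend that puts 25% of the weight on the content-based score.
print(weighted_hybrid(0.8, 0.4, alpha=0.25))
```

In practice the weight and the threshold would be tuned on held-out data rather than fixed by hand. Feature combination is not shown here because it changes the model itself rather than how its outputs are combined.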

Throughout this course, we will build systems using each of these major approaches. We will start with content-based filtering in the next chapter, move on to neighborhood-based and model-based collaborative filtering, and finally construct a hybrid system that brings these methods together.
