Generating User Profiles

To recommend items, we first need to represent each user's preferences as a numerical vector. A user profile is such a vector: it summarizes the kinds of items a user enjoys. Once preferences are expressed in this form, we can recommend the items whose vectors lie closest to the user's profile.

The most direct way to construct a user profile is by aggregating the feature vectors of the items they have interacted with positively. If a user has watched and liked several science fiction movies high in action, their user profile vector should reflect a preference for those specific attributes.

Aggregating Item Vectors

Let's examine a user who has rated several items. We can compute their profile by taking the average of the vectors of all the items they've rated.

For example, suppose a user has rated two movies, "Sci-Fi Adventure" and "Space Opera," represented by the following simplified item vectors:

Sci-Fi Adventure: [0.8, 0.1, 0.6, 0.0] (high on sci-fi, low on comedy, high on action)
Space Opera: [0.9, 0.0, 0.4, 0.1] (high on sci-fi, no comedy, moderate action)

A simple user profile could be the element-wise average of these two vectors:

User Profile: [0.85, 0.05, 0.5, 0.05]

The resulting vector describes a user who strongly prefers sci-fi, leans toward action, and has very little interest in comedy, which matches their viewing history.
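
The averaging step above can be reproduced in a few lines of NumPy. This is a minimal sketch using the two example vectors; the variable names are illustrative, and the fourth feature is left unnamed because the text does not say what it represents.

```python
import numpy as np

# Item vectors from the example above. The meaning of the fourth
# feature is not specified in the text, so it stays unnamed here.
sci_fi_adventure = np.array([0.8, 0.1, 0.6, 0.0])
space_opera = np.array([0.9, 0.0, 0.4, 0.1])

# Stack the vectors into an (n_items, n_features) matrix and
# average down the rows to get one value per feature.
item_matrix = np.vstack([sci_fi_adventure, space_opera])
user_profile = item_matrix.mean(axis=0)

print(user_profile)  # approximately [0.85, 0.05, 0.5, 0.05]
```

Keeping one row per item means the same call works unchanged for a user with any number of rated items.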

Using Ratings as Weights

A simple average treats every interaction equally. However, in many systems, we have access to explicit feedback, such as 5-star ratings. A movie a user rated as a 5 should have more influence on their profile than a movie they rated as a 3. We can achieve this by calculating a weighted average, where the weights are the ratings the user has given.

The formula for user $u$'s profile vector, $\vec{p_u}$, is:

$\vec{p_u} = \frac{\sum_{i \in I_u} r_{ui} \cdot \vec{v_i}}{\sum_{i \in I_u} |r_{ui}|}$

Where:

$I_u$ is the set of items rated by user $u$.
$r_{ui}$ is the rating user $u$ gave to item $i$.
$\vec{v_i}$ is the feature vector for item $i$.

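This calculation ensures that items with higher ratings contribute more to the final user profile vector, giving a more accurate representation of the user's preferences. Dividing by the sum of the absolute ratings normalizes the result, so with non-negative ratings the profile stays on the same scale as the item vectors.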

Figure: The feature vectors of items a user has rated positively (Movie A and Movie C) are combined using their ratings as weights to create a single user profile vector. Movie B, which the user has not rated, does not contribute.
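
To make the weighted formula concrete, here is a small sketch that applies it to the two movies from the earlier example. The ratings of 5 and 3 are hypothetical values chosen for illustration, and weighted_profile is an illustrative helper name, not part of any library.

```python
import numpy as np

def weighted_profile(item_vectors, ratings):
    """Rating-weighted average of item vectors (one row per item)."""
    item_vectors = np.asarray(item_vectors, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    # Denominator of the formula: sum of absolute ratings.
    denominator = np.abs(ratings).sum()
    if denominator == 0:
        raise ValueError("ratings must not all be zero")
    # Numerator: sum over items of r_ui * v_i, computed as r @ V.
    return ratings @ item_vectors / denominator

item_vectors = np.array([
    [0.8, 0.1, 0.6, 0.0],  # Sci-Fi Adventure
    [0.9, 0.0, 0.4, 0.1],  # Space Opera
])
ratings = np.array([5, 3])  # hypothetical ratings, not from the text

profile = weighted_profile(item_vectors, ratings)
print(profile)  # approximately [0.8375, 0.0625, 0.525, 0.0375]
```

Compared with the plain average [0.85, 0.05, 0.5, 0.05], the profile shifts toward the 5-star movie: the action component rises from 0.5 to about 0.525.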

A Practical Implementation

Let's see how this looks in practice using pandas and NumPy. Assume you have a DataFrame ratings with columns ['userId', 'movieId', 'rating'] and a second DataFrame item_vectors, indexed by movieId, with one column per feature.

First, let's isolate the ratings for a specific user, for instance the user with userId=1:

# Assume item_vectors is a DataFrame with movieIds as index and features as columns
# and ratings is the user ratings DataFrame.
user_id = 1
user_ratings = ratings[ratings['userId'] == user_id]

Next, we select only the items this user has rated from our item_vectors matrix and drop any items for which we might not have feature vectors.

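# Keep only the ratings for movies that have a feature vector;
# without this filter, .loc raises a KeyError for unknown movieIds
user_ratings = user_ratings[user_ratings['movieId'].isin(item_vectors.index)]

# Get the vectors for the movies the user has rated, in rating order
user_item_vectors = item_vectors.loc[user_ratings['movieId']]

# Ratings, aligned row by row with user_item_vectors, serve as weights
weights = user_ratings['rating'].to_numpy()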

Now, we can use NumPy's average function, which conveniently accepts a weights parameter, to compute the weighted average across the columns of the item vectors.

import numpy as np

# Calculate the weighted average of item vectors
user_profile_vector = np.average(user_item_vectors, axis=0, weights=weights)

print(user_profile_vector)
# Example output (actual values depend on your data): [0.58 0.12 0.81 ...]
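
The steps above can also be collected into one self-contained script. The sketch below runs on toy data invented for illustration: the movieIds, ratings, column names (including "other" for the unnamed fourth feature), and the helper build_user_profile are all assumptions, not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2],
    "movieId": [10, 20, 99, 10],
    "rating":  [5.0, 3.0, 4.0, 2.0],
})
item_vectors = pd.DataFrame(
    [[0.8, 0.1, 0.6, 0.0],
     [0.9, 0.0, 0.4, 0.1],
     [0.1, 0.9, 0.2, 0.0]],
    index=pd.Index([10, 20, 30], name="movieId"),
    columns=["sci_fi", "comedy", "action", "other"],
)

def build_user_profile(user_id, ratings, item_vectors):
    """Return the rating-weighted profile for one user, or None if the
    user has no rated items with a known feature vector."""
    user_ratings = ratings[ratings["userId"] == user_id]
    # Drop rated movies that have no feature vector (movieId 99 here).
    user_ratings = user_ratings[user_ratings["movieId"].isin(item_vectors.index)]
    if user_ratings.empty:
        return None
    vectors = item_vectors.loc[user_ratings["movieId"]].to_numpy()
    weights = user_ratings["rating"].to_numpy()
    profile = np.average(vectors, axis=0, weights=weights)
    return pd.Series(profile, index=item_vectors.columns)

profile = build_user_profile(1, ratings, item_vectors)
print(profile)
```

Returning a labeled Series rather than a bare array keeps the feature names attached to the profile, which makes it easier to inspect. Returning None for users without usable ratings is one possible design choice; a real system might fall back to a default profile instead.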

The resulting user_profile_vector is a single vector that encapsulates this user's preferences, and it lives in the same feature space as our item vectors. This matters because we can now use the cosine similarity metric discussed earlier to compare the user's profile directly against every item vector. By finding the items most similar to this profile, we can generate a ranked list of recommendations, which is exactly what we will do in the next section.
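
As a brief preview of that comparison, the following sketch ranks a small catalog by cosine similarity to a profile. The three catalog vectors and the helper rank_items_by_cosine are hypothetical and serve only to show that profile and item vectors can be compared directly.

```python
import numpy as np

def rank_items_by_cosine(profile, item_matrix):
    """Return item row indices sorted from most to least similar,
    together with the similarity scores."""
    profile = np.asarray(profile, dtype=float)
    item_matrix = np.asarray(item_matrix, dtype=float)
    norms = np.linalg.norm(item_matrix, axis=1) * np.linalg.norm(profile)
    # Guard against zero-length vectors; their similarity is set to 0.
    safe_norms = np.where(norms == 0, 1.0, norms)
    similarities = np.where(norms == 0, 0.0, item_matrix @ profile / safe_norms)
    order = np.argsort(-similarities)
    return order, similarities

profile = np.array([0.85, 0.05, 0.5, 0.05])  # profile from the averaging example
catalog = np.array([
    [0.7, 0.0, 0.7, 0.0],  # hypothetical sci-fi action film
    [0.1, 0.9, 0.1, 0.0],  # hypothetical comedy
    [0.9, 0.1, 0.2, 0.0],  # hypothetical slow-paced sci-fi film
])

order, similarities = rank_items_by_cosine(profile, catalog)
print(order)         # row indices, best match first
print(similarities)  # one cosine score per catalog row
```

With these made-up vectors, the sci-fi action film ranks first and the comedy last, which is what the profile would lead us to expect.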
