Recommendation systems are all around us: they have become the default method for discovering and exploring new content. Whether we're listening to songs on Spotify, watching movies on Netflix, reading blogs on Medium, or shopping on Amazon, these systems are designed to help users make decisions and discover new content.
By analyzing user behavior and patterns through browsing history, user interactions, ratings, social media interactions, demographics, and more, companies can understand users' likes and dislikes. Today, sophisticated systems are built using advanced algorithms to predict what a user might enjoy.
In this article, we are working through an example of collaborative filtering recommendations. For a broader understanding of recommendation systems, be sure to check out our blog Recommendation Systems 101.
Role of Vector Databases in Recommendation Systems
At the heart of recommendation systems is the task of identifying similarities between items and users. This requires processing high-dimensional data that encapsulates latent item features and user preferences. Vector databases, like emno, are crucial in optimizing this process.
- Managing High-Dimensional Data: Vector databases specialize in storing and managing high-dimensional data, such as user and item embeddings. These embeddings represent detailed user preferences and item characteristics, forming the backbone of effective recommendation systems.
- Rapid Retrieval of Similar Items: The advanced indexing techniques used by vector databases enable quick retrieval of similar users or items by efficiently comparing their vector representations. This capability is crucial for recommendation systems handling large-scale datasets, allowing them to identify relevant connections in real time.
- Precision in Recommendations: Vector databases improve the accuracy of recommendations. By efficiently comparing vectors, they ensure that the suggested items or content closely align with the user's interests, leading to more personalized and relevant recommendations.
As we dive into the specifics of implementing collaborative filtering, we'll discover how vector databases assist in the recommendation process, ensuring users receive accurate and personalized suggestions.
Collaborative filtering
Collaborative filtering models primarily rely on user-item interaction data (such as movie ratings). They identify patterns in these interactions and use them to predict how a user might rate items they haven't interacted with.
The model learns latent (hidden) features that describe both users and items (in our example, movies). These features aren't directly interpretable like explicit metadata (e.g., genre, director). They represent abstract characteristics inferred from the rating patterns.
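To make the idea of latent features concrete, here is a minimal sketch of how a rating could be predicted from them. It assumes a dot-product model with bias terms whose output is squashed by a sigmoid and rescaled to a 0.5-5.0 rating range; the function names, the rating range, and the 3-dimensional vectors are illustrative assumptions, not values taken from the trained model.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_rating(user_vec, movie_vec, user_bias=0.0, movie_bias=0.0,
                   min_rating=0.5, max_rating=5.0):
    """Predict a rating from latent features (assumed dot-product model)."""
    # Affinity: how well the user's latent preferences line up
    # with the movie's latent traits.
    affinity = sum(u * m for u, m in zip(user_vec, movie_vec))
    # Squash to (0, 1), then rescale to the assumed rating range.
    score = sigmoid(affinity + user_bias + movie_bias)
    return min_rating + score * (max_rating - min_rating)


# Hypothetical 3-dimensional embeddings, for illustration only.
user = [0.9, -0.2, 0.4]
aligned_movie = [0.8, -0.1, 0.5]
opposed_movie = [-0.7, 0.3, -0.6]

print(round(predict_rating(user, aligned_movie), 2))
print(round(predict_rating(user, opposed_movie), 2))
```

The movie whose vector points in roughly the same direction as the user's vector gets the higher predicted rating, which is the intuition behind everything that follows.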
Here, we are working with a pre-trained collaborative filtering model to recommend movies to users.
The steps are based on the Colab environment setup.
Let's begin by installing and importing all the necessary libraries. I have consolidated them in one place for clarity.
Next, let's load our model and dataset into our environment. Here, we are using a collaborative filtering model: https://huggingface.co/emno/movie-recommender-collaborative-filtering.
This model was trained following the steps mentioned in Collaborative Filtering Movielens using the Movielens dataset.
The key difference is that we used a subset of the dataset and a smaller embedding dimension to keep the model and embeddings lightweight for this example.
The emno model dataset contains the following files:
- movie_embeddings.csv: Contains the movies along with their embeddings and metadata.
- movie2movie_encoded.json: Contains the mappings of the original movie IDs to sequential IDs that were used for the model training.
- user2user_encoded.json: Contains the mappings of the original user IDs to sequential IDs that were used for the model training.
- ratings.csv, movies.csv, and links.csv: These are the files from the original Movielens dataset.
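The two mapping files are easiest to understand with a small sketch. The snippet below shows how such a mapping from original IDs to sequential IDs might be built and used; the movie IDs are made up, and the exact layout of the real JSON files is an assumption.

```python
import json

# Hypothetical original movie IDs (sparse, non-contiguous).
original_movie_ids = [1, 50, 296, 2571]

# Map each original ID to a sequential index usable as an embedding row number.
movie2movie_encoded = {
    movie_id: index for index, movie_id in enumerate(original_movie_ids)
}

# JSON object keys are always strings, so integer keys come back
# as strings after a save/load round trip.
loaded = json.loads(json.dumps(movie2movie_encoded))


def encode_movie_id(mapping: dict, movie_id: int) -> int:
    """Look up the sequential index for an original ID loaded from JSON."""
    return mapping[str(movie_id)]


# Reverse mapping: sequential index back to the original ID.
encoded2movie = {index: int(movie_id) for movie_id, index in loaded.items()}

print(encode_movie_id(loaded, 2571))
print(encoded2movie[3])
```

One detail worth remembering: after loading a mapping from JSON, the keys are strings, so original IDs need to be converted with `str()` before lookup.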
Creating a collection
Before getting started, if you haven't already, sign up for a free emno account.
Also, generate an API Key from the dashboard and copy it. We need it to work with the emno APIs.
Inserting the embeddings
In our collaborative filtering model, 'embeddings' are key. They are dense, low-dimensional vectors that represent users and movies. During training, these embeddings are optimized to accurately predict user interactions, such as movie ratings. Specifically, a user's embedding encapsulates their preferences, while a movie's embedding is shaped by the collective ratings it receives from different users. This process allows the model to capture subtle patterns in user behavior and movie characteristics.
Now, let's insert the movie embeddings into our vector database. We'll use the movie_embeddings.csv file to load these embeddings.
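As a rough sketch of the preparation step, the snippet below turns CSV rows into records and groups them into batches. The column names, the JSON-encoded embedding column, and the record shape (`id`, `vector`, `metadata`) are all assumptions made for illustration; the real movie_embeddings.csv and the emno client may differ, and the upload call itself is deliberately left out.

```python
import csv
import io
import json

# Hypothetical CSV layout: the real file may use different column names.
SAMPLE_CSV = '''movieId,title,genres,embedding
1,Toy Story (1995),Animation|Children,"[0.12, -0.40, 0.33]"
50,"Usual Suspects, The (1995)",Crime|Thriller,"[-0.25, 0.10, 0.71]"
296,Pulp Fiction (1994),Crime|Drama,"[0.05, 0.62, -0.18]"
'''


def rows_to_records(csv_text):
    """Convert CSV rows into records holding an ID, a vector, and metadata."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "id": row["movieId"],
            # The embedding is assumed to be stored as a JSON list string.
            "vector": json.loads(row["embedding"]),
            "metadata": {"title": row["title"], "genres": row["genres"]},
        })
    return records


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


records = rows_to_records(SAMPLE_CSV)
for batch in batched(records, batch_size=2):
    # A real pipeline would send each batch to the vector database here.
    print([record["id"] for record in batch])
```

Batching keeps each request small, which generally matters once the collection grows beyond a handful of movies.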
Defining a method for semantic search
Here, we are defining a method for performing semantic searches on emno.
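To show what such a search computes, here is a small in-memory stand-in that ranks stored vectors by cosine similarity to a query vector. It is a brute-force sketch, not the emno API: a vector database would typically use an index rather than scanning every vector, and the names `semantic_search` and `toy_index` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def semantic_search(index, query_vector, top_k=3):
    """Return the top_k (item_id, score) pairs, most similar first."""
    scored = [
        (item_id, cosine_similarity(query_vector, vector))
        for item_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_index = {
    "movie_a": [1.0, 0.0],
    "movie_b": [0.9, 0.1],
    "movie_c": [0.0, 1.0],
    "movie_d": [-1.0, 0.0],
}

results = semantic_search(toy_index, [1.0, 0.0], top_k=2)
for item_id, score in results:
    print(item_id, round(score, 4))
```

Because the query vector here is identical to the stored vector for `movie_a`, that movie comes back first. When searching with a movie's own embedding, the query movie is usually dropped from the results.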
Item-based Collaborative Filtering
So far, we have extracted and inserted the movie embeddings from our pre-trained collaborative filtering model into our collection. These embeddings represent movies in a learned feature space.
Next, using a sample movie from our dataset, let's see the recommendations we get for similar movies using our system.
Let's begin by printing out the details of our sample movie so we can understand the results better.
Next, let's get the embeddings of this movie from our model:
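A minimal sketch of that lookup, assuming the embedding weights are available as a matrix with one row per sequential movie ID. The mapping, the matrix values, and the helper name `get_movie_embedding` are hypothetical stand-ins for what the model provides.

```python
# Hypothetical stand-ins: a mapping shaped like movie2movie_encoded.json
# and an embedding matrix with one row per sequential movie index.
movie2movie_encoded = {"1": 0, "50": 1, "296": 2}
movie_embedding_matrix = [
    [0.12, -0.40, 0.33],
    [-0.25, 0.10, 0.71],
    [0.05, 0.62, -0.18],
]


def get_movie_embedding(movie_id):
    """Return the embedding row for an original movie ID."""
    key = str(movie_id)  # Keys loaded from JSON are strings.
    if key not in movie2movie_encoded:
        raise KeyError(f"movie {movie_id} was not part of training")
    return movie_embedding_matrix[movie2movie_encoded[key]]


print(get_movie_embedding(50))
```

Movies that were not seen during training have no row in the matrix, so the sketch raises an error instead of returning a made-up vector.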
Now, using this item embedding, we perform a semantic search in emno, looking for the movie embeddings that are most similar to it by cosine similarity.
Interpretation of Results
When we use a movie's embedding to find similar movies, the semantic search finds movies that are "close" to this movie in the latent feature space. This closeness is based on how users have interacted with these movies, not necessarily on explicit content (like genres, plots, directors, etc.).
So, two movies could be deemed similar based on user preferences, even if they differ significantly in content.
For example, if this method considers Movie A and Movie B similar, it suggests that users who liked Movie A are likely to rate Movie B similarly, based on how other users have rated these movies. This similarity is drawn from the collective user rating behavior rather than specific movie attributes.
Example Scenario
Imagine a scenario where a user likes 'Inception,' a complex, narrative-driven science fiction movie. Our system, using item-based collaborative filtering, may recommend 'The Matrix,' another film that, while different in story and style, often appeals to the same audience that enjoys intricately plotted, cerebral sci-fi films. This recommendation is derived from the observation that both movies share a similar audience profile in terms of ratings and preferences despite their distinct content features.
Conclusion
Item-based collaborative filtering in recommendation systems is a powerful tool for finding movies that share similar user interaction patterns. This approach goes beyond traditional content-based methods, offering recommendations based on collective user behavior and preferences. As a result, users are introduced to movies that might be different in content but align closely with their viewing history and preferences.
Personalized Movie Recommendations using User Embeddings
Next, we use another technique centered around user preferences to recommend movies to a user.
For this, we obtain the user embedding from our model. This user embedding represents the user's preferences and tendencies in the same latent feature space as the movies. Here is how we do it:
Before we get the recommendation results from our system, let's print the top 10 movies our sample user has rated highly in past interactions so we can better understand the results.
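One way to pull out a user's highest-rated movies is sketched below. The rating rows and titles are made-up examples in a (userId, movieId, rating) shape, and `top_rated_movies` is a hypothetical helper name.

```python
# Hypothetical ratings rows: (userId, movieId, rating).
ratings = [
    (7, 1, 4.0),
    (7, 50, 5.0),
    (7, 296, 3.5),
    (8, 50, 2.0),
    (7, 2571, 4.5),
]

# Hypothetical movieId -> title lookup.
titles = {
    1: "Toy Story (1995)",
    50: "Usual Suspects, The (1995)",
    296: "Pulp Fiction (1994)",
    2571: "Matrix, The (1999)",
}


def top_rated_movies(ratings, titles, user_id, limit=10):
    """Return up to `limit` (title, rating) pairs for one user, best first."""
    user_rows = [
        (movie_id, rating)
        for uid, movie_id, rating in ratings
        if uid == user_id
    ]
    # Sort by rating, highest first; ties keep their original order.
    user_rows.sort(key=lambda row: row[1], reverse=True)
    return [
        (titles.get(movie_id, "unknown"), rating)
        for movie_id, rating in user_rows[:limit]
    ]


for title, rating in top_rated_movies(ratings, titles, user_id=7, limit=3):
    print(rating, title)
```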
Next, using this user's embedding, we perform a semantic search in our vector database, looking for movie embeddings that are most similar to this user's embedding using cosine similarity.
We also filter out the movies the user has already watched from the search results.
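The filtering step can be sketched as a small post-processing function over ranked search results. The result format of (movie_id, score) pairs, the IDs, and the scores are assumptions for illustration, and `filter_unseen` is a hypothetical helper.

```python
def filter_unseen(search_results, watched_ids, limit):
    """Drop already-watched movies, keeping the original ranking order."""
    unseen = [
        (movie_id, score)
        for movie_id, score in search_results
        if movie_id not in watched_ids
    ]
    return unseen[:limit]


# Hypothetical ranked search output, most similar first.
search_results = [
    ("296", 0.97),
    ("50", 0.95),
    ("2571", 0.91),
    ("1", 0.88),
    ("318", 0.84),
]
watched_ids = {"296", "1"}

recommendations = filter_unseen(search_results, watched_ids, limit=2)
print(recommendations)
```

Since filtering removes some hits, it helps to request more results than are finally needed (for example, the desired count plus the number of watched movies) so that enough recommendations remain afterwards.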
Interpretation of Results
The result of this search is a list of movies whose embeddings are closest to the user's embedding. These movies are considered to be aligned with the user's preferences based on their interaction history (such as ratings given to other movies in the past).
The logic here is that if a user's embedding is similar to a movie's embedding, it suggests that the user is likely to have a preference or inclination towards that movie. This is because both embeddings are situated in the same feature space where proximity indicates similarity in preferences or characteristics.
Example Scenario
For example, if a user has highly rated several sci-fi and action movies, their user embedding will lie closer to the embeddings of sci-fi and action movies, or of movies liked by similar users. Thus, the semantic search will likely return recommendations for sci-fi and action movies.
Conclusion
By conducting a semantic search using a user's embedding against a database of movie embeddings, we are effectively finding movies that align well with that specific user's preferences. This method can provide highly personalized movie recommendations based on the learned patterns of user behavior and movie attributes from our collaborative filtering model.
Final thoughts
Our example has guided us through the inner workings of collaborative filtering, emphasizing the role of embeddings in representing user preferences and movie characteristics. We've observed how these embeddings are used to predict user-movie interactions, forming the foundation for personalized recommendations.
The seamless integration of vector databases in managing and querying these high-dimensional embeddings highlights their importance. By enabling fast and accurate retrieval of similar items, vector databases prove essential in scaling recommendation systems to handle large and complex datasets. This blog has illustrated the practical application of these concepts, showcasing how vector databases contribute to the development of efficient and scalable recommendation systems.
To access the complete code, check out our Jupyter notebook, where we walk through each step. We're excited to see what you create and would love for you to share your insights with us!