More like this: Recommending similar Netflix shows with AWS Personalize | by Vanessa Lam | November, 2022

Step-by-Step Guide to More Customized Movies

Photo by Glenn Carstens-Peters on Unsplash

With a vast collection of shows at our fingertips, browsing for new content can be a fun exploratory process – but it becomes exhausting when we search for too long without landing on anything of interest. Well-implemented recommendations can fill this gap, suggesting items based on our activity history and how other people interact with similar content.

Such recommenders can be self-built with tools like SageMaker and TensorFlow, or we can use plug-and-play options that can be implemented almost instantly.

In this project, I extracted data on Netflix shows from trailer playlists on Netflix’s official YouTube channel and used AWS Personalize to recommend similar titles based on user interaction patterns in the comments left on each video.

Table of Contents

Part 1: Preparing the Data
Part 2: Generating the Recommendations
Part 3: Results

The input data consists of comments and video details from the Netflix trailers playlist, retrieved through the YouTube Data API v3. The API is free to use but subject to usage quotas, so the data was extracted over multiple dates in September-October 2022.

Track API usage levels on the cloud console

The YouTube Data API provides access to multiple resource types and methods. These are the ones used:

  • playlistItems.list: to get the list of all video IDs in the trailer playlist
  • videos.list: to get the details of each of these videos (title, description, upload date, etc.)
  • commentThreads.list: to get the list of main/top-level comments on each video
  • comments.list: to get comment replies/sub-level comments
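As a rough illustration of the first call, here is a minimal sketch of paging through playlistItems.list. It assumes a `youtube` client built with google-api-python-client (`googleapiclient.discovery.build("youtube", "v3", developerKey=...)`); the function name and the playlist ID passed in are placeholders, not the code used in the project.

```python
def list_playlist_video_ids(youtube, playlist_id):
    """Collect every video ID in a playlist by following nextPageToken.

    `youtube` is assumed to be a YouTube Data API v3 client object;
    `playlist_id` is whatever playlist you want to read (placeholder here).
    """
    video_ids = []
    page_token = None
    while True:
        response = youtube.playlistItems().list(
            part="contentDetails",
            playlistId=playlist_id,
            maxResults=50,          # largest page size the API allows
            pageToken=page_token,   # None on the first request
        ).execute()
        for item in response.get("items", []):
            video_ids.append(item["contentDetails"]["videoId"])
        page_token = response.get("nextPageToken")
        if not page_token:          # no more pages
            return video_ids
```

The same loop shape applies to commentThreads.list and comments.list, which page in the same way. Each page costs quota, which is why a large playlist may need several days to extract.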

This generated a dataset of 1,181,917 observations from 1,254 videos.

The raw data was then cleaned and formatted to meet the AWS Personalize schema requirements. This included:

  • Converting the time fields to UNIX timestamp format
  • Parsing the video title to remove irrelevant words like ‘trailer’, ‘netflix’, etc.
  • Enclosing any unstructured text field in double quotes, and either escaping or removing double quotes within the value
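The three cleaning steps above could look roughly like the sketch below. The noise-word list, the function names, and the choice to escape inner quotes by doubling them (what Python's csv module does) are my assumptions for illustration, not the exact rules used in the project.

```python
import csv
import io
import re
from datetime import datetime

# Assumed list of words to strip from titles; extend as needed.
NOISE_WORDS = ("official", "trailer", "teaser", "netflix")


def to_unix_timestamp(iso_time):
    """Convert an ISO 8601 UTC string such as '2022-09-01T00:00:00Z' to epoch seconds."""
    return int(datetime.fromisoformat(iso_time.replace("Z", "+00:00")).timestamp())


def clean_title(title):
    """Drop noise words, then collapse leftover '|' separators and whitespace."""
    pattern = r"\b(?:" + "|".join(NOISE_WORDS) + r")\b"
    without_noise = re.sub(pattern, "", title, flags=re.IGNORECASE)
    return re.sub(r"[\s|]+", " ", without_noise).strip()


def to_csv_line(fields):
    """Quote every non-numeric field; inner double quotes are doubled by csv."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


print(clean_title("The Gray Man | Official Trailer | Netflix"))
print(to_csv_line(['The "best" show', to_unix_timestamp("2022-09-01T00:00:00Z")]), end="")
```

Letting the csv module handle quoting avoids hand-written escaping bugs; if the importer rejects doubled quotes, stripping them from the value beforehand is the simpler fallback.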

Fields that were not used or could not be cleaned properly were dropped at this step, but they remain available in the raw data for future iterations. Here is the final dataset:

With the formatted data ready, it is now on to the AWS console to generate recommendations!

Both files are first uploaded to an S3 bucket in CSV format:

Permissions and access have to be set up as well: create an IAM role for AWS Personalize and add an S3 bucket policy that allows Personalize to read from and write to the bucket.
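For reference, a bucket policy along these lines grants the Personalize service principal read and write access. This is a sketch that generates the policy as JSON; the bucket name and the statement ID are placeholders, and you may want to narrow the actions for your own setup.

```python
import json


def personalize_bucket_policy(bucket_name):
    """Build an S3 bucket policy that lets Amazon Personalize read and write objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",  # placeholder statement ID
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": ["s3:GetObject", "s3:ListBucket", "s3:PutObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself (for ListBucket)
                    f"arn:aws:s3:::{bucket_name}/*",  # objects inside the bucket
                ],
            }
        ],
    }


# "my-netflix-trailers-bucket" is a hypothetical bucket name.
print(json.dumps(personalize_bucket_policy("my-netflix-trailers-bucket"), indent=2))
```

The printed JSON can be pasted into the bucket's Permissions tab. s3:PutObject is only needed because the batch job writes its results back to the same bucket.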

Next, we can start building the recommender in the AWS Personalize service:

To get started, create a dataset group and select a use case domain. Personalize provides various recipes, including user segmentation, personalized ranking, and related items. For this project, I am using the Similar-Items recipe under the “Custom” domain.

Import the data into Personalize by specifying the S3 location where the data was uploaded and providing a schema that matches the imported data.
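Personalize schemas are written in Avro JSON format. As a minimal sketch, an interactions schema with only the three required fields might be generated like this; how the project mapped comment fields onto USER_ID and ITEM_ID (commenter and video, presumably) and whether it carried extra columns is my assumption.

```python
import json


def interactions_schema():
    """Avro schema for a Personalize interactions dataset with the required fields only."""
    return {
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            {"name": "USER_ID", "type": "string"},   # assumed: the comment author
            {"name": "ITEM_ID", "type": "string"},   # assumed: the video ID
            {"name": "TIMESTAMP", "type": "long"},   # UNIX epoch seconds
        ],
        "version": "1.0",
    }


print(json.dumps(interactions_schema(), indent=2))
```

The column headers in the uploaded CSV have to match the field names in the schema, which is a common reason for an import job to fail.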

After the dataset is created and imported successfully, create a solution and select the recipe to be used (aws-similar-items).
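I did this step in the console, but the equivalent with boto3 would look roughly like the sketch below. It assumes `personalize` is a `boto3.client("personalize")` object; the solution name and dataset group ARN are placeholders.

```python
SIMILAR_ITEMS_RECIPE_ARN = "arn:aws:personalize:::recipe/aws-similar-items"


def train_similar_items_solution(personalize, solution_name, dataset_group_arn):
    """Create a solution with the Similar-Items recipe and start training a version.

    `personalize` is assumed to be a boto3 Personalize client; the name and
    dataset group ARN are whatever you chose when setting up the dataset group.
    """
    solution = personalize.create_solution(
        name=solution_name,
        datasetGroupArn=dataset_group_arn,
        recipeArn=SIMILAR_ITEMS_RECIPE_ARN,
    )
    version = personalize.create_solution_version(solutionArn=solution["solutionArn"])
    return version["solutionVersionArn"]
```

Training runs asynchronously, so the returned solution version has to reach an active status before it can be used for a campaign or batch job.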

With this setup, we can get recommendation results through a campaign or a batch job. Here, I am using a batch job to get results for all videos. Provide the input data (a list of video IDs formatted as a JSON file) and the output directory (the folder where the results will be written) to start the batch job:
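For a related-items recipe, the batch input file holds one JSON object per line with an `itemId` key. Here is a sketch of building that file and starting the job with boto3; the job name, ARNs, and S3 paths are placeholders, and `personalize` is again assumed to be a `boto3.client("personalize")` object.

```python
import json


def build_batch_input(video_ids):
    """Return JSON Lines text: one {"itemId": ...} object per video."""
    return "\n".join(json.dumps({"itemId": vid}) for vid in video_ids) + "\n"


def start_batch_job(personalize, job_name, solution_version_arn, role_arn,
                    input_path, output_path, num_results=25):
    """Start a batch inference job and return its ARN (all arguments are placeholders)."""
    response = personalize.create_batch_inference_job(
        jobName=job_name,
        solutionVersionArn=solution_version_arn,
        numResults=num_results,  # recommendations per input item
        jobInput={"s3DataSource": {"path": input_path}},
        jobOutput={"s3DataDestination": {"path": output_path}},
        roleArn=role_arn,
    )
    return response["batchInferenceJobArn"]


print(build_batch_input(["abc123", "def456"]), end="")
```

The input text is uploaded to S3 first, and `input_path` then points at that object (for example an `s3://bucket/batch/input.json` path).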

Once the batch job is complete, the recommender results should be available in the S3 output location provided. By default, twenty-five recommendations are generated for each item; this number is configurable in the batch job.
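The output file is also JSON Lines, with each line pairing the input item with its list of recommended items. A small sketch for reading it into a dictionary, assuming the file has already been downloaded and is passed in as an iterable of lines:

```python
import json


def parse_batch_output(lines):
    """Map each input itemId to its list of recommended item IDs.

    Lines that report an error or are blank are skipped.
    """
    recommendations = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("error"):
            continue
        item_id = record["input"]["itemId"]
        recommendations[item_id] = record["output"]["recommendedItems"]
    return recommendations


sample = ['{"input": {"itemId": "a"}, "output": {"recommendedItems": ["b", "c"]}, "error": null}']
print(parse_batch_output(sample))
```

From here, joining the IDs back to the cleaned video titles gives readable recommendation lists.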

After some formatting and cleaning of the output data, here are some examples of similar item recommendations generated:

At first glance, there are obvious similarities between some of the recommended titles. To All the Boys: Always and Forever is linked with other rom-coms (Love Hard, He’s All That, The Princess Switch); Alice in Borderland with other video game and manga-inspired action shows (DOTA, Castlevania, Yasuke); and Tiger King with other investigative documentaries (Our Father, Mute, Inventing Anna). A Boy Called Christmas also returns various Christmas-themed movies such as Single All the Way, A Castle for Christmas, and Imaginary Dwarfs.

However, for others, the link is not so clear. For one, what could Selling Sunset have in common with The Cuphead Show that attracts the same audience?

Highly popular recent titles were also featured in disproportionate numbers: half the shows on our playlist were recommended fewer than 10 times, but The Gray Man alone was recommended a total of 370 times!
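A skew like this is easy to check by counting how often each title appears across all recommendation lists. The sketch below assumes the results are held as a dictionary from each source video to its list of recommended videos; the function names and toy data are illustrative only.

```python
from collections import Counter


def recommendation_counts(recommendations):
    """Count how many times each item appears across all recommendation lists."""
    counts = Counter()
    for recommended in recommendations.values():
        counts.update(recommended)
    return counts


def share_below(counts, all_item_ids, threshold=10):
    """Fraction of items recommended fewer than `threshold` times."""
    below = sum(1 for item_id in all_item_ids if counts.get(item_id, 0) < threshold)
    return below / len(all_item_ids)


# Toy data, not the project's results.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["b"]}
counts = recommendation_counts(toy)
print(counts.most_common(1), share_below(counts, ["a", "b", "c"], threshold=2))
```

Items that never appear in any list count as zero, so they are included in the share below the threshold.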

Overall, the initial results look promising enough, but it will require far more fine-tuning to be ready for production deployment. Some areas to explore for possible next steps include:

  1. Adding item metadata: This solution was based entirely on interactions (comments), and barely any item metadata was included. Additional information such as cast, style, and description can greatly improve the relevance of the result.
  2. Recommendations for items with no interactions: Some shows have little or no interaction data, e.g., new titles facing the cold-start problem, or special cases: in this dataset, children’s shows have 0 comments because comments are disabled for children’s content.

What other areas might be worth exploring? Let me know if you have any suggestions!

Thanks for reading.
