Spotify's AI Algorithm
Published by Sid Chadha on July 12, 2020
As machine learning and its applications expand, music streaming platforms are using it more and more. One of the biggest platforms to use these algorithms is Spotify, which hundreds of thousands of K-12 students listen to every day. Because Spotify is so relevant to a K-12 audience, it is important to explain how this algorithm we all use so often works.
In the late 2000s, a firm called Songza built a recommendation system that manually created playlists for users. The system received mixed reviews, with roughly as many positive as negative ones. Its lack of personalization kept it from gaining popularity; the technology for this kind of advanced, personalized system did not yet exist. A few years later, Spotify built its own algorithm and recommendation system, the one we all know today.
The biggest application and example of the Spotify algorithm is the home screen we see when we open the app or web player. In a presentation, Spotify's research director said that the app's home screen shows how algorithms govern the listening experience for a given user. The artificial intelligence algorithm helps users find the music that is "best for them" as quickly as possible.
The algorithm behind the Spotify home screen is known as BaRT, or Bandits for Recommendations as Treatments. It arranges each user's home screen in the way it predicts is best for that user. It also helps create rows of playlists (known as shelves) featuring playlists such as "Discover Weekly," which Spotify builds from songs the algorithm predicts you, as a user, would enjoy most. Every Monday, Discover Weekly is refreshed with 30 songs that the user may not have heard before but may enjoy, based on the data inputs described below.
My Spotify Discover Weekly Playlist
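The article does not describe BaRT's internals, but the word "bandits" refers to multi-armed bandit algorithms, which balance exploiting options known to work well with exploring options that might work better. A minimal sketch of that idea uses a plain epsilon-greedy bandit, a simple stand-in chosen for illustration and not Spotify's actual method, where each candidate shelf is an "arm." The class name, the example shelf names, and the reward of 1.0 for a successful stream are all hypothetical.

```python
import random


class EpsilonGreedyShelves:
    """Toy epsilon-greedy bandit: each arm is a shelf or playlist that could be shown."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in self.arms}    # times each arm was shown
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward per arm
        self.rng = random.Random(seed)

    def choose(self):
        # Explore: with probability epsilon, show a random shelf.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Exploit: otherwise show the shelf with the highest estimated reward.
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incrementally update the running mean reward for the shown shelf.
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n


# Hypothetical usage: the shelf names are only examples.
bandit = EpsilonGreedyShelves(["Discover Weekly", "Release Radar", "Daily Mix 1"], seed=42)
shelf = bandit.choose()
bandit.update(shelf, reward=1.0)  # assumed reward: the user streamed from this shelf
```

With a small epsilon such as 0.1, the bandit shows its best-performing shelf most of the time while still occasionally testing the others, which is what lets it notice when a user's tastes change.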
As we've discussed throughout this website, machine learning consists of inputs and outputs. Spotify's algorithm draws on four key inputs to build the best possible machine learning model.
The first input comes from a technique called collaborative filtering, which Netflix famously used. Netflix took the ratings users gave movies (on a scale of one to five stars) and recommended movies similar to the ones a user rated highly, where similarity came from how other users with overlapping tastes rated those movies. This method then spread to numerous industries and companies, including Spotify, which relies on implicit feedback instead of explicit ratings like Netflix's. In Spotify's case, implicit feedback includes signals such as how many times a user visits a given artist's page or listens to a song on repeat.
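Collaborative filtering on implicit feedback can be sketched in a few lines. This is a simplified item-based version that assumes hypothetical play counts as the implicit signal and cosine similarity as the similarity measure; Spotify's production system is far larger and is not described here. Two songs count as similar when the same users play them, and each song a user has not played is scored by its similarity to the songs that user already plays.

```python
import math

# Hypothetical implicit feedback: how many times each user played each song.
plays = {
    "ana":   {"song_a": 12, "song_b": 8, "song_c": 0, "song_d": 0},
    "ben":   {"song_a": 10, "song_b": 7, "song_c": 6, "song_d": 0},
    "chloe": {"song_a": 0,  "song_b": 1, "song_c": 0, "song_d": 9},
}
USERS = sorted(plays)


def song_vector(song):
    """One column of the user-by-song matrix: each user's play count for this song."""
    return [plays[user].get(song, 0) for user in USERS]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user, k=1):
    """Rank songs the user has not played by similarity to songs they have played."""
    heard = {song: n for song, n in plays[user].items() if n > 0}
    all_songs = {song for counts in plays.values() for song in counts}
    scores = {}
    for candidate in all_songs - set(heard):
        cand_vec = song_vector(candidate)
        # Weight each similarity by how often the user played the known song.
        scores[candidate] = sum(
            n * cosine(cand_vec, song_vector(song)) for song, n in heard.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("ana", k=2))  # → ['song_c', 'song_d']
```

Ana plays song_a and song_b heavily; Ben plays those same songs and also plays song_c, so song_c ranks first for Ana even though she has never rated anything explicitly.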
But Spotify doesn't base its recommendations on a user's own behavior alone. It also scans the internet and social media, using natural language processing (NLP), for keywords that describe a given artist. Each keyword is then given a weight based on how often it is mentioned. If a user listens to an artist associated with certain keywords, Spotify could recommend another artist with similar keywords. Below is an example of how NLP assigns weights to keywords for certain artists and songs across the internet.
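The keyword-weighting idea can be sketched as follows, assuming each artist has already been reduced to a list of text snippets that mention them (the artists and snippets here are made up). Each keyword's weight is its share of all keyword mentions for that artist, matching the "weight by how often it is mentioned" description, and artists are compared with cosine similarity, which is an assumed choice. A real NLP pipeline would also need to handle stop words, multi-word phrases, and the reliability of each source.

```python
import math
from collections import Counter

# Hypothetical web and social-media snippets mentioning each artist.
mentions = {
    "Artist A": ["dreamy synth pop", "synth heavy dreamy vocals", "pop hooks"],
    "Artist B": ["synth pop anthems", "dreamy pop production"],
    "Artist C": ["gritty punk riffs", "loud punk energy"],
}


def keyword_weights(snippets):
    """Weight each keyword by its share of all keyword mentions for one artist."""
    counts = Counter(word for text in snippets for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def similarity(w1, w2):
    """Cosine similarity between two keyword-weight dictionaries."""
    dot = sum(weight * w2.get(word, 0.0) for word, weight in w1.items())
    n1 = math.sqrt(sum(v * v for v in w1.values()))
    n2 = math.sqrt(sum(v * v for v in w2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


profiles = {artist: keyword_weights(texts) for artist, texts in mentions.items()}


def most_similar(artist):
    """Return the other artist whose keyword profile is closest to this one."""
    others = [a for a in profiles if a != artist]
    return max(others, key=lambda other: similarity(profiles[artist], profiles[other]))


print(most_similar("Artist A"))  # → Artist B
```

Artist A and Artist B share "dreamy," "synth," and "pop," while Artist C's punk keywords do not overlap with Artist A's at all, so a listener of Artist A would be pointed toward Artist B.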
The third type of data Spotify analyzes is a user's demographic and geographic information, such as age, gender, location, and whether the user has moved. This helps create playlists personalized to the user.
The last type of data Spotify analyzes is the raw audio of the songs the user listens to. Because this analysis works directly on the audio, it can cover any song, even one with only 50 listens. It uses convolutional neural networks (the same type of neural network used for images and facial recognition) to do this. The audio passes through convolutional layers (most likely using a ReLU activation function), and the results are pooled in a later layer to compute features of the song (similar to summarizing parts of an image through pooling). The output then predicts features such as how fast the tempo is or whether the song is acoustic. Below to the right is an example image of how a convolutional neural network works. You can learn more about how Spotify's convolutional neural network works here (layers, loss metric, hyperparameters).
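A toy, pure-Python version of the convolve, activate, and pool steps may help make them concrete. It assumes a hypothetical 1-D sequence of per-frame loudness values and hand-picked filters, instead of the learned filters and spectrogram-style inputs a real network would use, so it illustrates the mechanics only and is not Spotify's architecture. Each filter slides over the audio, ReLU zeroes out negative responses, and pooling over time collapses each feature map into a fixed-size summary no matter how long the song is.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution (written as cross-correlation, as deep learning libraries do)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """ReLU activation: keep positive responses, replace negatives with 0."""
    return [max(0.0, v) for v in values]


def pool_over_time(values):
    """Global mean and max pooling: one fixed-size summary per feature map."""
    return {"mean": sum(values) / len(values), "max": max(values)}


def audio_features(frames, kernels):
    """Convolve with each filter, apply ReLU, then pool across the whole track."""
    return [pool_over_time(relu(conv1d(frames, kernel))) for kernel in kernels]


# Hypothetical per-frame loudness values and two hand-picked (not learned) filters.
frames = [0, 4, 0, 4, 0, 4]
rising_edge = [-1, 1]  # responds when loudness jumps up
smoothing = [1, 1, 1]  # sums loudness over a short window
print(audio_features(frames, [rising_edge, smoothing]))
```

For the rising-edge filter, the ReLU output is [4, 0, 4, 0, 4], so its pooled summary is a mean of 2.4 and a max of 4. In a real network, pooled values like these would feed further layers that predict features such as tempo.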
Training the Model
As discussed throughout this website, algorithms have to keep adjusting themselves based on past outputs, and Spotify's algorithm adjusts based on numerous components. It counts a recommendation as successful when the user streams it for more than 30 seconds, and the longer the user streams the song, the better the recommendation is from Spotify's point of view. A former Spotify executive said that listening to a song for less than 30 seconds was equivalent to a "dislike." Using this feedback, Spotify can adjust the importance, or weight, of certain factors, such as the tempo of the songs the user listens to or the user's geographic location, to provide better recommendations throughout the app.
Example structure of Spotify's algorithm for the Discover Weekly playlist
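The 30-second rule can be turned into training labels, and those labels can then be used to adjust the weights of different factors. The sketch below assumes a logistic-regression model updated with one step of stochastic gradient descent per stream, with hypothetical feature names such as tempo_match and same_region; the article does not say which model or update rule Spotify actually uses. A stream longer than 30 seconds counts as a positive example and a shorter one as a "dislike."

```python
import math

SKIP_THRESHOLD_S = 30  # streams of 30 seconds or less are treated like a "dislike"


def label(seconds_streamed):
    """Implicit label: 1 if the stream lasted more than 30 seconds, else 0."""
    return 1 if seconds_streamed > SKIP_THRESHOLD_S else 0


def predict(weights, features):
    """Logistic model: estimated probability the user keeps listening."""
    z = sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, features, seconds_streamed, lr=0.1):
    """One stochastic gradient descent step on the logistic (log) loss."""
    error = label(seconds_streamed) - predict(weights, features)
    for name, value in features.items():
        weights[name] += lr * error * value
    return weights


# Hypothetical factors for one recommended song, scaled to roughly 0..1.
features = {"bias": 1.0, "tempo_match": 0.8, "same_region": 1.0}
weights = {name: 0.0 for name in features}

update(weights, features, seconds_streamed=95)  # listened well past 30 s
print(weights)
```

Starting from zero weights, a 95-second stream raises every weight, and a 10-second stream would lower them. A fuller version could use a graded reward that grows with listening time, reflecting the idea that longer streams indicate better recommendations.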
Combining users' listening data with external data, such as social media and other internet text, to build a highly successful machine learning model is what has helped Spotify stay on top of the music streaming industry.