Text-mining proves that the vocabulary of rappers is significantly richer than that of the Backstreet Boys or Britney Spears
Natural Language Processing (NLP) is a computer science discipline that analyzes and extracts information from texts. Sentiment Analysis and Emotion Mining are sub-fields of this discipline. Sentiment analysis examines whether a text has a rather positive or negative undertone. Emotion Mining analyzes how much of a specific emotion the text expresses (e.g. fear, joy, sadness, etc.).
Sentiment Analysis and Emotion Mining have many possible commercial applications, such as supporting customer care services and recommending music or movies to online shoppers. In (criminal) investigations, locating emotional communication based on keywords that express anger, cursing or threats can be a good starting point for finding out what people want to cover up or hide.
The Data Science R&D department of ZyLAB has close connections with the Department of Data Science and Knowledge Engineering of Maastricht University. Both teams conduct research in the fields of intelligent search, information extraction, topic modeling and machine learning.
In his project “Sentiment, Emotion & Vocabulary Analysis on Music Lyrics”, Thomas Vrancken conducted a sentiment and emotion analysis on 57 651 songs from 643 artists. For each artist, he calculated a positive and a negative sentiment score, as well as a score for each of the eight main emotions: joy, sadness, anger, fear, trust, disgust, anticipation and surprise. Both analyses used a tf-idf algorithm.
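To make the idea of a tf-idf based emotion score more concrete, here is a minimal sketch in Python. It is not the code used in the project: the tiny word list EMOTION_LEXICON, the function name tfidf_emotion_scores, the whitespace tokenization and the smoothed idf formula are all assumptions made for illustration. Each artist's lyrics are treated as one document, and every word found in the lexicon adds its tf-idf weight to the matching emotion.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (word -> emotion). A real analysis would rely on
# a full emotion lexicon; these few entries are illustrative only.
EMOTION_LEXICON = {
    "happy": "joy",
    "smile": "joy",
    "cry": "sadness",
    "tears": "sadness",
    "rage": "anger",
}


def tfidf_emotion_scores(lyrics_by_artist):
    """Return {artist: {emotion: score}}, weighting lexicon words by tf-idf.

    Each artist's combined lyrics count as one document.
    """
    tokens = {artist: text.lower().split()
              for artist, text in lyrics_by_artist.items()}
    n_docs = len(tokens)

    # Document frequency: in how many artists' lyrics does each word occur?
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    scores = {}
    for artist, words in tokens.items():
        counts = Counter(words)
        total = len(words) or 1  # avoid division by zero for empty lyrics
        emotions = Counter()
        for word, count in counts.items():
            emotion = EMOTION_LEXICON.get(word)
            if emotion is None:
                continue
            tf = count / total
            # Smoothed idf (an assumption; other variants exist).
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
            emotions[emotion] += tf * idf
        scores[artist] = dict(emotions)
    return scores
```

Calling tfidf_emotion_scores with a dictionary that maps artist names to their lyrics returns one small dictionary of emotion scores per artist. Words that appear in almost every artist's lyrics get a lower weight than words that are specific to a few artists.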
In addition, Vrancken conducted an analysis to determine the breadth of each artist's vocabulary, based on the number of different words the artist uses.
The top of the sentiment and emotion rankings consists of artists one would expect to find there.
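Counting the number of different words can be sketched in a few lines of Python. The function name vocabulary_size and the simple regular-expression tokenizer are assumptions for this example, and the project may well have normalized or filtered words differently.

```python
import re


def vocabulary_size(songs):
    """Count the distinct words across all of an artist's songs.

    `songs` is a list of lyric strings. Words are lowercased and split on
    anything that is not a letter or an apostrophe (a simplification).
    """
    vocabulary = set()
    for lyrics in songs:
        vocabulary.update(re.findall(r"[a-z']+", lyrics.lower()))
    return len(vocabulary)
```

Note that a raw count like this grows with the amount of text, so artists with many songs have an advantage unless the score is normalized in some way.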
Surprisingly, there is a strong positive correlation between joy and sadness: artists who sing a lot about joy also express a lot of sadness. On reflection, this result makes sense. Artists (especially pop artists) who sing a lot about happy feelings also tend to produce more melancholic songs about love and heartache. Conversely, most rock and rap artists reflect neither of these emotions in their songs.
Less surprisingly, the results show a negative correlation between joy and anger, but a strong positive correlation between anger and fear. The other correlations seem less relevant. The results now confirm these relations statistically.
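The correlations described above relate two per-artist score lists to each other, for example all joy scores and all sadness scores. The article does not say which correlation measure was used; the sketch below assumes the common Pearson coefficient, and the function name pearson is chosen for this example only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists.

    Returns a value between -1 (perfect negative correlation) and
    +1 (perfect positive correlation).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / (spread_x * spread_y)
```

With one list of joy scores and one list of sadness scores in the same artist order, a result close to +1 would correspond to the strong positive correlation reported here, and a negative result to a relation like the one between joy and anger.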
Looking at the ranking for the vocabulary score, one can identify some artists that are known to be quite lyrical (e.g. Eminem, Wu-Tang Clan, etc.). The ranking also confirms the notion that hip-hop artists in general have quite a wide vocabulary.
However, the groundbreaking results came from calculating regressions and correlations between these scores. These showed that: (1) Artists who express more negative than positive sentiment also tend to be more lyrical, i.e. have a wider vocabulary. (2) Artists who express joy and/or sadness tend to be less lyrical and use a much smaller vocabulary. (3) Artists who use words associated with rap music are more lyrical and have a significantly richer vocabulary than artists who use words linked with pop, such as the Backstreet Boys or Britney Spears.
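A regression between two of these scores can be illustrated with an ordinary least-squares line, for instance with a sentiment score as input and the vocabulary score as output. This is a minimal sketch under that assumption; the article does not specify the regression model, and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    `xs` could hold one score per artist (e.g. negative sentiment) and
    `ys` the vocabulary scores of the same artists, in the same order.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    if sum_sq_x == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum_sq_x
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A positive slope means that the vocabulary score tends to rise with the input score, and a negative slope that it tends to fall.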
These results provide statistical support for several popular theories about music, for instance that pop artists do not bother to develop rich lyrics, whereas hip-hop artists do. By now, you probably wonder where Justin Bieber stands on this list. Well: very close to the bottom, among the artists with the smallest vocabulary.
Thank you Thomas Vrancken for a great research project!