What we’re thinking about

Insights, news, and tips from our top tech and business innovators.

Finding the 253 individual spells in 7 Harry Potter books in only 25 minutes

Annelore van der Lint | June 27, 2017

To celebrate the 20th anniversary of Harry Potter, we like to highlight a Text-Mining project that was recently implemented by Markus Dienstknecht and Moritz Haine from the Department of Data Science and Knowledge Engineering of the Maastricht University: spell extraction from the iconic seven Harry Potter books.

Objective of the project was to extract all spells that were used in the Harry Potter books and assign them to the characters who used them. By using text-mining techniques to answer the W-questions (Who cast a spell? What did a character cast? Why did the character cast that spell? What goal did he try to achieve? Whom did he use that spell against? Where did he use that spell?) the team aimed for a complete spell - character mapping with additional information.

Harry Potter Spells

 

253 different spells from 7 books found in 25 minutes

In 25 minutes, using a Core i7 machine and 16GB RAM, the computer extracted 41 different wizards, 64 different spells and 253 spells in total. The results are visualized with Gephi.
Harry Potter Spells Visualized with Gephi

The hunt for the spell

The first thing that had to be done in order to relate spells to individuals, was to extract every spell that was used. For that goal, several integrated text mining approaches were used. Preprocessing was needed to make sure that non-relevant information was removed (like Page | XX or empty lines). A manual check through all the given sentences was made to make sure all quotation marks were used correctly.

Tagging and Named Entity Recognition (NER) based on gazetteers, part-of-speech (POS) tagging and linguistic statistics were used on the separated sentences to indicate the spells and the individuals..

The final step of reducing all the words to a minimum number of words was to check the endings of all the leftover words. This proved to be effective as almost every spell is of Latin origin and the Latin grammar is straightforward and has very specific suffixes. By using this additional information, the spells could be recognized and assigned to a person and a location.

The Data Science R&D department of ZyLAB has close connections with the Department of Data Science and Knowledge Engineering of the Maastricht University. Both teams conduct research in the fields of intelligent search, information extraction, topic modeling and machine learning. 

Thank you Markus Dienstknecht and Moritz Haine for a great project!

Written by Annelore van der Lint

Connect with Annelore van der Lint on LinkedIn or