We are currently looking for graduate students and interns who want a paid and interesting research project. The internship or graduation project will allow you to learn how data science technology is applied in commercial and government organizations for mission-critical applications, while at the same time carrying out thorough scientific research.

The following internship/graduation project is available:

Problem

In criminal and fraud investigations, large collections of emails, social media posts and other forms of communication are analyzed. Such collections often consist of terabytes of unstructured information, and search alone is no longer good enough to find the needle in the haystack. With 30 years' experience in fraud and criminal investigations, ZyLAB has noticed that when something goes wrong in a criminal scheme, the relevant communication often contains high levels of emotion or cursing. Detecting such patterns is a complex task: keyword search generates too much noise, and it is hard to capture the right linguistic context with traditional search techniques. Initial experiments using machine learning and annotated data sets have shown performance on par with humans for such tasks. In this project, we would like to develop machine-learning-based methods to recognize emotions, sentiments, cursing and cynicism in various languages.

Key challenges

The key challenges in this project are dealing with new types of emotions, sentiments, cursing or cynicism (e.g. names that are unique and that the system has never seen before), and handling negation, subjectivity and ambiguity in emotions, sentiments, cursing or cynicism.

The system has to use annotated data sets in various languages to learn how to identify and extract such patterns. The system needs to approximate human performance (70-80% precision and recall).
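To make the 70-80% target concrete, here is a minimal Python sketch of how precision, recall and F1 could be computed for one category (e.g. cursing). It is for illustration only and is not ZyLAB's evaluation code. It assumes that extracted items are compared with the annotated items by exact match. The function name `precision_recall_f1` and the message IDs are hypothetical.

```python
from typing import Set, Tuple


def precision_recall_f1(gold: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one category.

    Assumption (illustrative only): `gold` holds the items human annotators
    labelled, `predicted` holds the items the system extracted, and an item
    counts as correct only if it appears in both sets (exact match).
    """
    true_positives = len(gold & predicted)
    # Precision: fraction of extracted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of annotated items the system found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy example: message IDs annotated as containing cursing.
gold = {"msg1", "msg2", "msg3", "msg4", "msg5"}
predicted = {"msg1", "msg2", "msg3", "msg4", "msg9"}
p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case, four of the five extracted messages are correct and four of the five annotated messages are found. That gives precision and recall of 0.80, at the upper end of the stated human-level range.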

Research Questions

  1. What are the best machine learning algorithms for teaching a system to recognize and extract emotions, sentiments, cursing or cynicism?
  2. What are the best features to use for machine learning?
  3. How well do such systems perform (in terms of precision, recall and F1 score)?
  4. What relations can be taught, and how does performance vary between different types of relations?
  5. What pre- and post-processing methods can be used to increase the performance of the system?

Other Expected / Desired Outcomes

At ZyLAB R&D, a prototype is first developed using a very basic approach to establish baseline performance. Next, the objective is to use novel methods, such as advanced machine learning (deep learning, better feature factories, better document representation methods, etc.), to create a better-performing system.

Data Sets

ZyLAB has several data sets to train and validate the performance of such systems. More information on other projects can be found here: https://textminingum.wordpress.com/

Compensation

Depending on your programming skills, ZyLAB will pay you an internship fee that is significantly higher than what most other companies would pay. We hear from our students that it is often 2 to 3 times higher than at other organizations.

In addition, we will reimburse your travel costs, and you can participate in all activities we organize for our employees.

Requirements

  • Mandatory internship from your university
  • BSc and MSc students at Dutch or Belgian universities
  • Fields related to data science, such as artificial intelligence, computer science, text mining and data mining
  • Excellent programming skills in C#
  • Able to work in Amsterdam at least 1-2 days a week

Contact

If you are interested, please contact us at hrm@zylab.com, or leave your details on this page.