We are actively looking for graduate students and interns who want a paid and interesting research project. The internship or graduation project will allow you to learn how data science technology is applied in commercial and government organizations for mission-critical applications, while at the same time carrying out thorough scientific research.
The following internship/graduation project is available:
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this activity concerns processing human language texts by means of natural language processing (NLP).
A broad goal of IE is to allow computation to be done on previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences based on the logical content of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context. Information extraction is part of a larger puzzle: devising automatic methods for text management that go beyond transmission, storage and display.
Recognizing named entities such as PERSON, COMPANY, LOCATION, ORGANIZATION, or DATE is more or less a solved problem. However, automatically linking such entities to identify more complex relations is not a trivial task. In this project we would like to use machine learning techniques (both traditional Support Vector Machines and Deep Learning models) to extract complex relations between entities, primarily for fraud and criminal investigations on large electronic data sets such as email, social media and open source intelligence.
The key challenges in this project are dealing with new types of named entities (e.g. names that are unique and that the system has never seen before) and dealing with ambiguous names that can have different meanings (e.g. differentiating between Mr. Holland; Holland, Inc; Noord-Holland and Holland Ice Skating Organization).
The system has to use annotated data sets in various languages to learn how to identify and extract such relations. The system needs to approximate human performance (70-80% precision and recall).
- What are the best machine learning algorithms to teach a system to recognize and extract complex relations between entities?
- What are the best features to use for the machine learning?
- How good is the performance of such systems (in terms of precision, recall and F1-values)?
- What relations can be taught, and how does the performance vary between different types of relations?
- What pre- and post-processing methods can be used to increase the performance of the system?
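To make the performance measures in these questions concrete, the following is a minimal sketch of how precision, recall and F1 could be computed for extracted relations. It assumes, purely for illustration, that a relation is represented as a (subject, relation type, object) triple and that a prediction only counts as correct on an exact match with the annotated data. The function name, the relation labels and the example triples are hypothetical and do not describe ZyLAB's actual evaluation setup.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted relation triples against annotated (gold) triples.

    Assumption: a prediction is correct only if the full triple matches exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: which fraction of the extracted relations is correct?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: which fraction of the annotated relations was found?
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical annotated relations and system output.
gold = {
    ("Mr. Holland", "works_for", "Holland, Inc"),
    ("Holland, Inc", "located_in", "Noord-Holland"),
    ("Mr. Holland", "member_of", "Holland Ice Skating Organization"),
    ("Holland, Inc", "sponsors", "Holland Ice Skating Organization"),
}
predicted = {
    ("Mr. Holland", "works_for", "Holland, Inc"),
    ("Holland, Inc", "located_in", "Noord-Holland"),
    ("Mr. Holland", "member_of", "Holland Ice Skating Organization"),
    ("Mr. Holland", "located_in", "Noord-Holland"),  # not in the gold set
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this example three of the four predicted relations are correct and three of the four annotated relations are found, so precision, recall and F1 all come out at 0.75, which is inside the 70-80% range mentioned above.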
Other Expected / Desired Outcomes
At ZyLAB R&D, a prototype is first developed using a very basic approach to set a baseline performance. Next, the objective is to use novel methods such as advanced machine learning (deep learning, better feature factories, better document representation methods, etc.) to create a better-performing system.
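As an illustration of what a very basic baseline might look like, the sketch below proposes a candidate relation for every pair of known entities mentioned in the same sentence. This co-occurrence heuristic is an assumption chosen for the example, not the approach ZyLAB actually uses. The function name, the naive sentence splitter and the sample text are likewise hypothetical.

```python
import re
from itertools import combinations


def cooccurrence_baseline(text, entities):
    """Return candidate entity pairs that are mentioned in the same sentence.

    Assumptions: entities are already recognized and matched by exact
    substring, and sentences end with '.', '!' or '?' followed by whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pairs = set()
    for sentence in sentences:
        # Sort so each pair has one canonical order, regardless of mention order.
        present = sorted(e for e in entities if e in sentence)
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical sample text and entity list.
text = (
    "Alice Jansen transferred funds to Acme BV. "
    "Acme BV is registered in Amsterdam."
)
entities = ["Alice Jansen", "Acme BV", "Amsterdam"]

pairs = cooccurrence_baseline(text, entities)
for pair in sorted(pairs):
    print(pair)
```

Such a baseline says nothing about the type of a relation, and its simple substring matching and sentence splitting break down on abbreviations and on ambiguous names such as the Holland examples above. Those are the kinds of limitations the more advanced methods are meant to address.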
ZyLAB has several data sets to train and validate the performance of such systems.
Depending on your programming skills, ZyLAB will pay you an internship fee that is significantly higher than what most other companies would pay. We hear from our students that it is often 2 to 3 times higher than at other organizations.
In addition, we will reimburse your travel costs and you can participate in all activities we organize for our employees.
- Mandatory internship from your university.
- BSc and MSc students from Dutch or Belgian universities.
- Fields related to data science, such as artificial intelligence, computer science, text mining or data mining.
- Excellent programming skills in C#.
- Able to work in Amsterdam at least 1-2 days a week.
If you are interested, please contact us at firstname.lastname@example.org, or leave your details on this page.