We are currently looking for graduate students and interns who want a paid, interesting research project. The internship or graduation project will allow you to learn how data science technology is applied in commercial and government organizations for mission-critical applications while carrying out thorough scientific research.

The following internship/graduation project is available:

Problem

ZyLAB wishes to develop a new approach to data visualization for eDiscovery, answering regulatory requests, and criminal and fraud investigations that:

  • Constructs an interactive, dynamic, and effective visualization environment,
  • Applies text mining, natural language processing, and machine learning to enrich and organize textual data, thereby increasing its value, and
  • Automatically detects anomalies in such data sets and notifies users of them, both in textual data and in large data visualizations.

To address this problem, we combine principles of effective visualization and navigation with known principles of criminal investigation to create intuitive software for this domain. The widely used investigation model of the “Golden Ws” is embedded in a visualization environment for data retrieved by means of text mining and AI. By answering the model's W questions, the investigator builds a framework around the examined subject and creates a story about the incident or environment. This story helps to comprehend, draw conclusions, and find relations to other circumstances.

The “Golden Ws” model defines five to seven questions that address the main components of an incident, circumstance, or environment:

| Question | Entities or patterns to address this question | Potential visualization |
| --- | --- | --- |
| Who is it about? | PERSON, COMPANY, ORGANIZATION, EMAIL ADDRESS | Pie chart, bar graph |
| What is it about? | Result of Topic Modeling (NMF) or Document-Term Correlation Matrix (A*A^T) | Word cloud, word wheel |
| When did it happen? | DATE, TIME, MONTH, DAY WEEK, YEAR | Timeline with bar graph |
| Where did it happen? | ADDRESS, CITY, COUNTRY, CONTINENT, DEPARTMENT and other geo-locations | Geographical mappings |
| Why did it happen? | Sentiments, emotions and cursing | |
| How did it happen? | Custom patterns to recognize events, holistic OBJECT-PREDICATE-SUBJECT and RDF extractions | Relation graphs |
| How much/often did it happen? | Quantitative measures such as amounts, currencies, and other numbers. Also frequency and averages on entity occurrences. | Bar graphs |

Information Extraction

For each W question, a number of entities, emotions, or sentiments must be extracted before they can be used in clustering or data visualizations.

Combining Who, When, Where, Why, What, How and How Much

Combining the Ws is even more interesting. For instance, why not look for Who is Where, or What happened When? A large number of variations can be generated, in combination with just as many clustering or other analysis algorithms, such as Non-Negative Matrix Factorization, hierarchical clustering, and (eigenvalue) modularity. For each, a specific data visualization or report can be used.

In this project, we would like to focus on the WHO-WHO and the WHAT-WHEN questions.

The objective is to develop an extension for the ZyLAB ONE eDiscovery platform, which runs on Microsoft Azure, provides C# APIs, and uses the Angular 6+ HTML5 framework.

Research Questions

The following research questions have to be answered in this project:

WHO-WHO

  1. Which algorithms are available to provide insight into the question of WHO communicated with WHO?
  2. Which WHOs are available, and how can they best be extracted and normalized using text mining?
  3. What additional algorithms are required to determine communities?
  4. How can the system detect anomalies in the data?
  5. What post- and preprocessing methods can be used to increase the performance of the system?
  6. What are the best features to use for the machine learning?
  7. How much time does it take to process the data?
  8. How can we ensure that one machine learning model can be used on any dataset?
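
As an illustration of the kind of very basic baseline a WHO-WHO prototype could start from, the sketch below counts how often two normalized addresses exchange messages. It is only a sketch under stated assumptions: the `EmailMessage` shape, the `normalizeWho` and `buildWhoWhoGraph` names, and the sample data are hypothetical and are not part of any ZyLAB API. It is written in TypeScript because the platform's front end is Angular-based.

```typescript
// Hypothetical message shape; the field names are assumptions, not a ZyLAB API.
interface EmailMessage {
  from: string;
  to: string[];
}

// Naive normalization (assumption): trim whitespace and lowercase the address.
function normalizeWho(raw: string): string {
  return raw.trim().toLowerCase();
}

// Build an undirected, weighted edge list.
// Key: "a|b" with a < b alphabetically; value: number of messages between a and b.
function buildWhoWhoGraph(messages: EmailMessage[]): Record<string, number> {
  const edges: Record<string, number> = {};
  for (let i = 0; i < messages.length; i++) {
    const sender = normalizeWho(messages[i].from);
    for (let j = 0; j < messages[i].to.length; j++) {
      const recipient = normalizeWho(messages[i].to[j]);
      if (recipient === sender) continue; // ignore self-addressed mail
      const key = sender < recipient
        ? sender + "|" + recipient
        : recipient + "|" + sender;
      edges[key] = (edges[key] || 0) + 1;
    }
  }
  return edges;
}

// Made-up sample data for illustration only.
const sampleMessages: EmailMessage[] = [
  { from: "Alice@example.com", to: ["bob@example.com", "carol@example.com"] },
  { from: "bob@example.com ", to: ["alice@example.com"] },
  { from: "carol@example.com", to: ["carol@example.com"] },
];

const whoWhoEdges = buildWhoWhoGraph(sampleMessages);
console.log(whoWhoEdges);
```

The resulting edge weights could feed a relation graph or a community detection step. Real data would need far more careful normalization (aliases, display names, distribution lists) than the lowercase-and-trim step assumed here.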

WHAT-WHEN

  1. Which algorithms are available to provide insight into WHAT happened WHEN?
  2. What are the best methods to create a dynamic (in time) representation of the documents? Which WHENs are available, and how can these best be extracted using metadata analysis or text mining?
  3. How can the system detect anomalies in the data?
  4. What post- and preprocessing methods can be used to increase the performance of the system?
  5. What are the best features to use for the machine learning?
  6. How much time does it take to process the data?
  7. How can we ensure that one machine learning model can be used on any dataset?
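
A comparably simple starting point for WHAT-WHEN is to count how often a term occurs per month and flag months that stand out. The sketch below does this with a plain z-score. The `DatedDoc` shape, the function names, the threshold, and the sample documents are all assumptions made for illustration; a z-score is just one simple stand-in for anomaly detection, not a method prescribed by this project.

```typescript
// Hypothetical document shape; the field names are assumptions for illustration.
interface DatedDoc {
  date: string; // ISO 8601 date, e.g. "2018-03-14"
  text: string;
}

// Count how often `term` occurs per calendar month ("YYYY-MM").
function termCountsByMonth(docs: DatedDoc[], term: string): Record<string, number> {
  const counts: Record<string, number> = {};
  const needle = term.toLowerCase();
  for (let i = 0; i < docs.length; i++) {
    const month = docs[i].date.slice(0, 7);
    // Very crude tokenizer (assumption): split on anything that is not a-z or 0-9.
    const tokens = docs[i].text.toLowerCase().split(/[^a-z0-9]+/);
    let hits = 0;
    for (let j = 0; j < tokens.length; j++) {
      if (tokens[j] === needle) hits++;
    }
    counts[month] = (counts[month] || 0) + hits;
  }
  return counts;
}

// Flag months whose count lies more than `threshold` standard deviations above the mean.
function flagSpikes(counts: Record<string, number>, threshold: number): string[] {
  const months = Object.keys(counts).sort();
  if (months.length === 0) return [];
  let sum = 0;
  for (let i = 0; i < months.length; i++) sum += counts[months[i]];
  const mean = sum / months.length;
  let squared = 0;
  for (let i = 0; i < months.length; i++) {
    const diff = counts[months[i]] - mean;
    squared += diff * diff;
  }
  const std = Math.sqrt(squared / months.length);
  if (std === 0) return []; // all months identical: nothing stands out
  return months.filter((m) => (counts[m] - mean) / std > threshold);
}

// Made-up sample data for illustration only.
const sampleDocs: DatedDoc[] = [
  { date: "2018-01-10", text: "Invoice sent." },
  { date: "2018-02-03", text: "Please check the invoice." },
  { date: "2018-03-14", text: "invoice attached" },
  { date: "2018-04-02", text: "Final invoice" },
  { date: "2018-05-05", text: "Where is the invoice? The invoice is overdue." },
  { date: "2018-05-12", text: "Destroy invoice copies; invoice 4411 too." },
  { date: "2018-05-20", text: "Invoice disputed, new invoice issued." },
];

const whatWhenCounts = termCountsByMonth(sampleDocs, "invoice");
const spikes = flagSpikes(whatWhenCounts, 1.5);
console.log(whatWhenCounts, spikes);
```

The per-month counts map directly onto a timeline with a bar graph, and the flagged months are candidates for user notification. Topic models or learned document representations would replace the single-term count in a more advanced system.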

Other Expected / Desired Outcomes

At ZyLAB R&D, a prototype is first developed using a very basic approach to set a baseline performance. Next, the objective is to use novel methods, such as advanced machine learning (deep learning, better feature factories, better document representation methods, etc.), to create a better-performing system.

Development Environment

At ZyLAB, development is done in C# in combination with HTML5, based on the Angular 6+ framework.

Data Sets

ZyLAB has several data sets to train such systems and validate their performance. More information on other projects can be found here:

https://textminingum.wordpress.com/

Compensation

Depending on your programming skills, ZyLAB will pay you an internship fee that is significantly higher than what most other companies would pay. We hear from our students that it is often 2 to 3 times higher than at other organizations.

In addition, we will reimburse your travel costs, and you can participate in all activities we organize for our employees.

Requirements

  • Mandatory internship from your university
  • BSc and MSc students at Dutch or Belgian universities
  • Fields related to data science, such as artificial intelligence, computer science, text mining, and data mining
  • Excellent programming skills in C#
  • Able to work in Amsterdam at least 1-2 days a week

Contact

If you are interested, please contact us at hrm@zylab.com, or leave your details on this page.