Data Science Internship at Van Gogh Museum
The Van Gogh Museum makes the life and work of Vincent van Gogh and the art of his time accessible and reaches as many people as possible in order to enrich and inspire them.
The Van Gogh Museum is based in Amsterdam, the Netherlands, and has been using the ZyLAB ONE platform to manage various collections of background documentation. These include, but are not limited to, background information on Van Gogh’s paintings and drawings, documentation of special exhibitions, correspondence, personal collections from Van Gogh specialists, and newspaper clippings about the Van Gogh Museum and Van Gogh’s work.
For whom
This internship is best for computer science students looking for an internship or MSc graduation project in the fields of text-mining, information retrieval, artificial intelligence, data science or machine learning.
Location
The internship location is the Van Gogh Museum in the center of Amsterdam.
Problem
This particular document collection consists of scanned documents and corresponding meta-information. The meta-information may be incomplete or inconsistent. The Van Gogh Museum is interested in applying advanced artificial intelligence, data science and data visualization techniques to verify the quality of the meta-information, to identify anomalies and to organize and clean up the archives. In addition, the museum is interested in integrating the content of the textual documents with other sources of information that reside in other content management systems. Finally, the museum wishes to understand how advanced information extraction techniques can provide new insights into the history of, and relations between, individuals, locations, organizations and works of art by Van Gogh and others.
All this requires a thorough understanding of the ZyLAB ONE software, but also of modern artificial intelligence, data science, text-mining and data visualization techniques and algorithms.
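As an illustration of what verifying the quality of the meta-information could look like at its simplest, the sketch below flags records with missing fields or unparseable dates. It is a minimal sketch under stated assumptions: the records are plain Python dictionaries, the field names `title`, `author` and `date` are hypothetical, and nothing here uses or represents the ZyLAB ONE API.

```python
from datetime import date

# Hypothetical field names; the real meta-information schema is not given here.
REQUIRED_FIELDS = ("title", "author", "date")


def find_issues(record):
    """Return a list of human-readable problems found in one metadata record."""
    issues = []

    # Completeness: every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            issues.append(f"missing {field}")

    # Consistency: a date that is present must be a valid ISO date (YYYY-MM-DD).
    raw_date = record.get("date")
    if raw_date is not None and str(raw_date).strip():
        try:
            date.fromisoformat(str(raw_date).strip())
        except ValueError:
            issues.append("invalid date")

    return issues


if __name__ == "__main__":
    sample = {"title": "Letter", "author": "", "date": "1888-13-01"}
    print(find_issues(sample))
```

A real project would go well beyond such rule-based checks, for example by comparing the meta-information against the OCR text of the document itself.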
Key challenges
The key challenges in this project stem from the source material. Some of the text is based on low-quality document scans, so the text generated by Optical Character Recognition (OCR) tools is of lower quality as well. In addition, the documents are highly unstructured and vary widely in format.
Furthermore, names of individuals (especially those transliterated from non-Roman scripts) occur in many different spelling variants, and historical spelling conventions add further variation.
The information extraction and machine learning methods used therefore have to be robust against such OCR errors, misspellings, transliteration differences and historical spelling variations.
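To make the robustness requirement concrete, the sketch below groups spelling variants of a name using normalization and approximate string matching from Python's standard library. It is only an illustration: the function names, the 0.8 threshold and the sample names are assumptions, not part of the project or of the ZyLAB ONE platform, and a real solution would likely need stronger techniques.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(kept.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_variants(names, threshold=0.8):
    """Greedily assign each name to the first group whose first member is similar enough.

    The threshold is an illustrative assumption and would need tuning on real data.
    """
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    # Made-up examples of OCR and spelling variants.
    names = ["Vincent van Gogh", "Vincent van Gogh.", "Vincent van Cogh",
             "Paul Gauguin", "Paul Gaugin"]
    for group in group_variants(names):
        print(group)
```

Greedy grouping against the first member of each group keeps the example short; it depends on input order and does not scale to large archives, where blocking or indexing strategies would be needed.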
Research Questions
The research questions for this project are:
Other Expected / Desired Outcomes
The methods used must be able to interact with all the information (including all metadata) in the ZyLAB platform via the ZyLAB APIs, now and in the future.
More Information
More information on R&D projects which ultimately find their application in the ZyLAB ONE platform can be found here: https://textmining.nu
Contact
If you are interested, please contact us at hrm@zylab.com, or leave your details on this page.
© 2019 ZyLAB Technologies