Please see my recent AIIM post on this topic as well.
The use cases for “visual information retrieval” range from a simple search for pictures on the Internet to recognizing the faces of hooligans at the entrance of a high-risk football match, monitoring airports with surveillance cameras, and investigating evidence of child pornography.
Most of the applications capable of this work are highly specialized and require expert knowledge and experience to use effectively.
However, I expect that in the next 1-5 years, real visual information retrieval will become a core component of in-house Enterprise Information Management systems as more and more information consists of pictures and videos that are not annotated and are therefore hard to find.
Non-Text Based Data is Growing
More and more electronically stored information (ESI) is non-text based or does not contain any searchable text components (e.g., sound recordings, video and pictures). This type of data is growing exponentially in size, and a growing number of collaborative and social network applications support (only) these information formats. In addition, an entire generation of people uses social networks and other new media forms exclusively for communication and collaboration, as opposed to emails and letters. How will this new reality of “visual information” impact eDiscovery and information management?
Search Challenges in Visual Information Retrieval
Most applications are either too specialized for realistic adoption in eDiscovery and information management initiatives, or they simply aren’t able to handle the significant challenges posed by visual information. ZyLAB’s technology, however, is able to overcome such obstacles.
- Electronic files containing one or more text components or embedded objects with text components can be searched by using text-based queries.
- Document scans (images) and even pictures can be enriched with the text of the original document or even with recognizable logos in the pictures. The same technology can also be applied to video shots.
- Audio and the audio component of a video file can be processed by a phonetic search engine and users can search the content by looking for specific words or phoneme sequences.
- In addition, audio, pictures and video files can be searched on contextual information such as the file name, added meta-information or text that surrounds the picture or the video on a web page.
But the largest challenge is that it is not possible to search a picture or a video by its visual content.
Web search engines such as Google, Bing and Yahoo! rely primarily on the “contextual text” associated with pictures and videos to make these objects searchable. This text can be tagged by users or can be found in the file name, file location, surrounding text on the web page, etc. For example, stock photo libraries are searchable because of the keywords that administrators have used to describe each image.
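To make the limits of contextual text concrete, here is a minimal sketch of such a search in Python. It is an illustration only, not how any of the engines named above work: the function names, the field names (`file_name`, `tags`, `surrounding_text`) and the sample items are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(items):
    """Map each token to the ids of the items whose contextual text contains it."""
    index = defaultdict(set)
    for item_id, fields in items.items():
        for value in fields.values():
            for token in tokenize(value):
                index[token].add(item_id)
    return index


def search(index, query):
    """Return the ids of items whose contextual text contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data: only the text around each file is indexed,
# never the pixels themselves.
ITEMS = {
    "img-001": {
        "file_name": "helicopter_rescue.jpg",
        "tags": "outdoor mountain",
        "surrounding_text": "A rescue helicopter lands near the summit.",
    },
    "img-002": {"file_name": "IMG_4432.jpg", "tags": "", "surrounding_text": ""},
    "vid-003": {
        "file_name": "board_meeting.mp4",
        "tags": "indoor meeting",
        "surrounding_text": "Recording of the quarterly board meeting.",
    },
}
INDEX = build_index(ITEMS)
```

A query such as `search(INDEX, "helicopter outdoor")` finds `img-001`, but `img-002` can never match a descriptive query, whatever the picture actually shows, because it carries no usable contextual text.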
In some cases, Optical Character Recognition (OCR) technology will spot words appearing within images and videos, or the engine can even locate and filter nudity, but that is about the extent of the support. Pure visual information retrieval, such as “give me all outdoor pictures” or “give me all images with a helicopter in them”, plays little to no role.
Additional Challenges in Visual Information Retrieval
There are a number of additional challenges in visual information retrieval. They relate to the various input formats of files, their internal encoding and compression (the codecs, in the case of video), the query format (one can use a sample image or a text description of what one is looking for), the result list format (text-based or visual result navigation with thumbnails and video summaries) and the viewer that is used for the image and video files.
State-of-the-art visual search technology should address all of these aspects and support querying by text, by example image, or by example video, as well as result navigation and viewing.
Various Codecs
Various image input formats are supported by the ZyLAB Platform, but proper video support requires one of many codecs in order to view the video. Some applications treat video as a “set of images” without taking into account the proper temporal relationships. Others have a more thorough and complete internal representation, allowing for faster and more accurate viewing and navigation.
The best approach is to convert all videos and images to one common format with the same dimensions, codec and compression. Only then can extracted image features be compared properly. There are a number of open source libraries to accomplish this. Most vendors use the same open source libavcodec library from the FFmpeg project.
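As a rough illustration of the conversion step, the Python sketch below builds an `ffmpeg` command line that re-encodes any input video to one fixed size, frame rate and codec. The target values (640x360, 25 frames per second, H.264 via `libx264`) and the function name are assumptions chosen for the example, not settings used by ZyLAB or any other vendor, and `libx264` is only available when the local FFmpeg build includes it.

```python
def build_normalize_command(src, dst, width=640, height=360, fps=25):
    """Return an ffmpeg argument list that re-encodes src to a common format.

    All defaults are illustrative assumptions; pick values that suit
    the feature extraction that follows.
    """
    return [
        "ffmpeg",
        "-y",                              # overwrite dst if it already exists
        "-i", src,                         # input file, any format ffmpeg can decode
        "-vf", f"scale={width}:{height}",  # force identical frame dimensions
        "-r", str(fps),                    # force an identical frame rate
        "-c:v", "libx264",                 # one common video codec
        "-crf", "23",                      # one common quality/compression setting
        "-pix_fmt", "yuv420p",             # one common pixel format
        "-an",                             # drop audio; only frames are compared
        dst,
    ]
```

The list can be passed to `subprocess.run(build_normalize_command("in.avi", "out.mp4"), check=True)` on a machine where `ffmpeg` is installed. Note that scaling to fixed dimensions distorts footage with a different aspect ratio, so a real pipeline may prefer padding or cropping.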
Enormous File Sizes
Images, and videos in particular, can be enormous. Twenty gigabytes for a single video file is common. As a result, processing the data often requires specialized hardware with very fast and large hard disks and dedicated graphics processing power. Viewing files requires smart streaming techniques to prevent bandwidth overload.
There are many open source solutions available to resolve these problems and many vendors use the same open source libraries.
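One simple building block behind streaming is to fetch a large file in small byte ranges, so that a viewer only transfers the part it is about to show. The Python sketch below computes such ranges and formats them as HTTP `Range` header values. The function names and the chunk size are assumptions for illustration; real streaming solutions add buffering, seeking and adaptive quality on top of this.

```python
def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total_size:
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start, end):
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


# Example: a 10-byte file requested in chunks of 4 bytes.
HEADERS = [range_header(s, e) for s, e in byte_ranges(10, 4)]
```

For a twenty-gigabyte video, a player that requests only the ranges around the current playback position never has to download the whole file.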
Browsing Video and Images
A result list as it is used in text-based information retrieval does not work for searching images and video. The best result is almost never in the #1 position, and it may not even be among the top 10 in the result list. Ranking images is based on complex statistics and other mathematical properties that are not always intuitive. Users need a much more exploratory and visual result list that uses all available search dimensions (such as color, shapes, locations, people, and other available properties and features) when searching images and videos.
An example of a well-designed video or image result list is the University of Amsterdam Forkbrowser (http://www.science.uva.nl/research/mediamill/demo/forkbrowser.php).
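The idea of exploring results along several search dimensions, rather than in one ranked list, can be sketched in a few lines of Python. This is a minimal illustration and not a description of how the Forkbrowser or any ZyLAB product is implemented: the function name, the dimension names and the similarity scores are all made up for the example.

```python
def rank_by_dimension(scores, top_k=3):
    """Build one ranked list per search dimension.

    scores maps item id -> {dimension: similarity}, where a higher
    similarity means a closer match to the query on that dimension.
    """
    dimensions = sorted({dim for per_item in scores.values() for dim in per_item})
    ranked = {}
    for dim in dimensions:
        candidates = [
            (per_item[dim], item_id)
            for item_id, per_item in scores.items()
            if dim in per_item
        ]
        # Best score first; ties are broken by item id for stable output.
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        ranked[dim] = [item_id for _, item_id in candidates[:top_k]]
    return ranked


# Hypothetical similarity scores for three images against one query.
SCORES = {
    "img-a": {"color": 0.91, "shape": 0.20, "location": 0.40},
    "img-b": {"color": 0.35, "shape": 0.88, "location": 0.45},
    "img-c": {"color": 0.60, "shape": 0.55, "location": 0.95},
}
THREADS = rank_by_dimension(SCORES, top_k=2)
```

Each dimension produces a different leader here (`img-a` for color, `img-b` for shape, `img-c` for location), which is why a single flat list tends to bury the image the user is actually looking for.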
Summary
ZyLAB software is among a short list of products that are capable of supporting visual information retrieval and doing so in a manner that supports the work of eDiscovery and information management.