Lessons learned from 35 Years of Legal Searching

Jan Scholtes | November 27, 2018

Despite the fact that full-text searching has been around since the 1960s, many challenges remain in the field of legal searching technology. Search engines such as Google have created the general perception that we can search for anything, but this technology is driven by popularity, which is not suitable for legal searching. Not only do search results vary from one search engine to the next, but in legal searches the exact words or phrases required may not even be known.

For legal searching to be truly effective, it needs to produce results which reliably pass the following tests:

  • accurate and comprehensive;
  • able to deal with incomplete data sets;
  • efficient;
  • able to find unknown unknowns;
  • legally defensible;
  • transparent;
  • ethical.


The historical context of e-Discovery in the Legal Industry

 Document searching technology has evolved considerably since its inception, as outlined below:

1980s: In the early 1980s, ZyLAB sold 600,000 copies of ZyINDEX, a full-text search program for electronic files.

1990s: ZyLAB introduced ‘fuzzy’ search into its ZyINDEX product. This early technology is still used today and enables searching for words whose precise form and spelling are not known in advance. During this time, ZyLAB solutions were implemented thousands of times, including by the FBI and various UN war crimes tribunals.
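To make the idea of fuzzy search concrete, here is a minimal sketch in Python. It is not ZyINDEX's algorithm, which this article does not describe; it assumes one common approach, matching on Levenshtein edit distance, and the function names `edit_distance` and `fuzzy_find` and the `max_edits` threshold are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_find(query: str, words: list[str], max_edits: int = 2) -> list[str]:
    """Return the words within max_edits edits of the query, ignoring case.
    The threshold of 2 is an assumed default, not a recommended setting."""
    q = query.lower()
    return [w for w in words if edit_distance(q, w.lower()) <= max_edits]


# Hypothetical word list, e.g. terms extracted from OCR'd documents.
hits = fuzzy_find("Scholtes", ["Sholtes", "Scholtus", "Schmidt", "tribunal"])
```

In this sketch a misspelled or badly scanned name such as "Sholtes" is still found when searching for "Scholtes", while unrelated terms are not returned.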

Late 1990s – early 2000: the introduction of e-Discovery for law firms. This software enabled basic and fuzzy searching for single words or phrases within OCR’d documents uploaded to a litigation support database. At this time, all e-Discovery work was done principally outside of law firms.

2005 – 2010: stored data became searchable whether in the form of metadata, databases, or embedded data. It was now considered essential to collect all possible information for a given case, due to fear of missing something important. This, in turn, led to the use of testing and sampling techniques to save time and refine the searching process, due largely to the sheer size of the datasets available for searching. The prevailing belief of those specializing in legal searching was that “search alone is no longer good enough”.

In direct response to this concern, new technologies based on machine learning, including text classification, were developed to overcome the limitations of searching within very large sets of documents or messages. But it remained hard to get machine learning-based search accepted by the legal community.

2012 – 2014: This was a landmark period in which the use of ‘assisted review’ was approved by the courts. In addition, auto-classification was introduced, and e-Discovery protocols were refined (i.e. continuous active learning).


The present day in the world of e-Discovery

While legal searching has developed considerably, there is no doubt it still requires improvement to fully meet the tests outlined above. It is now widely understood that the guiding dimensions of who, where, when, why, what, how, how much, and by which means (the Ws) should be applied during the e-Discovery process in all legal cases in which e-Discovery is required. To achieve this objective we now have powerful new tools, including data analytics (structured/semantic), data enrichment, data visualization and anomaly detection.

Assisted review techniques now enable law firms to find what they didn’t previously know to look for, using technologies such as faceted browsing and customizable visual analytics. Other new methods include automated grouping and subgrouping of documents by topic or concept, which ultimately helps unearth the Ws, and hence more relevant documents.

One particularly effective way of leveraging analytics and visualization for this purpose is through the use of dashboards, such as categorization/facet dashboards. Such views quickly and automatically expose the patterns in the underlying data, leading to new insights even before in-depth e-Discovery begins, which strongly aids decision making.
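As a rough illustration of what a facet dashboard computes behind the scenes, the sketch below counts how many documents fall under each value of a few metadata fields. It is a simplified assumption of the idea rather than a description of any ZyLAB product; the function name `facet_counts` and the field names `custodian` and `type` are hypothetical.

```python
from collections import Counter


def facet_counts(documents: list[dict], fields: list[str]) -> dict[str, Counter]:
    """Count documents per value for each requested metadata field.
    Documents that lack a field are simply skipped for that field."""
    counts = {field: Counter() for field in fields}
    for doc in documents:
        for field in fields:
            value = doc.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


# Hypothetical metadata records for four collected documents.
docs = [
    {"custodian": "alice", "type": "email"},
    {"custodian": "bob", "type": "email"},
    {"custodian": "alice", "type": "spreadsheet"},
    {"type": "email"},
]
facets = facet_counts(docs, ["custodian", "type"])
top_custodian = facets["custodian"].most_common(1)
```

Sorting each facet by count, as `most_common` does here, is what lets a reviewer see at a glance which custodians or document types dominate a dataset before any detailed review starts.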


The future of e-Discovery

We live in amazing times, and we are on the cusp of new technologies which will further transform e-Discovery.

The biggest shift we are likely to see in the near future is in the field of autonomous e-Discovery searching, which will enable the automatic and self-learning organization of information. Other new concepts which will gain widespread adoption include:

  • Geo-mapping will be widely used to show a geographical representation of documents, and custodian relationships using cluster and node mapping;
  • Multi-media searching will improve, for example enabling you to search the contents of a stored image using probability analysis;
  • Topic stream analysis will be used to quickly visualize concepts/topics within a dataset across time;
  • Emotion and sentiment visualization will be used to organize entities, custodians, and organizations by the interconnection of emotions and sentiments.


In summary

The first iteration of e-Discovery taught us how to handle large amounts of data for legal searching in a manner that is defensible, efficient, accurate, comprehensive, and ethical. We believe the next generation will teach us how to learn from big data and train algorithms to provide better decision support, furthering the aims of comprehensive, ethical, and cost-effective e-Discovery.

To find out more about the points discussed in this article, you can listen to our webinar in which Brenda Dodd, Senior Consultant in Legal Technologies at ZyLAB, Mary Mack, Executive Director ACEDS, and Johannes Scholtes, CSO at ZyLAB talk about e-Discovery.


Available resources

Whitepapers available from www.ZyLAB.com/resources

ZyLAB ONE demo – you can try many of the concepts discussed today for yourself (with 500,000 documents pre-loaded!)
