A brighter eDiscovery future thanks to state-of-the art algorithms

Jan Scholtes | February 8, 2018

To no one’s surprise, Legal Week18 was all about Artificial Intelligence. AI is currently The Big Hype in the legal industry. The opening keynote by ALM’s Nicholas Bruch and Steve Kolavan explained why there is so much interest in automating more parts of the legal process using AI. From 1995 to 2007 alone, the hourly rate of lawyers doubled! You can read how much this can hurt in the recent Wall Street Journal article “Legal Fees Cross New Mark: $1,500 an Hour”.

As a result, alternatives such as in-house legal departments and outside service providers have seen double-digit growth. Using AI is clearly another, very effective alternative to consider when you want to lower your legal spending.

AI techniques have proven themselves especially in eDiscovery. Here, scientific research consistently shows that AI is not only faster than human review, but also more cost-effective. It also delivers much better quality, so it makes sense for law firms to use this technology to offer better service to their clients at a more competitive price.

 

eDiscovery with fewer humans and more machines

In the panel “A Day in the Life of a Futurist Jurist Empowered by Artificial Intelligence: An Ethical Dilemma”, Ralph Losey, principal at Jackson Lewis, Martin Tully, litigation partner at Akerman, and others discussed a future of eDiscovery in 2048 with fewer humans and more machines.

Especially in eDiscovery, the continuous growth of data (referred to as Terrible Bytes by one of our customers) leaves us no choice but to fight what machines produce with machines. This is all the more true now that the fine line between corporate data and personal data is steadily disappearing and eDiscovery data collections include data in more and more shapes and forms, stored in a vast number of corporate and personal repositories.

[Image: Agent Smith]

I fully endorse these visions and ambitions. We need to automate as much of the eDiscovery process as possible, starting with direct collection from common sources such as Office 365. We need deep processing to reduce the data as much as possible before we present it to the reviewers. And we should use more advanced tools for early case assessment and review acceleration.

 

Start using state-of-the-art algorithms

Walking around the exhibit floor and visiting the booths of several eDiscovery vendors, I was surprised to see that almost all of them are still using inferior and outdated text-classification algorithms for Technology Assisted Review (TAR). These result in multiple problems and inefficiencies. I suspect that not everyone knows about the better algorithms or is able to apply them to ever-growing data sets. For instance:

  • We have known for many years that Non-Negative Matrix Factorization (NMF) is a much better alternative for concept search than algorithms such as k-NN, Latent Semantic Indexing (LSI), its probabilistic variant PLSA, and even LDA (which allows certain mathematical operations in clustering that make no sense when dealing with text).
  • Also, using linear regression for TAR requires highly balanced document sets and is very sensitive to incorrect training data compared with, for instance, Support Vector Machines.
  • The Early Case Assessment (ECA) solutions that I saw are very limited: they are really just Early Data Assessment tools that report volume and type. Techniques used in the intelligence industry, such as topic modeling, event detection, and community detection, let alone anomaly detection, are still completely absent from today’s eDiscovery tools.
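
To make the first point a little more concrete, here is a minimal sketch of NMF used for concept discovery. It is a toy illustration only, not how any particular eDiscovery product works: the six-document corpus, the function names and the choice of two concepts are all assumptions made up for this example, and the factorization uses the textbook multiplicative update rules in plain Python. A real system would more likely use TF-IDF weighting and an optimized library.

```python
import random

# Hypothetical toy corpus: three contract-dispute notes, three IT notes.
DOCS = [
    "contract breach damages payment",
    "payment invoice contract terms",
    "breach damages contract claim",
    "server backup email archive",
    "email archive server outage",
    "backup outage server restore",
]

def term_matrix(docs):
    """Return (vocabulary, document-by-term count matrix)."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    rows = [[float(doc.split().count(term)) for term in vocab] for doc in docs]
    return vocab, rows

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor v ~ w * h with non-negative w and h (multiplicative updates)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Strictly positive random start, so every entry can still be adjusted.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(k)]
    for _ in range(iterations):
        # Update the concept-by-term matrix h.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(k)]
        # Update the document-by-concept matrix w.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_docs)]
    return w, h

vocab, v = term_matrix(DOCS)
w, h = nmf(v, k=2)

# Dominant concept per document, and the three heaviest terms per concept.
labels = [row.index(max(row)) for row in w]
top_terms = [
    [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -weights[j])[:3]]
    for weights in h
]
print(labels)
print(top_terms)
```

Because every weight in both factors is non-negative, each concept reads as an additive bundle of terms, and each document as an additive mix of concepts. That is part of what makes NMF results easy to explain to a reviewer.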

When aiming for more automation and intelligence in eDiscovery, we should use state-of-the-art algorithms simply because their quality is so much better. Quality is paramount for user acceptance, especially now that eDiscovery is becoming even more complex with new requirements such as the GDPR.

When computational complexity (as is the case in Deep Learning) is still a bit too much for today’s practical applications, we could, for instance, find smart solutions to compress our data using semantic techniques. And there are many more solutions.

So, the good news is that there is still lots of room for improvement. Looking at today’s excitement in LegalTech while many vendors are still using outdated algorithms, I see a bright future for AI and automation in eDiscovery. Our industry just needs to explore the state-of-the-art algorithms that are used so successfully in other applications.

Written by Jan Scholtes
