Predictive Coding in eDiscovery: How to Create an Efficient Predictive Coding Workflow

Document review: two words that can evoke dread in any lawyer who spent time as a junior associate poring through countless pages of discovery materials.

It’s easy to feel overwhelmed by the sheer number of documents that must be reviewed in a given case—and terrified at the prospect of overlooking an important piece of information. But luckily, technology has evolved to make the document review phase of eDiscovery much more manageable.

Those advancements include predictive coding, which is one of the most effective and efficient methods for completing document review. Predictive coding uses a human-informed algorithm to review the majority of documents in a legal case, which can expedite the review process and improve accuracy and consistency.

According to a recent survey, 29.2% of eDiscovery professionals use predictive coding in most or all of their reviews, but more—35.4%—use predictive coding infrequently or not at all. In other words, many legal teams are leaving this indispensable eDiscovery tool on the table.

What exactly is predictive coding, and why should you consider using it? That’s what this blog post will cover. We’ll also break down the challenges of predictive coding and share five best practices you can use to create an efficient predictive coding workflow. Let’s get started.


What is predictive coding?
Is there a difference between predictive coding, continuous active learning (CAL), and technology-assisted review (TAR)?
The benefits of predictive coding
The challenges of predictive coding
Five best practices for creating an efficient predictive coding workflow
Modern technology can help legal teams perform document review faster and more cost-effectively


What is predictive coding?

Predictive coding is the process of learning from past results to anticipate future results. The human brain does this naturally, but thanks to advances in artificial intelligence (AI) technology, computers can now learn in a similar way.

Predictive coding is an essential part of modern eDiscovery because lawyers can use it to quickly sort responsive from non-responsive documents.

How does predictive coding work in the document review context? First, an experienced lawyer or subject matter expert analyzes a “seed set” of random documents. The AI software “watches over the reviewer’s shoulder,” using the results of that initial analysis to learn to automatically identify similar patterns in other documents.

The human review team continues its review, feeding more documents to the algorithm and testing it to ensure quality and accuracy. When the reviewers determine that the algorithm is working properly, they have the software rank the responsiveness of any remaining documents.

Finally, the review team samples the documents that the algorithm has marked as the most and least responsive to ensure that they are classified correctly (and to determine whether any of the responsive documents are privileged).
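To make those steps concrete, here is a minimal sketch in Python of the "train on a seed set, then rank the rest" idea. It uses a tiny naive Bayes word model written from scratch. That model, the function names, and the sample documents are all assumptions made for illustration, not a description of how any particular TAR platform works, and the sketch assumes the seed set contains both responsive and non-responsive examples.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return text.lower().split()

def train(seed_set):
    """Fit a tiny naive Bayes model from (text, is_responsive) pairs coded by a reviewer."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def score(model, text):
    """Return the log-odds that a document is responsive (higher = more likely)."""
    word_counts, doc_counts, vocab = model
    # Assumes the seed set contains at least one document of each class.
    log_odds = math.log(doc_counts[True] / doc_counts[False])
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no evidence
        # Add-one smoothing keeps unseen word/class pairs from zeroing out the score.
        p_resp = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_non = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_resp / p_non)
    return log_odds

def rank(model, documents):
    """Sort unreviewed documents from most to least likely responsive."""
    return sorted(documents, key=lambda d: score(model, d), reverse=True)

# Hypothetical seed set coded by a subject matter expert (True = responsive).
seed_set = [
    ("pricing agreement with distributor", True),
    ("revised contract pricing terms", True),
    ("office lunch menu friday", False),
    ("holiday party parking reminder", False),
]
unreviewed = [
    "parking garage closed friday",
    "draft pricing contract for distributor",
    "lunch order reminder",
]

model = train(seed_set)
ranked = rank(model, unreviewed)
for doc in ranked:
    print(round(score(model, doc), 2), doc)
```

In a real review, the team would then sample from the top and bottom of that ranking to check the classifications, as described above.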

If you’re thinking this process sounds a lot like technology-assisted review or TAR, well, you’re not wrong. Let’s look at how predictive coding is related to some other well-known review technologies.


Is there a difference between predictive coding, continuous active learning (CAL), and technology-assisted review (TAR)?

Yes. Predictive coding and continuous active learning (CAL) are two different ways that an AI can learn to sort documents, and both fall under the broader umbrella of technology-assisted review (TAR). Let’s compare these terms one at a time.

First, how does predictive coding differ from CAL? Predictive coding begins with a seed set of documents and refines its algorithm over time. By contrast, CAL runs—as the name suggests—continuously. It does not need a starting seed set; instead, it operates in the background as soon as the review team begins coding documents manually.
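The difference is easier to see in code. The following is a rough Python sketch of a CAL-style loop, assuming a very simple word-weight model and a hypothetical `reviewer` function that stands in for a human coding decision. Real CAL tools use far more capable models and usually stop before every document has been reviewed. The point here is only the shape of the loop: no seed set, and the model updates after every decision.

```python
from collections import Counter

def score(weights, text):
    """Sum the learned word weights for a document; higher means more likely responsive."""
    return sum(weights[word] for word in text.lower().split())

def continuous_review(documents, reviewer, batch_size=2):
    """Review every document in batches, re-ranking the remainder after each batch."""
    weights = Counter()  # starts empty: no seed set is required
    remaining = list(documents)
    review_order = []
    while remaining:
        # Surface the documents the current model considers most likely responsive.
        remaining.sort(key=lambda d: score(weights, d), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for doc in batch:
            label = reviewer(doc)  # human coding decision: True = responsive
            review_order.append((doc, label))
            # Update the model immediately from the new decision.
            for word in doc.lower().split():
                weights[word] += 1 if label else -1
    return review_order

# Hypothetical documents and a stand-in "reviewer" for demonstration only.
documents = [
    "lunch menu friday",
    "pricing contract draft",
    "parking reminder friday",
    "contract pricing update",
    "lunch parking notice",
    "distributor contract terms",
]
reviewer = lambda doc: "contract" in doc

order = continuous_review(documents, reviewer)
labels = [label for _, label in order]
print(labels)
```

After the first batch, the loop has already learned enough to pull the remaining responsive documents to the front of the queue, which is the behavior that distinguishes CAL from a one-time seed set.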

And what about TAR? TAR is a broader term that encompasses different ways that an algorithm can learn to sort documents and assist with document review. Predictive coding and CAL are two approaches to TAR. These processes begin differently, but a legal team can use either one to review documents when the algorithm reaches maturity.

Let’s look at why a legal team would choose to use predictive coding.


The benefits of predictive coding

Predictive coding offers the best of both worlds, combining human knowledge and experience—in the form of the expert’s seed set—with machine efficiency and consistency. Using predictive coding instead of relying exclusively on manual review can help legal teams:

  • expedite the document review process by letting AI do the majority of the heavy lifting,
  • learn key facts about a case at an earlier stage,
  • meet eDiscovery deadlines more easily,
  • free up valuable time by automating a large portion of the eDiscovery process,
  • achieve greater accuracy by spotting more responsive documents, and
  • save their client or organization money by reducing the cost of eDiscovery.

Predictive coding clearly has its benefits, but it is not without its challenges.


The challenges of predictive coding

Predictive coding may be one of the best approaches to modern eDiscovery, but no method is perfect. Here are some of the most common challenges that legal teams face when it comes to predictive coding.

  1. Predictive coding sounds intimidating

The term “predictive coding” alone sounds a bit intimidating, and it can be a difficult process to understand without a background in statistical analysis or data science. This has led some in the legal community to be hesitant about using predictive coding to assist with document review. But the good news is that modern TAR platforms manage the hard part for their users. The barrier to entry is much lower now than when predictive coding was in its infancy because the technology is both more advanced and more user-friendly.

  2. Many lawyers are unfamiliar with TAR platforms

It’s not uncommon to encounter resistance where new technology is concerned, especially in the legal industry. But the various approaches to TAR are not new anymore. Courts have approved the use of TAR since the 2012 Da Silva Moore v. Publicis Groupe & MSL Group decision, and many lawyers have made TAR a regular part of their document review process since then.

Some lawyers, however, haven’t yet made the leap to using TAR in general or predictive coding in particular. These lawyers may still lack confidence regarding AI-driven technology or their ability to use it correctly. That’s why it’s so important to invest in user-friendly TAR tools, as we’ll discuss more below.

  3. Predictive coding requires adequate input

Contrary to popular belief, TAR platforms can’t do all the work on their own. With any kind of artificial intelligence, garbage in leads to garbage out. Predictive coding algorithms—like any approach to TAR—require accurate and informed input before they can correctly identify patterns and make connections. This means that the process must begin with a knowledgeable lawyer or subject matter expert and continue with a competent, experienced legal team. These individuals must be equipped to gather an appropriate seed set, accurately code the documents in that seed set, test the algorithm, and verify that it is working.

  4. There is a level of uncertainty around the predictive coding process

Predictive coding, like other forms of artificial intelligence, comes with a level of uncertainty. How is the algorithm making decisions? What factors is it weighing in deciding what documents are responsive? No one can say exactly, because the algorithm has taught itself. The “black box” nature of predictive coding can lead to questions not only for users but also for opposing counsel and the courts. If opposing counsel raises concerns about how the TAR platform was trained, the court will want to know exactly what steps the legal team took.

Let’s review some best practices for overcoming these challenges.


Five best practices for creating an efficient predictive coding workflow

While predictive coding can save legal teams time, these five best practices can help ensure that your team builds smooth, efficient, and defensible predictive coding workflows.

  1. Be diligent

The first step to mastering predictive coding is to view it as more than just a self-sustaining algorithm—rather, it is a review process that includes the algorithm. As we noted above, successful predictive coding requires legal teams to conduct appropriate initial coding, perform periodic testing, and verify results. Therefore, legal teams must assume an active, ongoing role in the predictive coding process. To that end, they should be diligent about guiding the algorithm along and resist the urge to let the software do all the work.
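One common way to perform that kind of periodic testing is to compare the algorithm's calls with human coding on the same validation sample and compute precision and recall. The Python sketch below assumes the two sets of decisions are available as parallel lists of booleans. The sample values are made up for illustration.

```python
def precision_recall(human_labels, model_labels):
    """Compare model calls to human coding on the same validation sample."""
    pairs = list(zip(human_labels, model_labels))
    true_pos = sum(1 for human, model in pairs if human and model)
    false_pos = sum(1 for human, model in pairs if not human and model)
    false_neg = sum(1 for human, model in pairs if human and not model)
    # Precision: of the documents the model called responsive, how many really were?
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the truly responsive documents, how many did the model find?
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Hypothetical validation sample (True = responsive).
human = [True, True, True, False, False, False, True, False]
model = [True, True, False, False, True, False, True, False]

precision, recall = precision_recall(human, model)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tracking these two numbers across successive training rounds gives the team a simple, repeatable signal for whether the algorithm is improving or needs more input.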

  2. Use keyword searches

Many lawyers mistakenly assume that using predictive coding means they can skip keyword searches. But keyword searches help to evaluate and refine the predictive coding process. Legal teams can determine whether the algorithm is working properly by comparing and cross-referencing the results of keyword searches with the results from the predictive coding algorithm. In other words, keyword searches can help train the algorithm, identify any weaknesses or areas that the initial seed set analysis overlooked, and eventually verify that the process is working properly.
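As an illustration, that cross-referencing can be as simple as comparing two sets of document IDs. The Python sketch below assumes the documents are available as an ID-to-text mapping and that the algorithm's responsive calls are available as a set of IDs. The IDs, text, and keywords are hypothetical.

```python
def keyword_hits(documents, keywords):
    """Return the IDs of documents containing at least one keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return {
        doc_id
        for doc_id, text in documents.items()
        if any(keyword in text.lower() for keyword in lowered)
    }

def cross_check(documents, predicted_responsive, keywords):
    """Split documents by whether keyword search and the model agree."""
    hits = keyword_hits(documents, keywords)
    return {
        "agree": hits & predicted_responsive,
        "keyword_only": hits - predicted_responsive,  # possible misses by the model
        "model_only": predicted_responsive - hits,    # found by the model, not the keywords
    }

# Hypothetical documents and model output for demonstration only.
documents = {
    "DOC-001": "Revised pricing agreement attached",
    "DOC-002": "Lunch menu for Friday",
    "DOC-003": "Call me about the rebate schedule",
    "DOC-004": "Distributor discount terms",
}
predicted_responsive = {"DOC-001", "DOC-004"}

result = cross_check(documents, predicted_responsive, ["pricing", "rebate"])
for bucket, doc_ids in result.items():
    print(bucket, sorted(doc_ids))
```

Documents in the `keyword_only` bucket are good candidates for human review and, where appropriate, for feeding back into training. The `model_only` bucket shows what a keyword-only review would have missed.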

  3. Document the process

Legal teams should document every step of the predictive coding process, from the initial seed set analysis to the latest quality control test. Courts may require legal teams to produce the seed set they used to train the algorithm and explain other details about their training process and their oversight of the results. Thorough documentation helps to demonstrate transparency and defensibility and can instill trust in the technology’s results. Documentation can also prevent the need to redo any part of the predictive coding process.
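Many platforms keep their own audit trail, but the idea can be sketched in a few lines of Python: record each step as a timestamped entry that can be produced later. The step names, fields, and numbers below are hypothetical, and a real log would be written to durable storage rather than kept in memory.

```python
import json
from datetime import datetime, timezone

def log_step(audit_log, step, **details):
    """Append one timestamped, JSON-encoded record describing a workflow step."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "details": details,
    }
    audit_log.append(json.dumps(entry, sort_keys=True))
    return entry

# Hypothetical workflow events; all values are illustrative only.
audit_log = []
log_step(audit_log, "seed_set_coded", reviewer="reviewer_a", documents=500, responsive=120)
log_step(audit_log, "model_trained", seed_documents=500)
log_step(audit_log, "qc_sample_reviewed", sample_size=200, overturned=6)

for line in audit_log:
    print(line)
```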

  4. Undergo training

Adequate training can demystify TAR technology and alleviate much of the fear and uncertainty about predictive coding. Legal teams should undergo training and familiarize themselves with how predictive coding and their chosen tools work before making TAR a routine part of their document review process. Training can also help prepare lawyers to answer any TAR-related questions that an opposing party or judge may have, addressing their concerns and increasing the chance of favorable discovery rulings.

  5. Invest in the right technology

As we mentioned before, TAR is becoming more common in the legal industry, which means there are many different AI tools on the market today. Of course, these tools aren’t all created equal. When investing in predictive coding technology, legal teams should look for a platform that is both comprehensive and user-friendly. By investing in the right technology, legal teams improve their odds of enjoying a seamless, efficient, and effective document review process.


Modern technology can help legal teams perform document review faster and more cost-effectively

Predictive coding technology can help legal teams perform document review quickly and consistently, saving time and reducing eDiscovery expenses.

For example, ZyLAB’s Legal Data Analytics platform is an advanced analytics solution that allows legal teams to create better, faster, and easier legal discovery workflows.

Legal Data Analytics uses AI—including predictive coding—to identify patterns, codewords, and outliers in legal documents, accelerating the search for responsive documents. The platform also allows users to perform quality control through document sampling to ensure accurate classification and defensibility.

For more information about ZyLAB or Legal Data Analytics, contact us or set up a demonstration today.