Fact Finding: A Guide To Managing Large Volumes Of Legal Data

Fact finding is at the very core of the discovery phase of civil litigation. Over the past two decades, growing amounts of data have come to define that process. Corporate data stores keep growing, both in volume (the amount of data) and in diversity (the number of formats). Both make fact finding more difficult when preparing for initial disclosure. Meanwhile, recent attempts to streamline civil litigation have often ended up requiring the parties to do more work in less time.

Of course, these problems are neither new nor without solutions. eDiscovery technology has improved steadily to meet the demands of legal professionals, and it can now handle both the growing amounts of data and the shrinking deadlines. Unfortunately, its adoption has lagged. In this article, we discuss why modern legal fact finding needs technology to thrive. Along the way, we look at some of the challenges of implementing it, and we conclude with the added value AI-powered tools bring to the existing discovery process.


Why legal fact finding needs technology
The Advantages of eDiscovery
The Challenges of eDiscovery
Technology-Assisted Review and eDiscovery
- Traditional (Boolean) Search Only
- TAR with random or preselected sample
Final thoughts


Why legal fact finding needs technology 

In many ways, civil litigation is a numbers game. In its 2020 Litigation Trends Survey, Norton Rose Fulbright (NRF) notes that many respondents expect new sources of disputes to arise in the next two to three years. In general, the number of disputes respondents are involved in is growing. The most concerning types of dispute are found in data-heavy fields, such as contracts/commercial litigation, regulatory investigations, and cybersecurity (page 23).

None of these trends are new, or due to circumstances unique to 2020 (read: COVID-19); the same trends were visible in the 2019 report. One notable difference is the emergence in 2020 of labor disputes as an expected source of litigation, a change likely due to the pandemic. In 2019, the most pressing concern was cybersecurity.

In-house legal departments are asked to handle more cases with more data, in less time. Lest we forget, the 2015 FRCP amendments shortened early-case deadlines: the time to serve a defendant and the time to issue a scheduling order both dropped from 120 to 90 days, which compresses the run-up to initial disclosure. Meanwhile, human review speeds reached their upper bound nearly a decade ago. Together, these factors have driven significant growth in legal departments' headcounts, budgets, and technology adoption.

External counsel remains a power player where litigation is concerned, but corporate expenditure on law firms appears to be dropping. In CLOC’s 2019 State of the Industry Report, respondents stated that ‘only’ 43% of their legal expenditure went towards external counsel, and a third of respondents had moved legal work in-house. Even NRF, itself a law firm, signals a change: between 2019 and 2020, the share of legal budgets allocated to external counsel dropped from 73% to 66%.

What is clear from all these data points is that legal departments are trending towards technology. The twin pressures of time and volume force in-house legal teams to find ways to work smarter.

The Advantages of eDiscovery 

The main advantages of eDiscovery are consistency and cost. Consistency is a lawyer’s best friend for a myriad of reasons. eDiscovery as an operation has a lot of moving parts, and a consistent way of handling cases keeps stress-induced mistakes to a minimum. Perhaps more importantly, machines are simply much better than human beings at doing the same thing over and over again. As the saying goes, to err is human, and errors in evidence collection or preservation can lead to spoliation.

Cost reductions through eDiscovery depend on where the technology is implemented. If external counsel uses it, alternative fee arrangements may be negotiated, since counsel will be able to work faster. If the in-house team uses it, they may be able to find and cull data more effectively. Better culling at the source means less data is sent to external counsel, so the review process takes less time. Improved consistency pays off as well: a more optimized operation runs faster and more smoothly, and is more cost-effective as a result.

The Challenges of eDiscovery

There is no way around the fact that eDiscovery is a complicated operation. No matter how user-friendly the tool, performing eDiscovery means Legal has to work with IT, and this cooperation is not always frictionless. At the risk of stating the obvious, lawyers are typically not IT experts; to make proper use of an eDiscovery tool, the legal department will have to develop the skills to use it.

Securing buy-in from both IT and the business departments is important, because setting up an eDiscovery practice requires their cooperation on data management. Legal fact finding works best in a well-kept environment, which often requires some form of information governance. If such a system already exists, new policies will likely have to be added.

The main policy to add is data preservation. To avoid spoliation of evidence, a clear data preservation policy is needed: it tells employees when data they hold must be preserved and how to preserve it. Beyond preservation, corporate data is often ‘dirty’: corrupted, embedded, compressed, or otherwise unsearchable. Getting rid of this digital debris is a painstaking operation, but it is worth it once fact finding begins.

Technology-Assisted Review and eDiscovery

Technology-Assisted Review (TAR) is the method AI-powered eDiscovery tools use to perform document review of a dataset. TAR combines text mining and machine learning to find documents responsive to searches. Simply put, it is software that is able to read information and learn what is and isn’t relevant to the issues in a case. 


A TAR process starts with a training set of documents. A training set can be a random sample or a preselected set of examples, and the best results are typically achieved with a preselected set. To put this into perspective, let’s compare traditional (Boolean) search on its own with TAR.

Traditional (Boolean) Search Only 

Boolean search works with keywords and operators. Operators combine keywords or refine a query; they include “AND”, “OR”, and “NOT”. Using these operators, a query can be put together for fact finding.


Using operators is a feast-or-famine proposition. Though useful, the operators involved are blunt tools: “AND” and “NOT” reliably narrow your results, while “OR” expands them, and there is not much room for fine-tuning. As a result, Boolean searches typically end with a fairly large set of results, all of which have to be reviewed manually for relevance. The end result is a time-consuming expedition, mostly due to the search method.
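To make the bluntness of the operators concrete, here is a minimal Python sketch of keyword search over a tiny, made-up set of five documents. It is not how any particular eDiscovery product works: the corpus, the `term`, `and_`, `or_` and `not_` helpers, and the simple inverted index are all hypothetical, chosen only to show that each query returns an unranked set of hits.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus; a real review set holds thousands to millions of documents.
DOCS = {
    1: "Draft supply contract with Acme, pricing attached",
    2: "Lunch order for Friday",
    3: "Acme breach notice regarding the supply contract",
    4: "Contract renewal reminder",
    5: "Cybersecurity incident report, no contract impact",
}


def build_index(docs):
    """Inverted index: each lowercase word maps to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def term(word):
    """Documents containing the keyword (empty set if it never occurs)."""
    return set(INDEX.get(word.lower(), set()))


def and_(a, b):  # narrows: documents matching both sides
    return a & b


def or_(a, b):  # expands: documents matching either side
    return a | b


def not_(a):  # excludes: every document NOT matching a
    return ALL_IDS - a


broad = or_(term("contract"), term("acme"))
narrow = and_(term("contract"), term("acme"))
narrower = and_(narrow, not_(term("breach")))
print(sorted(broad), sorted(narrow), sorted(narrower))
```

Each query only returns a set of hits with no sense of how relevant each one is, so everything it returns still goes to manual review. Note also that excluding “breach” drops document 3, arguably the most interesting document for a contract dispute, which illustrates how little room for fine-tuning the operators leave.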

TAR with random or preselected sample 

Regardless of how the sample was selected, TAR works the same way. The training set is fed into the system, after which the user ‘teaches’ the software what is and isn’t relevant to the issues in the case. ‘Teaching’ is a small manual review process in which the user goes through the training set, flagging documents as responsive or not. From this, the software learns what responsive documents ‘look’ like.

With a random sample, training takes a bit longer, as fewer responsive documents are likely to be present. For a preselected sample, a user can simply take the results of a Boolean search and use them as a training set. Such training sets contain more responsive documents, which speeds up the machine learning process. Once the software has been taught what is and isn’t responsive, the rest of the dataset can be fed into the system, and the TAR begins.
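The teach-then-classify loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model any TAR product uses: a plain multinomial naive Bayes classifier stands in for the (typically more sophisticated) learning methods of real tools, and the helper names (`train`, `responsive_score`, `cull`), the example documents and the 0.0 cut-off are all hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labelled):
    """Fit a toy multinomial naive Bayes model from (text, is_responsive) pairs.

    The training set must contain at least one responsive and one
    non-responsive example.
    """
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter()
    for text, responsive in labelled:
        counts[responsive].update(tokens(text))
        n_docs[responsive] += 1
    if not (n_docs[True] and n_docs[False]):
        raise ValueError("need both responsive and non-responsive examples")
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def responsive_score(model, text):
    """Log-odds that `text` is responsive; positive means 'more likely responsive'."""
    counts, n_docs, vocab = model
    score = math.log(n_docs[True] / n_docs[False])  # prior odds
    for responsive, sign in ((True, 1), (False, -1)):
        denom = sum(counts[responsive].values()) + len(vocab)  # Laplace smoothing
        for word in tokens(text):
            if word in vocab:  # words never seen in training carry no evidence
                score += sign * math.log((counts[responsive][word] + 1) / denom)
    return score


def cull(model, documents, threshold=0.0):
    """Return documents scoring above the (hypothetical) threshold, best first."""
    ranked = sorted(documents, key=lambda d: responsive_score(model, d), reverse=True)
    return [d for d in ranked if responsive_score(model, d) > threshold]


# 'Teaching': a reviewer labels a preselected (e.g. Boolean-hit) training set.
training_set = [
    ("Acme breach notice on the supply contract", True),
    ("Supply contract pricing dispute with Acme", True),
    ("Contract renewal reminder for office cleaning", False),
    ("Lunch order for Friday", False),
]
model = train(training_set)

# The rest of the dataset is then scored and culled.
remaining = [
    "Friday lunch reminder",
    "Acme disputes supply pricing under the contract",
]
print(cull(model, remaining))
```

Unlike the Boolean sketch, every document gets a score rather than a yes/no hit, so the set can be ranked and cut at a chosen threshold. The toy tokenizer does no stemming, so “disputes” is treated as an unseen word and contributes nothing; real tools handle such variation far better.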


When completed, the TAR process produces a culled dataset of documents the software deems responsive. The software does this much faster and more consistently than human reviewers can. That said, the results of technology-assisted review should still be checked by a legal professional. For example, AI is currently unable to recognize what makes a document privileged, and privileged documents can (and should) be excluded from the set that is to be produced. For a more in-depth look at TAR, please consult our whitepaper on the subject.

Final thoughts 

The digital transformation of the legal profession is long overdue. With data volumes growing and the window of time shrinking, fact finding is a race against the clock. Legal technology has continued to mature, and corporate legal departments have started to adopt tools to assist them in their work.

When it comes to fact finding in large datasets, humans run into issues. Manual review is a time-consuming and repetitive process with little to no room for error. Asking people to keep up manually with growing data volumes is unrealistic. Throwing manpower (and resources) at the problem won’t solve it in a cost-effective way. AI-powered systems allow legal teams to save costs by limiting the need for manual review. 

Using AI-powered tools also allows for much deeper analytics of the dataset. As the review takes place, the software collects information about the evidence: it helps uncover social links, identify common topics in responsive documents, and highlight significant connections in the dataset. Such insights can open new avenues for fact finding. In all, it provides a clear picture of what is in the evidence you will be producing. If you want to know more about the possibilities of modern eDiscovery tools, don’t hesitate to reach out.