How To Review And Analyze Electronically Stored Information (ESI)

*This article is a chapter from our whitepaper, A Guide To End-to-end Digital Investigations.*

After all that work to map, collect, and process the data involved in the investigation, it’s time to start actually investigating. Reviewing and analyzing data are a package deal here: reviewing means evaluating the data for relevance, while analyzing means evaluating that relevant data for content and context. 

For the purposes of this article, the focus will be on the review phase. Although analysis is an important part of the process as a whole, we won’t discuss it in depth here: weighing the content and context of a dataset is exceedingly difficult to standardize. That process is defined by the ESI itself, meaning there’s no one-size-fits-all approach. 

Here's what we will cover: 

  • Preparation and review strategy
  • Reviewing the dataset
  • Analyzing the dataset
  • Producing the results
  • Delivering the investigation 

Let's dive in!

Preparation and review strategy

Up to this point, most of the process as a whole has been about finding data and moving data into the custody of the investigation. Now that the dataset has arrived, reviewers need to prepare for their task, which is to establish the relevance of the data in their set. Regardless of the size of the review team, two things need to be established prior to starting the review: the review strategy and the review environment. 

Establishing the review strategy means setting up the protocols that define how the review will be conducted, setting up a timetable, and establishing the terminology to use for tags, codes, and annotations. If the dataset contains foreign-language materials, the strategy should define whether these should be left as-is, machine-translated, or translated by humans. If tools permit, the usage of Technology-Assisted Review (TAR) should be noted here as well. Finally, a protocol for handling sensitive, confidential, or privileged data should be put in place. 

Once the review strategy is finished, the review environment needs to be prepared. Reviewers need to receive user access rights to the tools they need to perform their duties and be given instructions and training. 

Reviewing the dataset 

With the dataset, strategies, and tools in place, the review can begin. While the data is weighed for relevance, detailed logs should be kept so that they can be included later during the presentation of the data as a technical report. 
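
To illustrate the kind of detailed log that can later feed the technical report, the sketch below appends each review action as a JSON line to a log file. All field names here are assumptions for the example, not a prescribed format; real eDiscovery platforms maintain such audit trails internally.

```python
import json
import datetime

def log_review_action(logfile, reviewer, doc_id, action, detail=""):
    """Append one review action as a JSON line so the log can
    later be folded into the technical report.

    Field names are illustrative, not an industry standard.
    """
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "reviewer": reviewer,
        "document": doc_id,
        "action": action,   # e.g. "tagged-relevant", "redacted"
        "detail": detail,
    }
    logfile.write(json.dumps(entry) + "\n")
```

One JSON object per line keeps the log append-only and easy to filter or aggregate when the report is compiled.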

The low-tech way of reviewing is exactly what it sounds like: reviewers sift through the information manually and weigh documents for relevance. Predictably, this process is time-consuming, labor-intensive, and leaves significant room for error. 

At the end of the day, reviewers are human, and the droning, repetitive nature of reviewing will eventually get to them, leading to mistakes, inconsistencies, and lapses in concentration. And since humans come with only a single pair of hands and eyes, there is a hard limit to how much data any one reviewer can get through. This is why low-tech manual review tends to take a lot of time. 

The high-tech way of reviewing means using advanced investigative tools to mitigate the limitations of human reviewers. Modern eDiscovery solutions can quickly and automatically cull irrelevant data from a set using Technology-Assisted Review (TAR), a process through which a reviewer ‘trains’ the solution, powered by Artificial Intelligence (AI), to tell the difference between relevant and irrelevant data. Once the AI understands the difference, it can classify documents based on input from reviewers, expediting the organization and prioritization of the dataset. 

Technology-Assisted Review can dramatically cut down the time (and cost) of reviewing, as reviewers now only need to review a dataset pre-selected for relevance. Of course, the input process for the AI’s training set will be documented in order to preserve defensibility. It’s important to note that TAR doesn’t mean human reviewers are not involved at all; verification remains important – the A in TAR stands for Assisted, not Autonomous. 
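
To make the idea behind TAR concrete, here is a toy sketch: reviewers label a small seed set, a classifier learns from those labels, and the rest of the dataset is scored automatically. This uses a from-scratch naive Bayes text classifier purely for illustration; real TAR engines use far more sophisticated models and validated workflows, and all names and training data below are invented.

```python
import math
from collections import Counter

def tokenize(text):
    return [w.lower() for w in text.split()]

class ToyTarClassifier:
    """Toy relevance classifier illustrating the idea behind TAR:
    reviewers label a seed set, the model scores the rest."""

    def __init__(self):
        self.word_counts = {"relevant": Counter(), "irrelevant": Counter()}
        self.doc_counts = {"relevant": 0, "irrelevant": 0}

    def train(self, text, label):
        # A reviewer's label on one seed document.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def predict(self, text):
        # Naive Bayes: log prior + log likelihood with add-one smoothing.
        vocab = set(self.word_counts["relevant"]) | set(self.word_counts["irrelevant"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("relevant", "irrelevant"):
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in tokenize(text):
                score += math.log(
                    (self.word_counts[label][w] + 1) / (total_words + len(vocab))
                )
            scores[label] = score
        return max(scores, key=scores.get)
```

In a real workflow the model's predictions are sampled and verified by human reviewers, and the training input is documented to preserve defensibility, exactly as described above.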

Regardless of which method of review is used, the end of the review phase yields a culled dataset composed of only relevant material. 

Analyzing the dataset 

Whether or not review is the final step depends on the context of the investigation. If the dataset was compiled in response to an external request, it moves directly to the presentation phase, and the analysis will be performed by the requestor. For internal investigations, the review dataset will need to be analyzed in-house as well. 

No matter who does the analysis, this part of the process may create a feedback loop: if the analysis shows that the dataset is missing information, the review process starts again to provide it. 

If the missing information is not present in the review dataset, this information can be sought outside of it. The combined model of Digital Forensics and eDiscovery allows for this through the analysis feedback loop (see Fig. 5). This is done by making use of the data recovery abilities of Digital Forensics tools if the data is lost, or by entering the information into the eDiscovery solution through the conventional means of data collection. 

The goal is to find key patterns, topics, people, and conversations. While the review answers the question ‘Where is the relevant information?’, the analysis answers ‘What is the relevant information?’. 

We won’t delve too deep into analysis here, but suffice to say that modern end-to-end eDiscovery solutions offer a wide range of visualization options and analytical tools to identify and show the connections and content within a dataset. 

Producing the results 

Once the review dataset is believed by investigators to be complete, the results of the investigation will need to be prepared for either internal or external analysis. The most important decision at this stage is determining the production format. 

A few options are available: 

Native format: files are produced as they were originally – authentic, but difficult to redact; 

Image formats: files are reproduced in an image format, such as PDF or TIFF; this is the most common output format used by eDiscovery solutions. TIFF allows for the redaction of information and the stamping of Bates numbers, as well as other information, right on the document. Image files can also be opened on any system without access to the original application used to create them. 

Metadata or Load Files: files of this type are typically produced to provide the metadata of documents in the production dataset and often the tagging work product performed within the solution. These files can take a variety of industry-standard formats. 

Extracted Text: this is the full text of the documents, separated from their original format. This text can also be printed. 
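
As a simplified illustration of what a load file contains, the sketch below writes document metadata and reviewer tags to a CSV. The field names are invented for this example; real productions follow industry-standard formats such as Concordance DAT or Opticon, generated by the eDiscovery solution itself.

```python
import csv

def write_load_file(records, fileobj):
    """Write a minimal CSV load file pairing each produced document
    with its metadata and reviewer tagging work product.

    Field names are illustrative, not an industry standard.
    """
    fieldnames = ["BatesNumber", "FileName", "Custodian", "DateSent", "Tags"]
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
```

The point of a load file is that the receiving party can ingest the production, metadata and tags included, into their own review platform without re-processing the documents.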

Delivering the investigation 

Once produced, the dataset should be delivered. If that delivery is external or concerns highly sensitive information, the security of the dataset while in transit must be the primary concern. These concerns can be addressed in a variety of ways: by using secure file-sharing services or physical data carriers to perform the transfer, for example. Furthermore, encrypting the files prior to the transfer can bolster security. Perhaps the best way is to sidestep the issue entirely by presenting the results within the eDiscovery platform itself: that way, there’s no transit at all. 

In addition to the results of the investigation, a technical report of the processes may be asked for. Depending on the type of investigation, the technical report may include the following: 

  • The data map used for identification of the ESI. This data map can also provide event tracking for the process from data identification to the upload to the eDiscovery solution.
  • A report of data that could not be collected (and why);
  • An event log of actions performed during processing;
  • A log of the operations applied to cull the dataset;
  • A copy of the review strategy document;
  • A log of the data produced and formats used;
  • Hashes of the produced data. 
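
Hashes of the produced data let the recipient verify that nothing changed in transit. A minimal sketch using Python's standard `hashlib`, reading in chunks so large production files don't have to fit in memory (the function name is ours):

```python
import hashlib

def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a produced file, hashed in chunks
    so large files never need to be loaded into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Recording these digests alongside the production means any later dispute about tampering can be settled by recomputing and comparing the hashes.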

Once the production data and technical report have been handed over, the Digital Investigation is completed. The data within the eDiscovery solution can be archived according to the case retention period set in the retention policies. 

To learn more about how to conduct a complete end-to-end digital investigation, download the whitepaper here.