What Is Data Volume And How To Face Discovery Challenges In Healthcare

Modern society is awash in data, in a wide range of formats and types. While every industry faces challenges in how to efficiently store and process data, the healthcare industry faces particular challenges with the ever-expanding volume of data that it generates, retains, and manages. 

Healthcare organizations generate a vast amount of data. Consider the data generated by a single patient during a single visit to one facility. There are traditional data points like intake forms, doctor’s notes, records, and observations as well as electronic data points generated by portable ultrasound machines, patient monitors, pacemakers, and other wearable devices. Then multiply this volume of data by every patient, visit, and treatment, every day, month, and year. It’s staggering.

As technology continues to evolve and as healthcare organizations continue to progress in their digital transformation, the rate at which they generate data will only accelerate. 

Contents: 

What is data volume?
What is healthcare data?
What challenges do data volumes present to healthcare providers?
What are the challenges for healthcare data during the eDiscovery process?
Best practices for proactively managing eDiscovery in healthcare 

What is data volume? 

Data volume refers to the amount of data that is stored or used by networks, organizations, infrastructure, processes, tools, or individuals. Data volume measurements concern both the ability to efficiently store petabytes of data and the ability to process petabytes of data stored natively and in object storage. Note that we’re talking here about petabytes rather than the familiar measure of gigabytes. Each petabyte is the equivalent of 1,000,000 gigabytes. 

When it comes to global data volumes, though, petabytes are insufficient to capture the growth of data. We’re now nearly a decade into the Zettabyte Era, with Statista estimating that the amount of data and information created, captured, copied, and consumed worldwide in 2020 was 64.2 zettabytes. Just how big is a zettabyte? It’s the equivalent of a trillion gigabytes or 1,000,000,000,000,000,000,000 (10^21) bytes.


(Source: Statista) Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2025
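
The unit arithmetic above can be checked with a short sketch. It assumes decimal (SI) units, which matches the article's figure of 1,000,000 gigabytes per petabyte; the `UNITS` table and `convert` helper are illustrative names, not part of any particular tool.

```python
# Decimal (SI) byte units, assumed here because the article equates
# one petabyte with 1,000,000 gigabytes.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(value, from_unit, to_unit):
    """Convert a quantity between decimal byte units."""
    return value * UNITS[from_unit] / UNITS[to_unit]


gb_per_pb = convert(1, "PB", "GB")   # gigabytes in one petabyte
gb_per_zb = convert(1, "ZB", "GB")   # gigabytes in one zettabyte

print(f"1 PB = {gb_per_pb:,.0f} GB")
print(f"1 ZB = {gb_per_zb:,.0f} GB")
```

Running it confirms that a petabyte is a million gigabytes and a zettabyte is a trillion gigabytes.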

Some of this huge quantity of data is structured, meaning that it is highly organized. Structured data includes easily defined terms like names, addresses, and credit card numbers. Most of the global data volume, however, is unstructured data, which is generally qualitative information. Unstructured data includes descriptive text, data from Internet of Things devices, and content from social media. Because it doesn’t fit into a predefined data model, unstructured data is generally harder to process and analyze.

Turning back to the healthcare industry, Statista projected that the global healthcare sector generated more than 2.3 of the 64.2 total zettabytes generated worldwide in 2020. The healthcare sector generates more than 19 terabytes of clinical data alone each year, and that sum doesn’t even begin to consider other forms of healthcare data. 

Let’s take a closer look at what healthcare data includes. 

What is healthcare data? 

Healthcare data takes many forms. One of the most prominent is electronic health records (EHRs). EHRs are patient charts that may include various types of both structured and unstructured data, including medical histories, treatment plans, quantifiable lab and test results, and radiology images. 

But one of the difficulties with healthcare data is that it includes many other types and sources of data, including paper records and data in legacy systems. There is structured data that relates to health systems’ financial transactions as well as unstructured data such as emails, contracts, call center records, and much more. There is confidential data about employees and benefits as well as patient and employee survey data, disease registries, data about clinical trials and laboratory research, and claims for insurance, Medicare, and Medicaid. 

As technology continues to evolve, additional forms of data will continue to emerge. For example, wearable devices are increasingly generating new health-related data. In addition, as virtual and remote care rise in prominence, these digital services will generate new sources of data that will be stored in new places. 

What challenges do data volumes present to healthcare providers? 

The sheer volume of data creates a quagmire for the healthcare industry, presenting a host of legal, regulatory, and compliance challenges. To address these challenges, healthcare organizations must determine what data and what types of data they’re generating and storing, how they’re storing and classifying that data, and whether or when they should purge outdated or unnecessary information. Data integrity is paramount, and it begins with solid data governance practices and data management techniques. 

As noted previously, data in the healthcare industry may be either structured or unstructured, which influences how it can be stored and accessed. Structured data such as barcodes or patient identification numbers lends itself to easy manipulation and querying through machine learning. However, the storage options available for structured data are limited and rigid, and maintenance can become quite expensive over time. Unstructured data such as email, social media posts, presentations, and images, on the other hand, can be collected quickly but requires specialist expertise to access and analyze.

Given the risk and regulatory burdens in the industry, healthcare organizations often choose to retain more data out of an abundance of caution. However, this drives up storage costs on the front end as well as litigation and discovery costs on the back end. In fact, a 2010 study suggested that 1,000 pages were preserved for every page entered as an exhibit, and that proportion has likely grown since then as data volumes have increased.

Furthermore, health systems must consider how to future-proof all of their data to ensure their continued ability to access it. EHR systems that hold patient data and information can sometimes be incompatible across various divisions of an organization or can become obsolete, making it more difficult and expensive to access and analyze that data. 

What are the challenges for healthcare data during the eDiscovery process? 

To keep pace with growing data volumes and prepare for eDiscovery, healthcare organizations must conduct a thorough audit of their data as well as develop and adhere to strict data governance policies. They must also choose eDiscovery technology that is capable of ingesting and analyzing high volumes of data, classifying it accurately, and helping lawyers gather the crucial insights that can inform their litigation strategy. And they must break down data silos between their systems, enabling eDiscovery providers to collect and preserve data anywhere within the organization. 

Linear, manual data review for eDiscovery is incredibly costly and time-consuming, and healthcare organizations can afford neither the expense nor the delay, especially when facing pressing legal or regulatory matters. Instead, healthcare organizations need to leverage technology to rapidly and efficiently winnow their data sets down to the most relevant data for eyes-on review. For matters that involve the same data for similar claims over time, organizations should build a repository of documents they’ve already produced in discovery so they can avoid re-tagging those documents for responsiveness and privilege in future matters.
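
One way to picture such a repository is as a lookup keyed by a content hash, so that a document seen in an earlier matter can be matched to its prior coding. The sketch below is a minimal illustration under that assumption; `ProducedRepository`, `Coding`, and `triage` are hypothetical names, and it matches exact duplicates only. Prior coding decisions would still need attorney validation, since responsiveness can differ from matter to matter.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Coding:
    """Review decisions recorded for a document in an earlier matter."""
    responsive: bool
    privileged: bool


class ProducedRepository:
    """Hypothetical store of prior coding, keyed by SHA-256 of the content."""

    def __init__(self):
        self._codings = {}

    @staticmethod
    def fingerprint(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def record(self, content: bytes, coding: Coding) -> None:
        self._codings[self.fingerprint(content)] = coding

    def lookup(self, content: bytes):
        # Returns None when the exact content has not been coded before.
        return self._codings.get(self.fingerprint(content))


def triage(repo, documents):
    """Split (name, content) pairs into previously coded and new documents."""
    reused, needs_review = [], []
    for name, content in documents:
        coding = repo.lookup(content)
        if coding is None:
            needs_review.append(name)
        else:
            reused.append((name, coding))
    return reused, needs_review


repo = ProducedRepository()
repo.record(b"discharge summary text", Coding(responsive=True, privileged=False))

reused, needs_review = triage(
    repo,
    [("summary.txt", b"discharge summary text"), ("email.txt", b"new message")],
)
```

Here `summary.txt` inherits its earlier coding, while `email.txt` goes to the review queue. Near-duplicate detection, which real platforms often provide, is outside the scope of this sketch.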

Remember that many healthcare organizations are still reliant on legacy systems and even paper records. These organizations must be sure to conduct a comprehensive sweep of data sources to capture all actively used systems and files, both old and new. These older sources may include legacy technology like fax machines and mainframes, which makes data collection cumbersome. Additionally, organizations cannot overlook any health-related data stored on their clinicians’ mobile devices, including tablets and laptops. 

Data retention is a complex subject for healthcare organizations. Federal and state laws may require organizations to retain medical records for varying numbers of years, which can present cross-border difficulties for large organizations that operate in multiple locations. Sometimes data must be handled locally instead of remotely, and data retention periods must be continuously monitored. 

Another key challenge for healthcare providers is the need to maintain security and privacy. The Health Insurance Portability and Accountability Act (HIPAA) requires that healthcare providers take steps to safeguard protected health information (PHI). Organizations that process the personal data of individuals in the European Union must also comply with the General Data Protection Regulation (GDPR). Because healthcare organizations handle extremely private information, they must take decisive action to limit the risk of exposure and minimize data processing activities.

Best practices for proactively managing eDiscovery in healthcare 

To reduce the risks and costs associated with discovery, healthcare organizations should consider following these five best practices. 

  1. Conduct a data audit. To ensure that organizations do not retain excessive and unnecessary data, the legal, compliance, and IT departments must come together to conduct a data audit that explores every aspect of data management and retention across the organization. 
  2. Declutter the data. Reduce eDiscovery burdens by getting rid of redundant, obsolete, and trivial (ROT) data. “ROT” is an apt name for this data, as it is junk data that takes up space and costs money to retain, even though it has no value to the organization and there is no legal or compliance obligation requiring its retention. Organizations should avoid the accumulation of ROT data by sticking to a retention schedule that preserves data for the appropriate length of time according to the jurisdiction and data type, and ensure that they routinely conduct any necessary data purges. 
  3. Pave the way for efficient retrieval. By proactively planning out how data will be categorized and stored, organizations can create plans that minimize eDiscovery expenses associated with the eventual retrieval and collection of data. 
  4. Prepare for eDiscovery by adopting robust software solutions. Technology can make data both searchable and HIPAA-compliant. As part of a regular assessment of the organization’s readiness to comply with eDiscovery requests, the legal, compliance, and IT departments need to ensure that the techniques the organization uses for logging, archiving, preserving, accessing, and reviewing data are kept up to date under its data governance program.
  5. Invest in eDiscovery technology that expedites review and results. Look for an all-in-one eDiscovery platform that’s able to collect data from multiple locations, apply analytics and artificial intelligence tools to reduce data sets, and identify and redact sensitive information automatically. 
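
The retention schedule described in the second practice can be sketched as a lookup keyed by jurisdiction and record type. Everything in this example is an assumption made for illustration: the jurisdiction labels, record types, and year counts are placeholders rather than actual legal requirements, and the legal-hold flag is added because data under a preservation obligation should not be purged on schedule.

```python
from datetime import date

# Hypothetical retention periods in years, keyed by (jurisdiction, record type).
# Placeholder values for illustration only, not legal requirements.
RETENTION_YEARS = {
    ("STATE_A", "medical_record"): 10,
    ("STATE_B", "medical_record"): 7,
    ("STATE_A", "billing"): 6,
}


def add_years(d, years):
    """Return the same calendar day `years` later, mapping Feb 29 to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def disposition(jurisdiction, record_type, last_activity, today, on_legal_hold=False):
    """Decide whether a record may be purged under the schedule above."""
    if on_legal_hold:
        return "retain: legal hold"
    years = RETENTION_YEARS.get((jurisdiction, record_type))
    if years is None:
        # Unknown combinations are never purged automatically.
        return "retain: no schedule entry, escalate for review"
    if today >= add_years(last_activity, years):
        return "eligible for purge"
    return "retain: within retention period"


status = disposition("STATE_A", "medical_record", date(2010, 1, 1), date(2021, 1, 1))
```

Defaulting to retention when no schedule entry exists is a deliberately conservative design choice: a missing rule is routed to a person rather than resolved by deletion.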

If you’re facing eDiscovery challenges in healthcare, we can help. Our eDiscovery platform, with its advanced analytics and AI-assisted review, can expedite your search for both critical responsive data and confidential information that must be redacted, and it can help you produce evidence quickly and securely.