Lawyers handle tremendous amounts of sensitive information every day: their clients’ personal data, including both personally identifiable information (PII) and protected health information (PHI), intellectual property, trade secrets, financial information, and much more. At the same time, lawyers are often required to provide information to opposing counsel, the courts, regulatory agencies, and, under some circumstances, citizens making requests for personal data or governmental records. The trick is to share everything you’re supposed to and nothing you’re not.
Redaction—obscuring or hiding text—is the means by which legal teams remove sensitive information from otherwise disclosable records. There are two major challenges around redaction: efficiently identifying the pieces of sensitive information that may be hiding within reams of disclosable data and thoroughly redacting that information prior to production.
In this blog post, we’ll start by defining a few key terms: sensitive information, inadvertent disclosure, and redaction. We’ll then discuss the risks involved with redaction and review some best practices for completing redactions quickly and effectively without manually redacting the same information over and over again.
What is sensitive information?
What is inadvertent disclosure of sensitive information?
What is redaction?
What are the risks involved with inadvertent disclosure of sensitive information?
Best practices for redacting sensitive information
1. Don’t rely on forms to locate sensitive information
2. Use technology to identify sensitive information
3. Include a reason code for each redaction
4. Ensure that sensitive information is removed, not just covered
5. Remove sensitive information from text files and metadata
Technology is the key to efficient redaction of sensitive information
Sensitive information is information that should be protected from view because it is private, confidential, privileged, or otherwise secret—which means that whether information is sensitive depends on the audience to whom it will be disclosed. Generally speaking, sensitive information may be:
Sensitive information often appears within documents that should (or even must) be disclosed. That disclosure may occur in the context of eDiscovery, court filings, or elsewhere in the course of litigation. However, legal teams may also need to compile information for disclosure pursuant to the federal Freedom of Information Act (FOIA) or its counterparts in state law, known interchangeably as sunshine laws, open records laws, or public records laws. Under federal and state open records laws, citizens are entitled to obtain information about how their government operates.
While both litigation and open records laws carry a presumption that responsive information should be disclosed, there are exceptions for sensitive information. What happens when information that should have been protected slips through the cracks and is accidentally disclosed? That’s what we’ll turn to next.
Inadvertent disclosure occurs when sensitive information that should have been withheld for privacy, confidentiality, or other reasons is accidentally included within a disclosure. Generally speaking, inadvertent disclosures are accidental oversights—mistakes that occur during either the identification of sensitive information or the application of a redaction.
We’ll consider the risks of inadvertent disclosure in just a moment, but before we do, let’s define that last term.
Redaction is the process by which sensitive information is fully removed from disclosed records, whether those records are being disclosed in eDiscovery, in a court filing, in response to an open records law request, or otherwise. Whenever a recipient is entitled to receive records that also include information they are not permitted to see, those records should be redacted to protect the sensitive information within them. Redactions generally appear as heavy black boxes over individual words or numbers or, in the case of more extensive redactions, bars concealing lines of text.
Redaction can be a time-consuming, aggravating, and error-prone task. It is the quintessential needle-in-a-haystack data problem: legal teams must parse through pages of non-sensitive information to detect the small pieces of sensitive information—names, dollar values, identifying numbers, and more—that may be hidden within.
But finding sensitive information is only the first challenge involved in redaction: legal teams must also thoroughly sanitize that information so that it cannot be uncovered by any means. You increase your risk of inadvertently disclosing sensitive information when you don’t give sufficient attention to both pieces of the redaction puzzle. Using inefficient manual processes or outdated technologies to locate sensitive information can cause you to miss pieces of information entirely, such that you fail to apply a redaction. Similarly, using improper and ineffective redaction methods can cause you to produce information that you’ve identified and that you thought you redacted, but that can still be seen. Both errors create risks—which we’ll discuss next.
When sensitive information is either not identified within a disclosure or incompletely redacted so that it can be revealed through various means, that information becomes accessible to parties who should not have received it. These errors can lead to data protection claims, waive the attorney-client privilege, provide the basis for a malpractice lawsuit or professional discipline, undercut arguments in a case, and more. The damages caused by an inadvertent disclosure of sensitive information are, in some cases, limited only by the injured party’s imagination.
You’ve likely heard of some of these blunders, such as when Paul Manafort’s lawyers improperly redacted information in a court filing. Reporters who received redacted copies of the filings were able to discern the “redacted” confidential information by simply copying and pasting the text. Not only are such errors embarrassing, but they’re also potentially damaging to the lawyers’ subsequent arguments.
How was the information in Manafort’s filing so easily revealed? The legal team had covered the sensitive text with black boxes, but they had not actually removed the underlying text, so copying the selection into another document revealed it. The same can happen with text that’s “redacted” by changing the font color to match the background; change to a contrasting font color and the apparently missing text comes right back.
Additionally, FOIA and open records requests in particular usually come with tight deadlines. Government agencies have only a limited window in which to respond, and that added pressure can make the Sisyphean task of searching through records for sensitive information even more daunting.
So, how do you avoid these risks by getting redaction right? Here are our top five best practices.
In redacting sensitive information, remember that there are two distinct challenges—and you have to get them both right. First, there’s the need to identify sensitive information in the mass of non-sensitive data and flag it for redaction. Second, there must be an effective redaction that thoroughly sanitizes the disclosure and closes any “back doors” to revealing the redacted information. There’s one key that unifies these best practices: they all emphasize the importance of automatic redaction technology.
Some of your disclosed documents may be forms that have been filled out. It’s tempting to try to save time by learning where personal data might be on these forms and effectively ignoring the rest of the document, but it would be a mistake to do so. Likewise, you shouldn’t do a simple search for the phrase “Social Security” when you’re looking for Social Security numbers; these numbers could be misplaced, used in other contexts, or referred to by an abbreviation or shortcut such as “SSN.” Remember that sensitive information could be anywhere on the page, and to find all of it, you have to check all of the text. Fortunately, as the second best practice points out, you don’t have to check it all with your own eyes.
Technology is the key to streamlining and simplifying redactions while simultaneously improving the accuracy and consistency of results. Technology offers solutions to both of the challenges of redaction, making it easy to pinpoint sensitive information and enabling its complete removal.
One blog author referenced a recent case involving over 27,000 chat messages and more than 2 million individual redactions. That’s an overwhelming volume of messages for any human to sift through and an even more overwhelming number of redactions to apply manually. Powerful redaction software enables users to define patterns for sensitive information, including Social Security numbers, phone numbers, email addresses, bank account numbers, and even proper names. This technique, known as auto-classification, allows users to define rules for text that should be redacted and then take advantage of digital search capabilities to rapidly scan through all of the data in a disclosure.
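To make the idea of pattern rules concrete, here is a minimal Python sketch of a rule set that scans text for a few kinds of sensitive data. The rule names and regular expressions are illustrative assumptions, not the rules shipped by any particular redaction product, and real-world patterns would need to be broader and validated against your own data.

```python
import re

# Hypothetical rule set: names and patterns are illustrative only.
RULES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def find_sensitive(text):
    """Return (rule_name, start, end, matched_text) for every hit, in document order."""
    hits = []
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            hits.append((name, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "Reach Jane at jane.doe@example.com or 555-867-5309. SSN: 123-45-6789."
    for hit in find_sensitive(sample):
        print(hit)
```

Note that the Social Security number is found by its shape, not by the phrase “Social Security,” which is why a pattern search catches numbers that a keyword search would miss.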
As much as you might sometimes want to, you can’t just redact everything in a document: that defeats the purpose of the disclosure, and it’s a violation of ethical standards. You need a reason for each redaction you apply in case a recipient challenges it. It’s therefore common practice to state the reason on the redaction itself, often as white text on the black redaction box. Fortunately, as long as the user defines the reason when establishing a rule for an automated search, the same technology that identifies sensitive information can automatically code the reason for each redaction. For example, if a user sets up a redaction search for data that fits the pattern of a date of birth, they can include “personal data” or “personally identifiable information” as the rationale for that redaction. Then, anytime the software identifies a date of birth, it will both redact the text and code the reason for the redaction.
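The date-of-birth example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `RedactionRule`, `redact_with_reasons`, the date format, and the bracketed replacement text are all hypothetical names and choices, not the behavior of any specific tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionRule:
    name: str
    pattern: re.Pattern
    reason: str  # reason code recorded with every match of this rule


# Hypothetical rules: the MM/DD/YYYY and SSN formats are assumptions.
RULES = [
    RedactionRule("date_of_birth", re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "personal data"),
    RedactionRule("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "personal data"),
]


def redact_with_reasons(text, rules=RULES):
    """Replace each match with a labeled marker and log the reason for every redaction."""
    log = []
    for rule in rules:
        def replace(match, rule=rule):
            log.append({"rule": rule.name, "reason": rule.reason})
            return f"[REDACTED: {rule.reason}]"
        text = rule.pattern.sub(replace, text)
    return text, log


if __name__ == "__main__":
    redacted, log = redact_with_reasons("DOB 01/02/1980, SSN 123-45-6789")
    print(redacted)
    print(log)
```

Because the reason is attached to the rule rather than typed in for each match, every redaction produced by that rule carries the same rationale, and the log gives you a record to point to if a recipient challenges a redaction.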
As demonstrated by the Manafort case, it’s not enough to cover up or “white-out” sensitive information; that information must be permanently and entirely stripped from the document, image, or file in which it exists. Manual means of covering or obscuring text are not only time-consuming but also produce inadequate results. Again, automatic redaction technology will completely remove the underlying text and “burn” in the redaction box so that the box cannot be lifted off to reveal what was beneath it.
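The difference between covering and removing can be shown with a toy model in Python. The `Page` class below is a deliberately simplified stand-in for a document with a selectable text layer and boxes drawn on top; it is an assumption made for illustration and does not use or represent any real PDF library.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy page: a selectable text layer plus opaque boxes drawn over character spans."""
    text_layer: str
    boxes: list = field(default_factory=list)  # (start, end) spans hidden visually


def cover(page, start, end):
    """Unsafe: draws a box but leaves the underlying text in place."""
    page.boxes.append((start, end))


def remove(page, start, end):
    """Safer: overwrites the underlying characters, then draws the box."""
    mask = "X" * (end - start)
    page.text_layer = page.text_layer[:start] + mask + page.text_layer[end:]
    page.boxes.append((start, end))


def copy_paste(page):
    """What a recipient gets by selecting all text; boxes have no effect on it."""
    return page.text_layer


if __name__ == "__main__":
    covered = Page("Account 12345678 is frozen")
    cover(covered, 8, 16)
    print(copy_paste(covered))  # the account number is still present

    removed = Page("Account 12345678 is frozen")
    remove(removed, 8, 16)
    print(copy_paste(removed))  # the account number is gone
```

Both pages would look identical on screen, since each has a box over the same span. Only the second survives a copy and paste.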
There are two “back doors” through which sensitive information is often inadvertently disclosed despite its complete removal from the original file. First, with images that have been subject to optical character recognition (OCR) to translate text within the image, any sensitive information must be redacted from both the image file and the accompanying text file. Second, files are typically accompanied by metadata files, such as load files and data files, some of which may contain the same information that was redacted from the original file. If these sources of information are not also stripped and sanitized, inadvertent disclosure of sensitive information can still occur.
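A minimal Python sketch of closing both back doors might look like the following. The record layout (an `ocr_text` string and a `metadata` dictionary) and the function names are assumptions made for illustration; actual load file and metadata formats vary by platform.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative pattern only


def sanitize_record(record, pattern=SSN, mask="[REDACTED]"):
    """Return a copy with the pattern masked in the OCR text and in every string metadata value."""
    return {
        "ocr_text": pattern.sub(mask, record["ocr_text"]),
        "metadata": {
            key: pattern.sub(mask, value) if isinstance(value, str) else value
            for key, value in record["metadata"].items()
        },
    }


def leaks(record, pattern=SSN):
    """List the places where the pattern still matches; an empty list means none were found."""
    found = []
    if pattern.search(record["ocr_text"]):
        found.append("ocr_text")
    for key, value in record["metadata"].items():
        if isinstance(value, str) and pattern.search(value):
            found.append(f"metadata.{key}")
    return found


if __name__ == "__main__":
    record = {
        "ocr_text": "Claimant SSN 123-45-6789",
        "metadata": {"subject": "Re: 123-45-6789", "pages": 2},
    }
    print(leaks(record))
    print(leaks(sanitize_record(record)))
```

Running a check like `leaks` on the final production set, after redaction, is a cheap way to confirm that the text and metadata files agree with the redacted images before anything goes out the door.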
Redacting sensitive information to avoid inadvertent disclosure can be time-consuming, frustrating, and prone to errors—or it can be fast, easy, and thorough. Technology is what makes the difference, simplifying the process of identifying sensitive information within a disclosure and then ensuring the complete removal of that information from the original file as well as any accompanying text and metadata files.
ZyLAB ONE includes an auto-redaction capability with an outstanding auto-classification tool. With ZyLAB ONE, you can create redaction rules for any type of sensitive information and automatically identify and redact all of those types of data while adding a code for the redaction reason. It’s straightforward to configure rules that identify types of information, from names, emails, and employee identification numbers to banking information, Social Security numbers, and more. With ZyLAB ONE, redactions are simple, reliable, and complete. Contact us to learn more.