What is predictive coding and can it help me?

Tech Support
Written by Dera J. Nevin

In large civil litigation and regulatory cases, the discovery process is becoming increasingly automated, scientific, and objective. This is evident in the increasing use of “predictive coding.”

“Predictive coding” is the new e-discovery buzzword. Articles about the benefits of predictive coding have appeared in Forbes magazine and The New York Times. In mid-2011, one company announced a patent for the technology, sparking a war of words in the e-discovery press.

Let’s start with what predictive coding is not. It is not the “eyes on every document” approach of traditional linear review, where a lawyer starts with the first document and works through the collection until every document has been reviewed. That approach works well when there is a small number of documents or in circumstances that require human eyes on every page. However, it becomes unwieldy and expensive when hundreds of thousands or millions of pages require review.

Predictive coding remains poorly understood because it is not just a technology but also a project management technique. Predictive coding is a series of computer search and sampling technologies, coupled with a new approach to searching for and reviewing potentially responsive documents. Properly combining all of these elements permits expedited, cost-effective, and highly accurate document review. Lawyers who use predictive coding need to understand how to combine these elements. It’s not necessarily the technologies that are indefensible — just certain uses of them. Judges need to learn to recognize when their use has been or will be ineffective.

Predictive coding has been described as lawyer-driven, computer-assisted document review. At its most basic, it is a form of automated document review, but to understand it strictly this way is to misapprehend the role predictive coding technology plays in searching for and retrieving potentially relevant documents. Predictive coding groups and organizes potentially relevant documents in a way that permits human reviewers to make the most of their review time and look at potentially related matters together. I prefer to think of predictive coding not as review technologies, but as search, retrieval, and information organization technologies applied to the discovery review process.

As predictive coding is usually described, lawyers who are intimately familiar with the case specify relevance criteria within small sets of data that define the crux of the issues. Lawyers generally do this through keywords or key concepts, but this can also be accomplished by reviewing a small set of documents to “train” the computer on the key issues.

Through an iterative search process, computer algorithms retrieve a set of documents based on the criteria input by the lawyers. The reviewing lawyers may determine that some results are not relevant and request that the algorithm pass through another search iteration, or as many iterations as desired. As the computer learns to distinguish what is relevant, each iteration produces a smaller relevant subset and a larger set of irrelevant documents. That larger set can be used to verify the integrity of the results: techniques such as sampling can confirm the absence of relevant material within it. The extent of the end use of the relevant set depends on the risk threshold of the clients and lawyers.

Different predictive coding tools use different algorithms and computing techniques to obtain the smaller set of documents, but the effect of the technology is generally the same: a weighted ranking of documents according to likely relevance as established by the lawyer.
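To make the idea of a weighted ranking concrete, here is a minimal sketch in Python. It is not how any particular predictive coding product works; it assumes a very simple technique (cosine similarity between word counts) and uses hypothetical names such as `rank_by_relevance` and made-up document IDs. Real tools use more sophisticated algorithms, but the output has the same shape: every document gets a score, and reviewers start at the top of the list.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_relevance(seed_docs, collection):
    """Rank a collection by similarity to lawyer-coded relevant seed documents.

    seed_docs:  list of document texts a lawyer has marked relevant (assumed input)
    collection: dict mapping a document ID to its text
    Returns a list of (score, doc_id) pairs, highest score first.
    """
    # Build one combined word-count profile from the seed set.
    profile = Counter()
    for doc in seed_docs:
        profile.update(tokenize(doc))
    scored = [
        (cosine(profile, Counter(tokenize(text))), doc_id)
        for doc_id, text in collection.items()
    ]
    # Sort by descending score; break ties by document ID for stable output.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical seed set and collection, for illustration only.
seed = [
    "pricing agreement with the distributor",
    "distributor rebate and pricing terms",
]
collection = {
    "DOC-001": "Revised pricing terms for the distributor rebate",
    "DOC-002": "Lunch menu for the office party",
    "DOC-003": "Agreement on office lease renewal",
}

for score, doc_id in rank_by_relevance(seed, collection):
    print(f"{doc_id}  {score:.3f}")
```

In this toy collection the pricing document ranks first because it shares the most vocabulary with the seed set. Each further iteration described above would amount to adding newly coded documents to `seed` and ranking again.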

Predictive coding is generally defined by these characteristics:

• it leverages small samples of documents (input criteria) to find other relevant documents;

• it reduces the number of non-relevant documents that lawyers must review, thereby reducing the overall amount of lawyer time spent reviewing documents; and

• unlike straight manual review of documents by lawyers, the results generated by predictive coding algorithms can be validated through statistics.
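The statistical validation mentioned in the last point can be sketched as follows. This is one common approach, assumed here for illustration rather than taken from any specific tool: draw a random sample from the set the computer has marked irrelevant, have a lawyer review the sample, and estimate how much relevant material may remain. The function names and the choice of a Wilson score interval at roughly 95 per cent confidence (z = 1.96) are assumptions of this sketch.

```python
import math
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Draw a simple random sample of document IDs, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(list(doc_ids), sample_size)


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion.

    hits: relevant documents the reviewer found in the sample
    n:    number of documents sampled
    z:    normal quantile (1.96 corresponds to roughly 95% confidence)
    Returns (low, high) bounds on the rate of relevant documents
    in the set the sample was drawn from.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical numbers: 100,000 documents marked irrelevant, 400 sampled,
# and a lawyer finds 3 relevant documents in the sample.
discard_pile = [f"DOC-{i:06d}" for i in range(100_000)]
sample = draw_sample(discard_pile, 400)
low, high = wilson_interval(hits=3, n=len(sample))
print(f"Estimated rate of missed relevant documents: {low:.2%} to {high:.2%}")
```

Whether the resulting range is acceptable is a legal judgment about reasonableness, not a statistical one; the sample only puts a number on the risk.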

The use of predictive coding for document review raises at least two legal issues. First, does the use of this technology meet counsel’s obligation to conduct a reasonable and defensible search for responsive documents under applicable discovery rules? Second, can counsel safeguard a client’s solicitor-client privilege when a privileged document is disclosed? To date, although some judges in the United States have spoken positively of predictive coding, its use has not yet been tested through a defensibility motion.

Arguably, the use of technologies, including predictive coding, is encouraged within the Sedona Canada Principles. In particular, Principle 7 states that “[a] party may satisfy its obligation to preserve, collect, review, and produce electronically stored information in good faith by using electronic tools and processes such as data sampling, searching, or by using selection criteria to collect potentially relevant information.”

Perfection is not required in the e-discovery process or in document review. The operative standard is reasonableness, which requires that counsel implement a document review system (regardless of whether predictive coding is used) that relies on reasonable steps to make disclosure of relevant documents and prevent disclosure of privileged ones. Manual review of documents will not always meet this standard; neither will predictive coding.

Some of the very traits that appear to make predictive coding unreasonable actually make it reasonable. First, it makes document review more efficient by exposing the human reviewer only to those documents that have been algorithmically identified based on the specifications input by the lawyer, but in a more powerful way than just by keyword. Assuming lawyers input criteria correctly, predictive coding makes it more likely that responsive documents will be produced. Second, the iterative nature of predictive coding refines relevant subsets for review in a way that can be validated statistically, both for opposing counsel and the courts. The key to validating predictive coding is to establish metrics that surpass the standards that prevailed under earlier document review paradigms. Fortunately, models for these metrics already exist.
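Two measures widely used in information retrieval, precision and recall, are one plausible basis for such metrics; treating them as the models in question is an assumption of this sketch. Precision asks what share of the documents flagged as relevant really are relevant. Recall asks what share of the truly relevant documents were found. The function name and document IDs below are hypothetical, and the "truly relevant" set would in practice come from lawyer review of a control sample.

```python
def precision_recall(predicted_relevant, truly_relevant):
    """Compute precision and recall for a set of flagged documents.

    predicted_relevant: IDs the tool (or a reviewer) flagged as relevant
    truly_relevant:     IDs known to be relevant, e.g. from a control sample
    Returns (precision, recall) as fractions between 0 and 1.
    """
    predicted = set(predicted_relevant)
    actual = set(truly_relevant)
    true_positives = len(predicted & actual)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical control sample: four documents flagged, three truly relevant.
flagged = ["DOC-A", "DOC-B", "DOC-C", "DOC-D"]
relevant = ["DOC-B", "DOC-C", "DOC-E"]
precision, recall = precision_recall(flagged, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Computing the same two figures for a manual review of the same control sample gives a like-for-like comparison between the old paradigm and the new one.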

As to the second legal issue, it’s true the computer might get the privilege call on a document wrong; however, so too might humans. Arguably, humans are less likely to get it wrong if they are looking at a smaller overall set of documents. Remember, too, that predictive coding techniques help the quality check process by identifying inconsistently tagged records, allowing for a final cross-check of privilege records prior to production.

Dera J. Nevin is the senior director, litigation support, and e-discovery counsel at McCarthy Tétrault LLP. A practising lawyer, she also oversees the firm’s e-discovery operations and can be reached at