Emerging Methods to Reduce the Cost of E-Discovery in Patent Litigation

Aug. 15, 2014, 4:00 AM UTC

As practitioners are well aware, discovery of electronically stored documents and information is a major driver of litigation costs, especially in patent cases. Discovery costs in patent suits are large in both absolute terms and relative to the overall cost of the litigation. The 2013 Economic Survey by the American Intellectual Property Law Association found that a party’s median costs in high-stakes patent infringement litigation (more than $25 million at stake) stood at $3 million at the end of discovery. [1] AIPLA Report of the Economic Survey 2013. For patent infringement suits with $1 million to $25 million at stake, median costs at the end of discovery were $1.4 million. Both figures represented more than half of the total cost of the litigation.

Much of this outsized discovery cost can be attributed to the time-intensive process of reviewing electronic documents. Though not limited to patent litigation, a 2010 study conducted by the Federal Judicial Center found that discovery costs were 48 percent higher for plaintiffs requesting and producing electronically stored information than in cases without e-discovery. [2] Emery G. Lee and Thomas E. Willging, Defining the Problem of Cost in Federal Civil Litigation, 60 Duke L.J. 765, 785 (2010). For requesting and producing defendants, e-discovery costs were 17 percent higher. Yet the costly production and review of large volumes of documents has not been shown to aid in the discovery of useful evidence; the data suggest the opposite. In 2008, for example, parties produced an average of 4,980,441 pages of documents in major cases that went to trial, but marked only 4,772 pages as actual exhibits. [3] Litigation Cost Survey of Major Companies, Statement Submitted by Lawyers for Civil Justice, Civil Justice Reform Group, and U.S. Chamber Institute for Legal Reform for Presentation to Committee on Rules of Practice and Procedure, Judicial Conference of the United States at 3 (2010), available at http://www.uscourts.gov/uscourts/RulesAndPolicies/rules/Duke%20Materials/Library/Litigation%20Cost%20Survey%20of%20Major%20Companies.pdf.

The problem of costly, time-consuming and inefficient e-discovery has led to calls for discovery reform. [4] See, e.g., Am. Coll. of Trial Lawyers & Inst. for the Advancement of the Am. Legal Sys., Final Report on the Joint Project of the American College of Trial Lawyers Task Force on Discovery and the Institute for the Advancement of the American Legal System at 9–10 (2009), available at http://www.actl.com/AM/Template.cfm?Section=Home&template=/CM/ContentDisplay.cfm&ContentID=4008. Although the Federal Rules were amended in 2006 to provide specific guidelines for e-discovery in hopes of reducing complexity and cost, it is apparent that the changes, like many previous attempts to rein in discovery, have been ineffective.

Advisory Committee Model Order

The Federal Circuit E-Discovery Committee acknowledged the growing burden and cost of e-discovery, particularly in patent cases, in a proposed Model Order on e-discovery in 2013: “Excessive e-discovery, including disproportionate, overbroad email production requests, carry staggering time and production costs that have a debilitating effect on litigation.” [5] Federal Circuit E-Discovery Committee, An E-Discovery Model Order at 2 (2013).

The Committee identified five main cost areas for e-discovery.

• First, collecting documents while preserving the original document date often requires the use of specialized software or a third-party vendor, resulting in service or licensing fees.

• Second, processing documents in preparation for review also requires the use of licensed tools and the commitment of management time, and may need to be repeated to narrow or broaden the scope of collection based on initial results.

• Third, the review itself is usually the most expensive step, with costs rising proportionally to the number of documents requiring human review.

• Fourth, the Committee pointed to issues in preparing documents for production, such as dealing with data and image conversion from documents in nonstandard file formats.

• Fifth, parties incur costs after production for importing and indexing documents into production review tools.

Moreover, the Committee noted that broad e-discovery is likely to produce volumes of documents, such as emails, that are not relevant to the central issues in patent litigation: “what the patent states, how the accused products work, what the prior art discloses, and the proper calculation of damages.” [6] Id. Accordingly, the Model Order sought to streamline production and reduce the burden of e-discovery through measures such as excluding email and other forms of electronic correspondence from general e-discovery production requests, requiring parties to identify specific issues for email production requests, and limiting email production requests to five custodians per party and five search terms per custodian.

However, the Federal Circuit ultimately did not adopt the proposed Model Order. Many of the same concepts also appeared in legislation taken up by Congress in late 2013 and early 2014 aimed at reducing the burden of lawsuits brought by so-called “patent trolls.” At present, though, none of the bills addressing these issues appears to have a viable path to becoming law.

Reduce E-Discovery Costs in Document Review Phase

With effective reform of the discovery rules unlikely in the near term and clients increasingly scrutinizing major cost drivers, it has become imperative for law firms to adopt strategies to increase the efficiency of e-discovery. One way to achieve these goals is to build at least some of the concepts from the Federal Circuit’s Model Order into case management and discovery plans early on in patent cases. For example, limits on the number of custodians and keywords per custodian will undoubtedly reduce the scope and burden of electronic documents to be collected, reviewed and produced as well as potentially cutting down on the costs associated with negotiating or litigating over an acceptable list of keywords.

Further, a sequenced approach where all “core” discovery, including basic documents about the patents at issue and the accused products, is completed before any electronic discovery takes place is likely to save costs by reducing the scope of what needs to be searched for via e-discovery workflows. Many courts (and especially those with high-volume patent dockets) are becoming comfortable with sequencing discovery issue by issue as a case progresses or bifurcating certain issues such as damages altogether. Indeed, at least one judge may consider bifurcation of damages to be a presumed default in patent cases. [7] See Robert Bosch LLC v. Pylon Mfg. Corp., No. 1:08-cv-00542 (D. Del. Aug. 26, 2009), available at http://www.bloomberglaw.com/public/document/Robert_Bosch_LLC_v_Pylon_Manufacturing_Corp_Docket_No_108cv00542_.

Each of these concepts can present an opportunity for significant savings during the discovery process. Unfortunately, they all depend on some level of cooperation from opposing counsel or must be adopted by a court, which is far from a guarantee in many cases.

Accordingly, the only surefire way to reduce e-discovery costs is to reduce the expense associated with the only part of the process that attorneys have significant unilateral control over: the document review phase. The traditional model of linear document review, in which attorneys view all documents in chronological order, has become untenable for e-discovery in large cases. The volume of e-discovery likely will grow as companies’ data storage becomes increasingly digital and the field of discoverable documents expands correspondingly. Much of this data is unstructured, in the form of text, dates, numbers, graphics and other information not organized into an easily processed order, as in a database.

Thus, law firms and third-party consultants have implemented various methods of narrowing the document pool and streamlining the review process. Traditional keyword searching of documents has become more sophisticated with the inclusion of metadata and misspellings, but is still relatively inflexible and can lead to protracted disputes over selection of keywords. Computer-based predictive coding, in which a program uses an algorithm to compare documents marked as relevant to unreviewed documents, can reduce review time and expense but is dependent on the accuracy and quality of the initial relevance coding. It may yield good results if based on a robust “seed” group of highly relevant documents; however, if the “seed” group is weak due to human coding error or uncertainty as to the most relevant concepts and language, the results will be similarly compromised.
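To make the seed-set dependency concrete, here is a minimal, hypothetical sketch of the core idea behind predictive coding: rank unreviewed documents by their similarity to documents already coded relevant. The vocabulary, document texts, and the simple term-frequency/cosine approach are all illustrative assumptions, not any vendor's actual algorithm.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector for a document (lower-cased word counts)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_by_seed(seed_docs, unreviewed_docs):
    """Score unreviewed documents by similarity to the 'seed' set of
    documents already coded relevant; higher scores are reviewed first.
    A weak or mis-coded seed set yields weak rankings -- the limitation
    discussed in the text."""
    centroid = Counter()
    for doc in seed_docs:
        centroid.update(tf_vector(doc))
    return sorted(unreviewed_docs,
                  key=lambda d: cosine(centroid, tf_vector(d)),
                  reverse=True)

# Hypothetical seed set and review pool:
seeds = ["royalty rate negotiation for the accused chipset",
         "infringement analysis of the accused chipset design"]
pool = ["lunch menu for the cafeteria this week",
        "draft royalty analysis for the chipset license"]
ranked = rank_by_seed(seeds, pool)
```

Note that the entire ranking flows from the seed coding: garbage in, garbage out, which is precisely the weakness the text identifies.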

This presents a significant problem in many cases where, at the beginning of the review process, the various ways of articulating key issues are undetermined or broad. Moreover, predictive coding algorithms offer little transparency or ability to be adjusted during the review if preliminary results are unsatisfactory. Although progressive iterations based on a larger pool of identified relevant documents may refine the pool of unreviewed documents identified by the algorithms, the algorithms themselves remain static.

Because studies have shown that there is a limit to the speed with which documents can be reviewed while maintaining accuracy, the only truly effective way to reduce the cost and burden of e-discovery is to review fewer documents.

Language-Based Analytics

Language-based analytics (LBA), an emerging approach to e-discovery, promises to greatly reduce the initial pool of documents for review by eliminating those that cannot possibly be relevant in one preliminary step. An LBA system first extracts all terms used in the entire document pool, aggregates them and sorts them according to frequency of use. This provides a valuable overview of the actual language used in the documents.
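The term-extraction step described above can be sketched in a few lines. This is a simplified illustration of the concept, not a production LBA system; the tokenization rule and sample documents are assumptions.

```python
import re
from collections import Counter

def term_frequency_list(documents):
    """Aggregate every term used across the document pool and sort by
    frequency of use -- the LBA overview step described in the text."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z0-9']+", doc.lower()))
    return counts.most_common()  # [(term, count), ...] most frequent first

# Hypothetical pool illustrating the conference-room example below:
pool = [
    "Meeting in Jefferson at 3pm to discuss the license",
    "Jefferson is booked; move the license discussion to Lincoln",
]
for term, count in term_frequency_list(pool)[:5]:
    print(term, count)
```

The resulting ranked list is what reviewers and the client walk through together in the next step.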

Together with a detailed discussion of what language would make a document relevant, the term list is then reviewed with the client or others knowledgeable about the facts and issues in the case. This step, unlike systems relying on artificial intelligence, provides critical human insight in understanding the meaning of language in the context of the case. For example, say a party has conference rooms named after former U.S. presidents. Without identification and human review of the terms “Jefferson” and “Lincoln” in the document pool, their contextual meaning would be lost and relevant communications referring to meetings may be overlooked. Once the key issues are articulated (the fact that a meeting took place, for example), the term list is a useful tool to identify additional words that may have been used to signify a meeting (“Jefferson” and “Lincoln” in the above example).

This initial language-aggregation step gives reviewers a foundation on which to build their articulations of the key issues in the case rather than making guesses or assumptions as to what words and phrases indicate relevant documents. It allows reviewers to identify, from a concrete list of choices, frequently used terms that are clearly relevant. By determining what a document must be about to be relevant, and what sort of language it must contain, a category emerges of documents that cannot possibly be relevant because they are neither about the right issues nor contain the right language. That category is then tested through sampling, to a high degree of confidence, to confirm that the excluded documents are in fact not relevant. If the sampling test fails, any new issues or important language surfaced during the sampling is folded back into the criteria.
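The sampling check on the excluded pool might look like the following sketch. The relevance check here stands in for a human reviewer's judgment, and all document texts are hypothetical.

```python
import random

def sample_excluded(excluded_docs, is_relevant, sample_size, seed=0):
    """Spot-check the documents set aside as 'cannot possibly be
    relevant'. Any relevant document found in the sample means the
    exclusion criteria must be revisited, with the newly discovered
    issues or language folded back in."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    n = min(sample_size, len(excluded_docs))
    sample = rng.sample(excluded_docs, n)
    return [d for d in sample if is_relevant(d)]  # empty list => passed

# Hypothetical excluded pool; one document was wrongly set aside.
excluded = ["cafeteria menu for the week",
            "holiday party rsvp list",
            "notes from the Jefferson meeting on royalties"]
# Stand-in for reviewer judgment: anything mentioning royalties is relevant.
misses = sample_excluded(excluded, lambda d: "royalt" in d, sample_size=3)
```

A non-empty `misses` list is the "sampling test fails" branch: the royalty language it surfaces would be added to the relevance criteria before re-running the exclusion.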

Although LBA is only beginning to be adopted, users report that this initial filtering can cut the review pool by 90 percent or more in patent infringement actions. In addition, by reducing the number of documents to a fraction of the initial pool, LBA greatly reduces the drain on time and resources, and specifically the costs, associated with document review. The review can be completed by a relatively small number of associates or contract attorneys in a matter of days.

The LBA review process itself improves upon predictive coding algorithms by allowing reviewers essentially to design and refine the criteria for selecting relevant documents as the review progresses. Working with the filtered pool of documents containing potentially relevant language, reviewers highlight specific words, phrases, figures or other data that relate to one or more of the key issues in the case. The system then generates Boolean searches based on this language, which it applies to all other documents in the pool. Documents containing similar language are automatically marked as relevant and removed from the review pool, but remain available for subsequent coding due to additional highlights, further streamlining the review process.
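The highlight-to-query step can be sketched as follows. This is an illustrative reduction of the idea to exact-phrase OR queries; real systems presumably generate richer Boolean logic, and the phrases and documents here are assumptions.

```python
def build_boolean_query(highlights):
    """Turn reviewer-highlighted phrases into a transparent OR query
    that can be read, audited, and adjusted by a human."""
    return " OR ".join(f'"{phrase.lower()}"' for phrase in highlights)

def matches(query, document):
    """Apply the generated query: true if any quoted phrase appears."""
    text = document.lower()
    return any(phrase.strip('"') in text for phrase in query.split(" OR "))

# Hypothetical reviewer highlights from documents coded relevant:
highlights = ["royalty rate", "meeting in Jefferson"]
query = build_boolean_query(highlights)

docs = ["Proposed royalty rate attached for review",
        "Cafeteria menu for next week"]
# Documents matching the query are auto-marked relevant and removed
# from the review pool, as described in the text.
auto_relevant = [d for d in docs if matches(query, d)]
```

Because the query is plain text rather than an opaque model, an over-broad highlight can be spotted and deleted from it directly, which is the transparency advantage the next paragraph describes.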

Critically, the LBA model compares the actual language itself rather than first relying on comparing the documents as a whole, as in predictive coding. This provides greater precision in matching and allows the system to deem documents with matching language relevant automatically rather than merely “suggesting” them to the reviewer. Moreover, because the Boolean searches are transparent and generated directly from reviewers’ selection of language, they can be adjusted as necessary, unlike predictive-coding algorithms. If a reviewer’s selection of a particular term or phrase causes an overly broad range of documents to be deemed relevant, for example, that selection can be easily identified and modified to eliminate false positives.

Finally, the LBA process can be engineered to provide virtually any level of statistical certainty the reviewer or the client considers necessary to ensure that relevant language is not being overlooked, and that the review is defensible if challenged. This verification sampling can be performed as often as the reviewer or the client wishes, on as broad a sample as desired, to achieve a given confidence level. For example, the LBA review process can be designed to achieve 95 percent certainty that all relevant documents have been identified. For incoming document reviews, of course, the same process would provide 95 percent certainty that all relevant documents have been selected.
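The statistical side of this verification can be grounded in standard zero-failure acceptance sampling (the "rule of three" family): if a random sample of n excluded documents turns up nothing relevant, one can assert, at the chosen confidence level, that the miss rate is below a chosen threshold. The article does not specify its vendors' exact statistics, so this formula is a conventional stand-in.

```python
import math

def sample_size(confidence, max_miss_rate):
    """Smallest clean (zero-hit) random sample supporting the claim:
    with the given confidence, the fraction of relevant documents left
    in the excluded pool is below max_miss_rate.
    Derived from (1 - max_miss_rate) ** n <= 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_miss_rate))

# 95% confidence that fewer than 5% of excluded documents are relevant:
n = sample_size(0.95, 0.05)  # 59 documents
```

Tightening either parameter grows the sample: 99 percent confidence at the same 5 percent threshold needs 90 clean documents, which is why the text notes that the sampling can be run "on as broad a sample as desired."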

Conclusion

As data storage becomes increasingly digital and the volume of electronically discoverable documents grows, the problems of runaway costs and escalating strain on time and resources for e-discovery will likely grow as well. At the same time, law firms are under increasing pressure from clients to provide more efficient and cost-effective service. E-discovery, as a major driver of patent litigation costs, also represents a significant opportunity for streamlining and cost-cutting. Practitioners should attempt to build in ways to narrow the scope of e-discovery early on in patent cases either through discovery planning or motion practice as necessary.

Absent these tactics, use of language-based analytics in the review process can reduce the initial document review pool to a fraction of its original size by identifying and filtering out the clearly irrelevant documents that are especially prevalent in patent litigation. The review can then be completed relatively quickly, and with a high degree of statistical accuracy, by a comparatively small group of reviewers.

Dramatically increasing the efficiency of e-discovery through language-based analytics has the potential to deliver major cost savings for clients while providing a significant competitive advantage for firms.
