Collect to Protect: Processes and Technologies for Electronic Data Collection

Aug. 28, 2012, 4:00 AM UTC

The collection of electronic data has become an important responsibility for parties facing litigation. The 2006 amendments to the Federal Rules of Civil Procedure put a significant emphasis on e-discovery. 1See Jason Fliegel & Robert Entwisle, Electronic Discovery In Large Organizations, 15 Rich. J.L. & Tech. 7, *6 (2010). There has been an exponential growth in recent years in the amount of electronically stored information that can be subject to discovery. 2Jason R. Baron, The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, 8 Sedona Conf. J. 189, 196 (Fall 2007).

Routine discovery requests now often require searches of various storage devices for ESI, including “servers, networked workstations, desktops and laptops, home computers, removable media (such as CDs, DVDs, and USB flash drives), and handheld devices (such as PDAs, cell phones and iPods).” 3Id.

Due largely to the ease with which organizations and individuals now generate and store electronic data, proper collection of this data for the purpose of discovery has become more extensive, more complex, and more costly. 4See John H. Jessen, An Overview of ESI Storage & Retrieval, 11 Sedona Conf. J. 237 (Fall 2010). Many organizations are not equipped to handle the demands of electronic data collection for discovery purposes, and look to outside vendors to help with electronic document collection.

Parties must be cautious, however, in how they proceed with data collection. They need to take steps to ensure that the collection is carried out in a reasonable and legally defensible manner. Errors may occur when best practices for data collection are not followed. 5See Jason Fliegel & Robert Entwisle, supra n. 1 at *5.

Section 1 of this paper addresses considerations and concerns that parties should keep in mind when faced with a data collection.

It first discusses the decision to outsource data collections to outside vendors. It then examines the role of procedural rules in collection, the benefits of early collection, and the costs and benefits of over-collection versus under-collection. It then suggests plans and strategies for handling a document collection.

Section 2 of this paper addresses various tools and technologies available to help manage and carry out document collections in a forensically sound manner. In doing so, section 2 examines three types of forensics: traditional forensics, network forensics and collections, and mobile forensics.

Section 3 of this paper addresses the importance of being able to defend the document collection process in the event of a challenge from the opposing party.

Section 1: Initial Considerations

Outsourcing.

When facing a data collection, one of the first decisions a party must make is whether to handle the collection internally, or outsource part or all of the collection to an outside vendor. 6Thomas Allman, Ashish Prasad, Anthony Diana & Matthew Rooney, Electronic Discovery Deskbook, 6:3.1, Practicing Law Institute, 2009.

The decision on how to handle a collection will rest on various factors, such as the size of the collection and the financial stakes of the litigation. If a company often faces litigation, it may be useful to invest in establishing a standing relationship with an outside vendor to perform collection functions.

Often, parties will use some combination, outsourcing portions of the document collection and handling other portions internally. Additionally, companies that outsource document collection to an outside vendor will generally verify the quality of the outsourced processes in-house. 7Id.

Choosing a Vendor.

When working with outside vendors, parties should be careful to choose a vendor that can provide the tools, technologies, and personnel needed to carry out the document collection in a defensible manner. Parties should also work with the vendor to implement an appropriate strategy for carrying out and documenting all collection efforts, so that the vendor is able to meet the party’s expectations.

Parties should select a vendor that can monitor the completion status of document collection projects, and that can monitor any technical issues that may arise.

Parties should also work with the vendor to develop a quality control protocol, in order to ensure there are no issues at the document production stage resulting from errors at the data collection stage.

When faced with large amounts of electronic data, parties should look for computer forensic professionals, who can serve in an advisory role in offering collection methodologies. 8Samuel H. Solomon, Through the Looking Glass: Identifying and Avoiding Risks Under the New Federal Rules of Civil Procedure, SM078 ALI-ABA 303 (2007).

In selecting a computer forensic specialist as a vendor, companies should consider the organization’s “track record regarding spoliation, ability to work within budgets, evidence of storage facility security, means of data transmission and the ability to maintain data collections over a long period of time without having the need to recollect data for the same matter.” 9Id.

A computer forensic professional helps to provide a defensible methodology during data preservation, and also helps to provide a more complete picture to the producing party. 10Id.

Investing in Software.

It may also be useful for a company to invest in software tools that can be installed in the company’s network to collect ESI. This software can be costly, however, so smaller companies facing infrequent or lower stakes litigation may not find it cost-effective to purchase and use their own technology.

Companies who do choose to invest in software to help manage document collections may also still choose to contract with vendors in order to have them provide additional personnel needed to complete the document collection.

Procedural Importance of Document Collection.

A party facing litigation will need to comply with various provisions of the Federal Rules of Civil Procedure. Under Rule 26(g), for example, a party or that party’s attorney must attest that “to the best of the person’s knowledge, information and belief formed after a reasonable inquiry,” a document production is “complete and correct as of the time it is made.” 11Fed. R. Civ. P. 26(g).

If a party fails to make a diligent and reasonable collection under the Rules, courts may impose sanctions and, where the party fails to produce pertinent documents, may instruct juries to draw an adverse inference. 12See Jason Fliegel & Robert Entwisle, supra n. 1 at 12-23.

It is important, therefore, for a party to be able to show that it made a reasonable effort to collect any and all relevant and responsive documents.

Getting It ‘Just Right’.

In determining what and how much to collect, producing parties will have to weigh the costs and benefits of risking over-collection, versus under-collection, of information. Over-collection occurs when a party collects more information than is required for it to ultimately produce relevant and responsive documents, and under-collection occurs when a party collects less information than is required to achieve this goal.

The problem with over-collection is that due to the large amounts of data that are now coming into the discovery process, the costs associated with collection can be astronomical, potentially even exceeding the amount in dispute. It can result in the waste of significant amounts of money and effort.

As an example, a large company with no pending litigation may still spend millions of dollars on preservation in anticipation of potential future litigation, when there is still no adverse party with which to negotiate on preservation procedures. 13Mini-Conference on Preservation and Sanctions, Dallas Texas, Sept. 9, 2011, Advisory Committee on Civil Rules, 130 (Nov. 7-8 2011) available at http://www.uscourts.gov/uscourts/RulesAndPolicies/rules/Agenda%20Books/Civil/CV2011-11.pdf (last visited on Aug. 17, 2012).

In another case, 60 custodians were identified at the outset, but as the case evolved, more custodians were identified until eventually there was the need to preserve data from 250 custodians, despite the fact that most of the preserved documents had not been reviewed by anyone. The company nevertheless viewed preserving fewer documents as an unacceptable risk, fearing that it would result in an adverse judgment. 14Id.

In yet another situation, a dispute worth less than $4 million required the preservation of data from 57 custodians, which had an associated cost of $3 million. 15Id.

Under Rule 26(b)(2)(B), a producing party does not need to provide electronically stored information from sources that are identified by the party as being not reasonably accessible due to undue burden or cost, 16Fed.R.Civ.P. 26(b)(2)(B). and it is black letter law that a party producing documents does not have an obligation to collect and produce “every shred of paper, every e-mail or electronic document, and every backup tape” in its possession. 17Zubulake v. UBS Warburg LLC, 220 F.R.D. 212, 217 (S.D.N.Y. Oct. 22, 2003).

However, the risks associated with failing to collect relevant documents, such as the risk that a party will be found to have violated its discovery obligations, often lead parties to over-collect information, and parties often “err on the side of an overly inclusive collection plan.” 18Allman, Prasad, Diana & Rooney, supra n. 6 at 6:3.2.

In order to minimize the risk of spoliation charges, document preservation often takes place early in the litigation, often before document requests are even received. 19Ann Marie Gibbs, Sheila Mackay & Doug Stewart, The Data Preservation Data Collection Continuum, Electronic Discovery & Records Management Quarterly, 6 (Winter 2007) available at http://www.daegis.com/wp-content/uploads/2011/01/ar-the-data-preservation-data-collection-continuum.pdf (last visited Aug. 17, 2012). Parties frequently begin collecting ESI to preserve near the outset of litigation, or in some cases, even at the first hint of litigation.

Preserve-in-Place or Preserve and Collect?

Typically, upon receiving a complaint, a company will issue a litigation hold notice to employees and other necessary third parties, and will then put a monitoring protocol into place. 20Id. With this type of preservation-in-place, custodians will be alerted to preserve any relevant electronically stored information where it resides, and to ensure that it is not destroyed, altered, or deleted. 21Id.

At this point, IT personnel will often review operations and change any necessary settings or policies, in order to prevent loss of relevant data. 22Id. Some data may be preserved in this fashion, but will never ultimately be collected in a physical data capture. 23Id.

Other data, such as data stored on the hard drives of key custodians, may be simultaneously preserved and collected through forensic imaging. 24Id. These are often parallel processes, occurring simultaneously, and the choice to preserve-in-place versus the choice to preserve and collect at the same time will ultimately depend on the litigation, the potential relevance of the data, the potential importance of the custodian, and the “data type, content and subject matter relevance.” 25Id.

The Meet and Confer.

Opposing parties have an obligation under Rule 26 to meet and confer “as soon as practicable,” and to make a good faith effort to agree to a discovery plan. 26Fed.R.Civ.P. 26(f).

During meet and confer meetings, attorneys will generally exchange information on their clients’ ESI for the purposes of negotiating agreements on the scope of discovery. 27Oliver Fuchsberger, IT Tips for Ediscovery Best Practices, 30 Aug. Wyo. Law. 32, 33 (2007). Parties should take advantage of this opportunity to try to reach an agreement on how data should be collected.

Key Custodians and Systems.

Parties will generally collect all ESI relating to key custodians and systems that have been identified as containing responsive information. Generally, parties will focus on certain custodians or systems where relevant documents are most likely to be located, as a way to make document collection more manageable. The data collected can then later be culled using filters to help remove unresponsive data. 28Jerry Thompson, The Evolution of Litigation Support, 45 SPH Ark. Law 20, 21-22 (Spring 2010).
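As an illustration only, the culling step described above might be sketched as follows. The filter criteria (a date range and a keyword list) and all names in the example, such as `CollectedItem` and `cull`, are assumptions made for this sketch; the actual filters in any matter would be determined by counsel and, where possible, agreed with the opposing party.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CollectedItem:
    """Hypothetical record for one collected document."""
    custodian: str
    path: str
    modified: date
    text: str


def cull(items, start, end, keywords):
    """Split items into (kept, culled).

    An item is kept when it was modified within [start, end] and its
    text contains at least one keyword (case-insensitive). Everything
    else is returned separately so that the culled set can be documented.
    """
    lowered = [k.lower() for k in keywords]
    kept, culled = [], []
    for item in items:
        in_range = start <= item.modified <= end
        has_hit = any(k in item.text.lower() for k in lowered)
        (kept if in_range and has_hit else culled).append(item)
    return kept, culled
```

Returning the culled items alongside the kept items, rather than discarding them, makes it easier to record later what was removed and why.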

In determining the key custodians and systems from whom data should be collected, parties often define the scope broadly and err on the side of caution, especially early in the litigation. 29Allman, Prasad, Diana & Rooney, supra n. 6 at 6:3.4.

Using a Good Process for Document Collection.

During the early planning stages of the litigation, the parties’ counsel should identify who will perform the collection, how and when it will be performed, and who will be responsible for tracking document collection efforts. 30Ashish Prasad, Effective Project Management in Discovery, The Practical Litigator 15, 20 (2009).

There are three basic goals underlying the design of an ESI collection plan:

  • “(1) Meeting basic discovery obligations.”
  • “(2) Overcoming claims of under-collection.”
  • “(3) Ensuring the admissibility of the evidence if sought to be used.” 31Allman, Prasad, Diana & Rooney, supra n. 6 at 6:2.

When using a discovery vendor, a party should make sure to collaborate with that vendor in scheduling and implementing the collection. It is important to ensure that there is enough equipment, storage, and personnel in place to collect the data without modifying or altering it in the process. 32Id.

It is especially important to have procedures in place for tracking what documents have been collected and when, to make sure that all of the pertinent data has been collected, as well as to prevent collecting duplicate data during subsequent collections as the litigation continues. 33Id.
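One possible way to track what has been collected, and to avoid re-collecting identical data in later rounds, is to key each collected item by a hash of its content. This is a minimal sketch under that assumption; the article does not prescribe a mechanism, and `CollectionTracker` is a hypothetical name introduced here.

```python
import hashlib
from datetime import datetime, timezone


class CollectionTracker:
    """Hypothetical in-memory log of collected content, keyed by SHA-256."""

    def __init__(self):
        # digest -> (source description, time first collected)
        self._seen = {}

    def record(self, source, content, when=None):
        """Log content from a source.

        Returns True if the content is new, or False if identical
        content was already collected (the first entry is left as-is).
        """
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return False
        self._seen[digest] = (source, when or datetime.now(timezone.utc))
        return True

    def history(self):
        """Return (source, collected_at, digest) for every unique item."""
        return [(src, ts, digest) for digest, (src, ts) in self._seen.items()]
```

A production tracker would persist this log rather than hold it in memory, so that it survives across collection rounds and is available if the process is later challenged.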

Collection Notice.

When it is determined that the collection of data should occur, the legal department will generally issue a collection notice to the list of key persons and data custodians. In the collection notice, key persons are instructed to forward relevant information stored on removable devices such as CDs, DVDs, USB drives, and zip disks.

The collection notice instructs recipients to clearly label all media with their name, the location from which the documents were retrieved, the date the documents were created, and other identifying information.

Chain of Custody Log.

Once the documents are collected, they are assembled, and the legal department records the date on which the data and documents were collected, who they were collected from, and whether they were forwarded to outside counsel or to the discovery vendor. This information should generally be recorded in a chain of custody log.

It is very important to maintain an accurate chain of custody log, identifying all electronic media that is transferred between individuals from collection through delivery of the electronic data to outside counsel or to the electronic discovery vendor.

The chain of custody log should be readily available for review, and should contain the name of the individual who collected or transferred the data, the date of the collection or transfer, a description of the data, the name of the individual to whom the data was sent, and the date the data was sent.
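The fields listed above map naturally onto a simple record. The following sketch shows one way such a log entry could be structured and exported; the field names, the validation rules, and the CSV layout are assumptions for illustration, not a required format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass(frozen=True)
class CustodyEntry:
    """One hypothetical chain of custody log entry."""
    handled_by: str   # individual who collected or transferred the data
    handled_on: date  # date of the collection or transfer
    description: str  # description of the data
    sent_to: str      # individual to whom the data was sent
    sent_on: date     # date the data was sent

    def __post_init__(self):
        # Assumed sanity checks: no blank names, no send before handling.
        for name in ("handled_by", "description", "sent_to"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if self.sent_on < self.handled_on:
            raise ValueError("sent_on cannot precede handled_on")


def to_csv(entries):
    """Render entries as CSV text so the log is readily available for review."""
    buffer = io.StringIO()
    names = [f.name for f in fields(CustodyEntry)]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        row = asdict(entry)
        row["handled_on"] = entry.handled_on.isoformat()
        row["sent_on"] = entry.sent_on.isoformat()
        writer.writerow(row)
    return buffer.getvalue()
```

Rejecting incomplete entries at the moment they are created is a design choice intended to keep gaps out of the log, since gaps are difficult to explain after the fact.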

Data Collection Spreadsheet.

Using the list of key persons and data custodians, the IT department will often create a data collection spreadsheet for identification and tracking of any personal and group shared drives from which data must be collected. The IT department can use the data collection spreadsheet to identify the personal and group hard drives to which any key persons have access, and can provide copies of the relevant group and personal drives to the electronic discovery vendor.

In-house and outside counsel can decide whether it is necessary to image any hard drives, and should determine which key persons should be asked to search and produce responsive data from laptops or other mobile devices.

Identifying Other Sources.

It is also important for a party to review what relevant data sources there could be in addition to the data maintained by those previously identified on the key persons and data custodians list.

In order to identify this organizational data, outside counsel, the legal department, and the IT department should work together in order to identify the business areas likely to have responsive data, identify the IT personnel who support those relevant business areas, identify the company systems, databases or applications that are believed to be responsive, and facilitate the implementation of a process for collection of this organizational data.

By carefully planning and implementing proper procedures for facilitating the document collection, parties take steps towards avoiding sanctions or other adverse situations that may arise when errors are made with regard to the document collection. Careful documentation of all efforts is key.

Section 2: The Right Tools for the Job

Parties and vendors can use various tools and technologies to carry out the necessary steps of document collection. When considering the collection of data and the defensibility of the collection process, it is important to remember that the process will succeed only if the right tools are used at each stage.

Computer forensics examines digital media with the goal of identifying, preserving, recovering, analyzing, and presenting facts and opinions about the information found. Forensics can be classified into three distinct categories: traditional forensics, network forensics and collections, and mobile forensics.

While mobile forensics could be classified as a subset of traditional forensics, it merits a separate discussion, due to the growing number of mobile devices from which data must now be collected.

Traditional Forensics: When Nothing But a ‘Forensically Sound Image’ Will Do.

Traditional computer forensics involves several steps to analyze the contents of devices such as servers, workstations, and laptops. Using the traditional forensics approach to conduct a collection typically results in the capture of the widest possible range of information about the data collected, including system settings, what has been deleted, what devices were connected to the device, and so on.

The steps include, but are not limited to the following:

  • creating a forensically sound image (or copy) of the device;
  • transporting the copy of the device to a review lab to be analyzed;
  • making a copy of the forensic image taken, to review the copy and not the original;
  • creating a specially configured dataset within special forensic software and importing the copied image(s) into the dataset;
  • reviewing the device within the forensic software to find the answers to the questions being posed; and
  • developing a formal report showing all steps taken as well as any findings for the client.
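The steps above call for reviewing a copy of the forensic image rather than the original. The article does not say how the copy is checked; one common approach, assumed here, is to compare cryptographic hashes of the two files. The sketch below uses SHA-256, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large image files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_original(original_path, working_copy_path):
    """Return True when the working copy is bit-for-bit identical to the original.

    Assumption: matching SHA-256 digests are treated as sufficient
    evidence that the copy was not altered.
    """
    return sha256_of(original_path) == sha256_of(working_copy_path)
```

Recording the digest of the original image at the time of acquisition, for example in the formal report, would allow the same comparison to be repeated at any later point.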

While the steps taken may vary slightly from matter to matter, the overall process is the same across the majority of matters.

The Market Offerings.

There are several forensic software packages available to investigators, and many of these are well known and used by many in the forensic field. Each tool listed below has its pros and cons, and it is worthwhile to note that the experience of the investigator will directly impact the results of using any of the following tools.

In traditional forensic collections, investigators will typically travel to the site where the devices are located to conduct the collection. Individuals should be highly trained, and even certified in the tools that they are using, in order to ensure the proper results.

While this is not an exhaustive list of traditional forensic applications, two of the major industry-accepted technologies are:


These two forensic tools allow the user to do everything related to collecting data in the traditional way, from data imaging to data searching and filtering to data analysis. These tools have also proven to be widely accepted as reasonable and defensible forensic tools, with a broad spectrum of forensic options.

Additionally, there are two other forensic tools that are more specific to the ability to collect emails and instant messaging text and have been widely accepted. These are:


Network Forensics: When a Single Point of Collection Is Needed.

The analysis and collection of data in a large corporate network can be burdensome and overwhelming. In these situations, when thousands of systems are involved, and data is spread through a huge corporate network, traditional forensics may not be feasible.

A growing number of technologies are available to assist corporations when legally defensible collections are required but forensically sound images are not. These technologies can use advanced search capabilities, including keywords and concepts, which enable a company to proactively or reactively identify, collect, cull, and produce its data.

Typically, these technologies are deployed within the corporate firewall. Several of these technologies enable both corporate professionals and forensic examiners to analyze petabytes of information in minutes rather than hours.

While this is not an exhaustive list of network forensic applications, five of the major industry-accepted technologies are:

  • AccessData;
  • EnCase;
  • StoredIQ;
  • EMC Kazeon; and
  • Symantec’s Clearwell.

AccessData and EnCase are the network enterprise versions of those companies’ traditional software and provide similar functionality to their traditional counterparts, with the addition of network-capable collections.

StoredIQ, EMC Kazeon, and Symantec’s Clearwell started in the network analysis area and moved towards network collections. In each of their cases, they were developed originally for the purposes of network storage analysis and/or email storage analysis and their tools evolved to include more forensically sound practices, enabling them to also be used for forensic network collections.

Mobile Forensics: Collecting from Devices That Are Constantly on the Move.

Specialized software is needed to cover all mobile devices, since device models change frequently and vendors that make mobile forensics software and hardware have to update their systems continually to keep up with these changes.

The memory type, custom interface, and proprietary nature of mobile devices require a different forensic process compared to traditional computer forensics. Each device often has to have custom extraction techniques used on it.

Mobile devices are no longer just phones; they are effectively mini-computers with contacts, photos, documents, calendars, and notes on them. With more and more applications being built for mobile devices, the amount of information that can potentially be stored on a mobile device is greater than ever before, including, but not limited to, personal banking information, family contacts and photos, and password lists.

Additionally, the growth of mobile device hard drive sizes and capabilities makes mobile devices even more important considerations in document collection.

The forensics process for mobile devices broadly matches other branches of digital forensics; however, some new methods and technologies are typically used to gather the data needed.

One of the primary considerations for mobile forensic analysts is preventing the device from making a network or cellular connection. Allowing a mobile device to connect could bring in new data and overwrite existing evidence stored on the mobile device.

Additionally, the short battery life and the potential of not having the proper electrical connection can cause the mobile device to switch off during image capture. This can present a challenge because, due to the proprietary nature of mobile phones, it is often not possible to collect data while the device is powered down. For this reason, mobile acquisition is often performed live.

There are several ways to image a mobile device, and most rely on the vendor-supported software drivers and methods for connecting to the device. The process can also vary, depending on the product being used, its capabilities, and its supported device list.

As an increasing number of mobile devices use high-level file systems, similar to the file systems of computers, methods and tools can be adopted from hard disk forensics, with minor changes. However, it is important to understand what one is trying to extract from the mobile device prior to planning the acquisition process.

Depending on the need, different tools may be able to focus on only the area of interest better than other tools that have all-in-one capabilities, i.e., the ability to capture text, phone messages, applications, and other device information as a single data capture.

Section 3: Defensibility

Parties engaging in document collection must pay careful attention to defense of process, meaning that parties must be able to demonstrate that the chosen methods and tools “accurately captured a sufficient number of relevant, nonprivileged ESI in existence, and that the remaining unreviewed and unproduced ESI is irrelevant.” 34William W. Belt, Dennis R. Kiker, Daryl E. Shetterly, Technology-Assisted Document Review: Is it Defensible?, 18 Rich. J.L. & Tech. 10, *5 (2012).

Even when parties commit substantial amounts of time, money, and effort to completing document collection in a thorough and adequate manner, parties may still face problems if they fail to “validate, document and accurately convey to a court or regulator the efforts taken to discharge discovery obligations.” 35See Ashish Prasad, Problems and Solutions in Electronic Discovery (copy available at http://pub.bna.com/lw/PrasadEDiscovery.pdf).

When facing a challenge to their discovery compliance efforts, parties must not only be prepared to reasonably explain their methods, they must also be prepared to back up their position with reliable information. 36Victor Stanley Inc. v. Creative Pipe Inc.

Implementing a good process for document collection, as discussed in Section 1 of this paper, is an important first step to achieving this goal. It is essential for parties to carefully document their discovery efforts, which will allow counsel to accurately convey the actions that were taken, and will allow, where necessary, e-discovery experts to support testimony regarding the validity of the discovery process. 37See Jason R. Baron supra n. 2 at 212.

Parties should also properly document the chain-of-custody of the documents that were collected. If third-party vendors are used, the parties should work carefully with the vendors to document all the steps that were taken, and to prepare discovery experts so that they may testify, should a challenge be issued.

In order to account for the electronic data that has been collected, each source file should be properly categorized, documented and assigned. A file should either be

  • (1) collected in native form;
  • (2) converted to a useable format and collected; or
  • (3) left out of the collection process.

38Allman, Prasad, Diana & Rooney, supra n. 6 at 6:2:2.

If a document is left out of the collection process, it should be because it was either:

  • “(a) not flagged by the initial search”;
  • “(b) culled upon further review”;
  • “(c) eliminated as a duplicate of another collected file”; or
  • “(d) impossible to process.” 39Id.

A party should have proper documentation for each category.

For example, if a file was converted, there should be a record as to why it was impractical to collect that document in its native version. 40Id.

If a file was culled, or was not flagged as responsive, the party should document what searches were applied, what queries were used, and the dates the searches were run. 41Id.
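The categories and documentation requirements described in this section can be modeled as a small record with a completeness check. The sketch below is illustrative only: the class and field names are hypothetical, and the check encodes only the documentation examples given above (a reason for conversion, and a search record for files that were culled or not flagged).

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Disposition(Enum):
    NATIVE = "collected in native form"
    CONVERTED = "converted to a usable format and collected"
    EXCLUDED = "left out of the collection process"


class ExclusionReason(Enum):
    NOT_FLAGGED = "not flagged by the initial search"
    CULLED = "culled upon further review"
    DUPLICATE = "eliminated as a duplicate of another collected file"
    UNPROCESSABLE = "impossible to process"


@dataclass(frozen=True)
class SearchRecord:
    query: str    # the search or query that was applied
    run_on: date  # the date the search was run


@dataclass
class SourceFile:
    path: str
    disposition: Disposition
    exclusion_reason: Optional[ExclusionReason] = None
    conversion_note: str = ""  # why native collection was impractical
    searches: List[SearchRecord] = field(default_factory=list)


def documentation_gaps(source):
    """List the documentation that is still missing for one source file."""
    gaps = []
    if source.disposition is Disposition.CONVERTED and not source.conversion_note.strip():
        gaps.append("missing note on why native collection was impractical")
    if source.disposition is Disposition.EXCLUDED:
        needs_searches = (ExclusionReason.NOT_FLAGGED, ExclusionReason.CULLED)
        if source.exclusion_reason is None:
            gaps.append("missing exclusion reason")
        elif source.exclusion_reason in needs_searches and not source.searches:
            gaps.append("missing record of searches, queries, and run dates")
    return gaps
```

Running a check of this kind across every source file before production would flag undocumented decisions while the people who made them are still available to explain them.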

Conclusion

When facing electronic discovery, companies should be cautious in how they approach the document collection stage. In deciding how to approach document collection, parties should consider their electronic discovery needs, and must decide whether they will involve third-party vendors, and whether to invest in their own technology.

Parties should be prepared to begin document collection and analysis early in litigation, and should be careful to comply with all procedural rules. In preparing for data collection, parties often err on the side of over-inclusion at the earlier stages, to ensure that important data is not missed.

In determining the right tools and technologies for forensically sound data collection, companies should select the appropriate approaches in the areas of traditional forensics, network forensics, and mobile forensics.

Parties should be prepared to defend their document collection efforts with appropriate documentation of the steps that were taken and the chain of custody of the collected documents, utilizing experts to describe the collection efforts where necessary.
