December 02, 2015 Articles

Adapting the Collections Process to Accommodate Nontraditional Data Sources

By Steve Ramey and Katie Askey

In our previous article, "Beyond Document Review: The Discoverability of Nontraditional Sources" from Spring 2015, we introduced the concept of nontraditional electronically stored information (ESI) and the importance of capturing this data in a meaningful way. This article takes a deeper look into how the collections process can be affected by the Computer Fraud and Abuse Act and adapted for nontraditional ESI such as structured data, audio, chat, mobile device data, and social media.

The Computer Fraud and Abuse Act
Digital forensic examiners have the difficult job of pursuing leads as they are uncovered while remaining within the scope of their investigation. User names, passwords, account numbers, and protected and proprietary information are often uncovered, recovered, and provided during forensic investigations, and examiners are granted access to this information in order to perform their job. Digital forensic examiners are hired to collect information from cloud accounts, mobile devices, computers, and other digital storage media, whether physically present or across a network. These devices usually contain the life and soul of a person or a company. It is imperative that the digital forensic examiner exercise control over and protect the security of this information, as one misstep could turn the examiner into a criminal.

There are several regulations, laws, and ethical commitments that digital forensic examiners adhere to. In particular, the Computer Fraud and Abuse Act (CFAA), 18 U.S.C. § 1030, was written to protect against the abuse of accessing computer systems without authorization. The CFAA is meant to protect computers, and the information in those computer systems, from intentional access that may cause harm to that system, a government agency, a business, or a person through the theft or destruction of electronic data.

During the course of a computer forensic investigation, the examiner must perform his or her analysis within a limited scope. This scope is specific to the matter and typically consists of analyzing several types of artifacts from many different devices. For example, when investigating the theft of intellectual property (IP), artifacts such as Internet history and the computer registry are analyzed to determine if the user navigated to websites to upload files or researched how to cover his or her tracks, and to identify if files were opened from an external device. While those are just a few examples of what the artifacts can contain in support of an investigation, they can also contain user names and passwords, URLs, and other sensitive information that may or may not be pertinent to the investigation.

Typically, the scope for analysis rests solely on the device and the artifacts recovered from that device. As the analysis is conducted and new information is obtained (e.g., a username, password, and cloud account URL), the examiner should not act on that information without obtaining the necessary authorizations. The authorizations serve the following distinct purposes:

1. The examiner is acknowledging that his or her analysis of the device has confirmed that the user used other storage repositories.

2. The examiner is seeking account owner information and authorization, from either the owner or representative counsel, to access a system or account that was not originally in scope.
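
The two purposes above can be pictured as a simple gate in an examiner's case notes. The following Python sketch is illustrative only: the `Lead` and `ScopeLog` names, their fields, and the example device and URL are assumptions made for this example, not part of any forensic tool or of the practice described here. It records that an out-of-scope account was found and keeps access blocked until an approver is on record. Note that it deliberately stores no password.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Lead:
    """A reference to an out-of-scope account found on an in-scope device."""
    source_device: str                      # device the artifact came from
    account_url: str                        # system the artifact points to
    authorized_by: Optional[str] = None     # owner or counsel who approved
    authorized_at: Optional[datetime] = None


@dataclass
class ScopeLog:
    """Hypothetical log of leads awaiting authorization."""
    leads: List[Lead] = field(default_factory=list)

    def record(self, source_device: str, account_url: str) -> Lead:
        # Purpose 1: document that analysis surfaced another repository.
        lead = Lead(source_device, account_url)
        self.leads.append(lead)
        return lead

    def authorize(self, lead: Lead, approver: str) -> None:
        # Purpose 2: note who granted authorization, and when.
        lead.authorized_by = approver
        lead.authorized_at = datetime.now(timezone.utc)

    def may_access(self, lead: Lead) -> bool:
        # Access stays blocked until an approver is on record.
        return lead.authorized_by is not None


log = ScopeLog()
lead = log.record("laptop-01", "https://cloud.example.com/account")
blocked_before = not log.may_access(lead)   # True: nothing approved yet
log.authorize(lead, "representative counsel")
allowed_after = log.may_access(lead)        # True once approval is logged
```

A real workflow would rest on written authorization reviewed by counsel; the sketch only shows the ordering of the steps.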

Mobile devices can be a gold mine of personal and business-oriented information. They can contain email messages (business and personal), text messages, banking data, pictures, call logs, GPS locations, and stored passwords. As mobile device use grows and more investigations include mobile devices in digital forensic analysis, examiners will have greater exposure to users' account information.

At no point should the examiner leverage the credentials to preview the account contents in an effort to aid conversations and sway approval for accessing the system. Using the credentials without authorization may be in violation of the CFAA or other laws or governance. The mere fact that a burglar left an ID and house keys on your doorstep after robbing you does not give you the right to return the favor and pay the burglar a visit. Likewise, the forensic examiner is a trusted source with a strong ethical commitment. Uncovering sensitive information is expected, but leveraging the information without authorization can present unanticipated complications.

Collections: Industry-Leading Practices
The IT infrastructures of organizations can be complex because they can include more than just the user's assigned computer. Outsourced hosting, overseas data centers, clouds, virtualized environments, and bare metal systems are some of the common elements that can comprise an organization's IT infrastructure. To further complicate the picture, there are often several IT policies that should be taken into consideration. Backup frequency, retention, legal hold, end-user permissions, encryption, and BYOD (bring your own device) are just some examples, not to mention the data privacy laws that apply to outsourced and overseas systems. Further, what are the users within the organization actually doing with their data? Where are they storing their data? Do they have the ability to set up a rogue server for their department's use, not administered by IT? Do they leverage cloud services not approved by the company for data transfer and storage? In this world of ever-evolving technological enhancements, the identification of data repositories throughout organizations becomes that much more complex.

There is an industry-leading practice that can potentially alleviate the complexity and confusion of walking into an organization and simply imaging the users' laptops. Before the decision to collect is made, a conversation with IT should occur to understand the IT infrastructure, policies, privacy, and other behavior that is "normal" to the organization. Having the conversation up front will help save resources, solidify a collection strategy, and keep overall costs down. Oftentimes, it is worth involving experts early on to assist in this uncharted territory to save money on potential rework or legal fees.

In addition to this enhanced planning practice, the unique characteristics of each data source should be taken into consideration during the collections process.

Structured data. Today, it is not uncommon for companies to have dozens (if not hundreds) of live and retired systems for their various functions—accounting, finance, human resources, and quality assurance to name a few. Therefore, it is imperative to take the time up front to thoroughly review the full list of available systems and data to determine what is relevant and potentially responsive. This practice can save significant time and cost while minimizing risk of exposure.

Take, for instance, a client that had over 900 systems due in part to user-created databases and multiple acquisitions. Obviously, collecting all of this data was neither desirable nor practical. It took about five months to review the list, track down subject-matter experts within the company, and pare the list down to 40 systems that were ultimately deemed responsive and relevant. The upfront resources required to perform this large system review were well worth the effort to avoid performing collections on hundreds of nonrelevant systems. While this may be an extreme example, almost all companies could benefit from a smaller version of this exercise.

Once the list of systems has been finalized for collection, it is recommended that all data stored within the system be collected. This includes all back-end tables, views, and relationships. It is much easier and more efficient to prepare a subset of the full system than it is to make another collection if something is found to be missing.
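
One way to confirm that a collection captured every back-end object is to build a manifest of tables, views, and relationships before copying the data. The Python sketch below is an illustration under stated assumptions: it uses SQLite (through Python's standard `sqlite3` module) purely because it is self-contained, and the `inventory_schema` function and the toy `vendor`/`invoice` schema are hypothetical. Enterprise database engines expose their own system catalogs, so the queries would differ in practice.

```python
import sqlite3


def inventory_schema(conn: sqlite3.Connection) -> dict:
    """List tables, views, and foreign-key relationships in a SQLite database."""
    objects = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY type, name"
    ).fetchall()
    tables = [name for kind, name in objects if kind == "table"]
    views = [name for kind, name in objects if kind == "view"]

    relationships = []
    for table in tables:
        # Each row describes one referencing column:
        # (id, seq, parent_table, from_column, to_column, ...)
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, row[3], row[2], row[4]))
    return {"tables": tables, "views": views, "relationships": relationships}


# Hypothetical toy schema standing in for one system on the collection list.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vendor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        vendor_id INTEGER REFERENCES vendor(id),
        amount REAL
    );
    CREATE VIEW invoice_totals AS
        SELECT vendor_id, SUM(amount) AS total FROM invoice GROUP BY vendor_id;
    """
)
manifest = inventory_schema(conn)
```

Comparing a manifest like this against what was actually delivered is a quick check that no view or linking table was left behind, which is cheaper than discovering the gap after a subset has been prepared.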

Audio. Audio files can take many forms including voicemails, recorded meetings, and webinars. In many instances, companies outsource the storage of audio files. Whether or not this is the case for your organization or client, it is very important to understand what type of metadata and lookup information is available in order to efficiently and confidently request data at the time of collection. There have been too many instances when clients do not explicitly ask for the metadata lookup information required to adequately perform their collection, analysis, and review. This not only causes frustration, but also may result in additional time and increased costs if the data has to be requested again.

Other options for storing audio files include storing on local hard drives and/or on an email server. Understanding how to identify, request, query, and analyze this information will help to determine scope and contribute to a successful collection strategy.

Chat. Lync, Communicator, Google Chat, and Bloomberg are just a few examples of products used by companies as an internal communication tool for employees. Identification of these systems and collections from these systems can be challenging, as the rights to access may lie solely with the user or with the hosting organization. In addition, the export format may present its own challenges. It is particularly important to determine the overall processing, review, and production strategy in advance when including chat data in the collections process. The preferred format of chat data may be dictated by the discovery strategy. For example, some formats lend themselves more easily to near deduplication and elimination of noise words. This can ultimately help to greatly reduce the volume of data for review.
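
To make the idea of near deduplication and noise-word elimination concrete, the Python sketch below compares chat messages by the words they share once noise words are removed. It is a simplified illustration, not how any particular review platform works: the noise-word list, the 0.8 threshold, the sample messages, and the function names are all assumptions, and it uses Jaccard similarity on word sets as one common stand-in for near-duplicate detection.

```python
import re

# Assumed noise-word list; a real matter would agree on its own.
NOISE_WORDS = {"hi", "hey", "ok", "okay", "thanks", "lol", "the", "a", "to"}


def tokens(message: str) -> frozenset:
    """Lowercase the message, strip punctuation, and drop noise words."""
    words = re.findall(r"[a-z0-9']+", message.lower())
    return frozenset(w for w in words if w not in NOISE_WORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Fraction of distinct words the two messages share (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_dedupe(messages: list, threshold: float = 0.8) -> list:
    """Keep the first message from each group of near duplicates."""
    kept = []  # (original text, token set) pairs
    for msg in messages:
        t = tokens(msg)
        if all(jaccard(t, seen) < threshold for _, seen in kept):
            kept.append((msg, t))
    return [msg for msg, _ in kept]


chat = [
    "Hey, please send the Q3 forecast to Dana",
    "ok please send Q3 forecast to Dana, thanks",
    "Lunch at noon?",
]
unique = near_dedupe(chat)
```

Here the second message reduces to the same set of words as the first and is suppressed, leaving two messages for review. This only works if the export format exposes each message's text cleanly, which is one reason the export format matters.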

Mobile devices. The biggest challenge with collecting mobile device data can be attributed to the ever-evolving landscape of the industry. Mobile device hardware and software evolve rapidly to meet the demands of the consumer. To complicate matters, there is no standardization between device manufacturers or even within a single manufacturer's product line. For example, Apple's iPhones, though they look similar, use different encryption from model to model and, in some cases, from one operating system version to the next. This complicates access to certain areas of the device and limits the content that can be extracted during forensic acquisition. In order for mobile device forensic acquisition and analysis software to keep up, the forensic software and hardware manufacturers have to obtain the new device, reverse engineer it, and then release an update to their products. This can be a lengthy process and results in a delay before newer devices are supported and can be collected easily.

Additionally, these collections often require some level of cooperation by the owner, specifically to obtain the passcode. For instance, no one has found a reliable way to break the BlackBerry device encryption; methods do exist, but they do not guarantee results. Other times, the phone needs to be configured in a very specific way to allow access without a password. There are even devices, such as those created by Silent Circle, that are heavily encrypted and designed to ensure that data cannot be collected.

Mobile device applications. There are millions of applications available for download from Google's Play Store and Apple's App Store. Amazon, Sony, and Samsung also offer downloadable applications. In addition, there are countless other stores around the world offering applications for download. As new applications are created and used, new forensic analysis methods are needed to analyze the data. It also goes without saying that there is no standardization to the software, requiring, in some instances, unique approaches to capturing and recovering the application information for analysis during investigations.

The type of data that can be collected from applications is largely determined by how the application itself is written. Luckily (or unluckily depending on which side you are on), many popular applications such as Snapchat, Evernote, and WhatsApp have structured logs and data. Even when an application is deleted, the data can typically be recovered unless the application is written in such a way that the database is compacted.
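
The effect of compaction can be demonstrated with SQLite, a database engine commonly embedded in mobile applications. The Python sketch below is an assumption-laden illustration: it does not describe how any of the applications named above store their data, and the file name, table, and row contents are hypothetical. It shows that deleting rows leaves unused pages in the database file, where remnants of deleted records may still reside, and that compacting the file with `VACUUM` removes those pages.

```python
import os
import sqlite3
import tempfile


def free_pages(conn: sqlite3.Connection) -> int:
    """Number of unused pages that may still hold deleted records."""
    return conn.execute("PRAGMA freelist_count").fetchone()[0]


# Hypothetical message store; file name and schema are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "app_messages.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA auto_vacuum = NONE")  # keep freed pages in the file
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO message (body) VALUES (?)",
    [("x" * 500,) for _ in range(1000)],
)
conn.commit()

conn.execute("DELETE FROM message")  # the user "deletes" the chat history
conn.commit()
before = free_pages(conn)            # freed pages are still in the file

conn.execute("VACUUM")               # compaction rebuilds the file
after = free_pages(conn)             # the freed pages are gone
conn.close()
```

In this sketch `before` is greater than zero and `after` is zero. Whether the freed pages actually contain recoverable content depends on the engine's settings (for example, a secure-delete option that overwrites deleted data), so the free-page count indicates only where to look, not what will be found.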

Social media. Social media applications and websites bring their own unique sets of challenges. The ease with which social media data can be collected is dictated by the application programming interface (API). The API is the software interface that social media websites use to give external developers access to the site's functionality and data. Additionally, social media companies typically require a subpoena, court order, or search warrant to access account information. However, information can be acquired from social media accounts with forensic acquisition software in a couple of ways:

  1. Publicly available information: Screen scraping the account page as a visitor would see it captures the public page. This typically does not contain private information such as private messages or hidden information available only to account friends.

  2. Private information: Authorization from the user or account holder and obtaining his or her username and password can allow for acquisition of private information as well as public information. As mentioned earlier, leveraging account information without authorization can lead to legal complexities.

With the numerous complexities related to identifying systems, adhering to regulations, and staying abreast of technology enhancements, forensic acquisition has become much more involved and complex. Long gone are the days of simply "pulling a hard drive" and creating a forensic image. Having the foresight to create company policies in anticipation of litigation and investigations can prove beneficial to reducing costs, decreasing response time, and eliminating frustrations. Perhaps with a little help from the experts, these collections can be made relatively painless with great potential benefit to your cases and your clients.

Keywords: litigation, minority trial lawyer, minorities, electronically stored information, ESI, nontraditional data, computer forensic investigation, Computer Fraud and Abuse Act

Copyright © 2018, American Bar Association. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or downloaded or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association, the Section of Litigation, this committee, or the employer(s) of the author(s).