November 30, 2012

Demystifying Big Data

John Pavolotsky

Much has been written in business journals and blogs about "Big Data," which refers to the recent proliferation of data volumes and types so large, diverse, and unstructured that they are difficult for current technologies to store and analyze. In short, the rapid innovation and deployment of technologies that capture data - from GPS-enabled smartphones and tablets, to RFID tags, to the millions of cameras used by businesses and consumers to record daily activities - have outpaced the development of technologies to store, organize, and analyze the massive data sets.

The Harvard Business Review devoted its October 2012 issue to the subject of Big Data, with articles titled "Big Data: The Management Revolution," "Making Advanced Analytics Work for You," and "Data Scientist: The Sexiest Job of the 21st Century." Many articles address various technologies designed to help manage Big Data, and others discuss the advantages Big Data can bring to a company's marketing efforts or warn of the dangers of Big Data to individual privacy. At least one recent article, "The Death of Big Data," published on October 4, 2012, argues that "Big Data" will soon become "Any Data," because massive data volumes, composed predominantly of unstructured data, are quickly becoming the norm. While business publications have written widely about Big Data, legal commentators have written sparingly on the subject. This article will explore the unique and not-so-unique legal issues that Big Data raises.

The Business of Big Data

The tools and business applications of Big Data are many and varied. Developers are increasingly offering software applications that ingest, organize, and store data streams, such as social media feeds. To help cope with this massive collection of data, other technology vendors are focused on data integration, which becomes extremely important if your data is stored on a variety of unconnected devices in different formats, or on analytical tools that help extract useful business information from the collected data. Collectively, these technology companies are trying to advance algorithms that harness the massive collected data to assist in better product development, power targeted advertising, devise more effective marketing campaigns, power e-commerce recommendation engines, assist dynamic pricing, or improve any number of other business functions.

Notably, in the world of Big Data, there is a fundamental divide. On the one hand, some companies have for years been collecting, analyzing, managing, and storing tremendous amounts of data. These include certain e-commerce concerns and online search engines. In "Big Data: The Management Revolution," Andrew McAfee and Erik Brynjolfsson astutely point out: "Online businesses have always known that they were competing on how well they understood their data." Similarly, as early as the mid-1990s certain brick-and-mortar retailers were building massive data warehouses populated by transaction and other data. This first group can be called "data mavens." Others are new(er) to Big Data and thus face steeper learning and adoption curves. This second group can be called "data tyros."

Both groups must address a number of business questions. First, as a practical matter, is there actually meaning in the data? As a corollary, are there enough people (data scientists) qualified to make sense of such data? Massive amounts of data are being collected and stored (because it is now cheap to do so), but whether most of that data has any value remains to be seen.

The Legal Landscape: Case Law

As noted, few have written about the legal implications of Big Data, and of those, most have focused on privacy issues. It could be that Big Data issues are just data issues, or Internet issues, or device (e.g., mobile) issues. It could also be that Big Data technologies are not sufficiently distinguishable from other technologies to warrant a separate exposition. There are no Big Data cases to analyze. No legislation has been passed that appreciates the dynamics of Big Data, and in particular, the collection, use, disclosure, and storage of vast amounts and types of data. Lastly, it could be that due to the newness of Big Data, an opinion has yet to form. This will change. The remainder of this article will thus describe current legal developments related to Big Data and suggest potential developments as the courts and legislatures grapple with issues related to Big Data.

As intimated above, in the case law Big Data is still firmly in the hands of the early adopters. In fact, my research revealed only one case that so much as mentions Big Data:

First, in the era of 'big data,' in which storage capacity is cheap and several bankers' boxes of documents can be stored with a keystroke on a three inch thumb drive, there are simply more documents that everyone is keeping and a concomitant necessity to log more of them.

Chevron Corporation v. The Weinberg Group, Misc. Action No. 11-409, D.D.C. Sept. 26, 2012.

While there are no Big Data cases, geo-location tracking cases, of which there is no shortage, come the closest and shed the most light on Big Data issues. As detailed below, the case law, which involves data collected from attached GPS devices, GPS chips in mobile phones, and cell site towers, is mixed, but generally reflects a reluctance to address issues related to rapidly-changing and potentially invasive technologies. With Big Data, the elephant in the room is, of course, privacy; and while there are other issues, as discussed below, privacy, deservedly, crowds them out.

In United States v. Antoine Jones, 132 S. Ct. 945 (2012), five Justices, led by Justice Alito, appeared to agree that prolonged monitoring of a person's whereabouts for most offenses would violate a person's reasonable expectation of privacy and, thus, absent a warrant or exigent circumstances, such monitoring would be unconstitutional. However, the case was ultimately decided on a common law trespass analysis, given that a GPS device had been attached to the undercarriage of Jones' car, whose location was tracked for 28 consecutive days. During this period, the GPS device collected more than 2,000 pages of data. The Court's conclusion left unanswered several questions that are important in an era of Big Data. For example, what sorts of offenses would be sufficiently serious to warrant a 28-day search? If 28 days is too long, what about three days or 10 days, and does the granularity of the location data matter? Put otherwise, the "reasonable expectation of privacy" test first articulated in Katz v. United States, 389 U.S. 347 (1967), may become increasingly difficult to apply, especially as technologies and our expectations of privacy evolve.

In particular, the "reasonable expectation of privacy" test comes from Justice Harlan's concurring opinion:

My understanding of the rule that has emerged from prior decisions is that there is a twofold requirement, first that a person have exhibited an actual (subjective) expectation of privacy and, second, that the expectation be one that society is prepared to recognize as 'reasonable.'

Katz involved wiretapping a phone booth, and thus eavesdropping on at least Katz's end of the conversation.

In United States v. Skinner, 690 F.3d 772 (6th Cir. 2012), the defendant was tracked for only three days, as his GPS-enabled cell phone was pinged periodically to determine his location and, ultimately, to apprehend him. There, the Court of Appeals for the Sixth Circuit found no Fourth Amendment violation because "Skinner did not have a reasonable expectation of privacy in the data given off by his voluntarily procured pay-as-you-go cell phone" that was used to transport contraband. In reality, GPS-enabled cell phones need to be pinged (by the phone company) to provide data about their location. Similarly, whether or not the phone was used in support of an illegal activity (transporting contraband) should have no bearing on the reasonable expectation of privacy. Judge Donald, in her concurring opinion, agreed as much, but concluded that the location data evidence should not be suppressed, because a good faith exception to the warrant requirement applied in the case. A petition for en banc review by the Sixth Circuit has been filed.

At issue in In re Application of the United States of America for Historical Cell Site Data, argued on October 2, 2012, in the Court of Appeals for the Fifth Circuit, was 60 days' worth of historical cell site data, which could be used to determine the location of certain individuals in the underlying investigations. While oral argument was generally not terribly illuminating, the court was especially uneasy about the types of data, above and beyond the location of cell site towers at the beginning and end of each call, that would be divulged to law enforcement if the proposed Section 2703(d) order had in fact been granted. One of the critical issues, or side effects, of Big Data is this uneasiness about the collection of vast amounts of data about each person and not knowing what one will find when one opens Pandora's box. Further, the court did not seem eager to decide the Fourth Amendment issue. Rather, it seems more likely that the court will dispose of the case by ruling that Section 2703(d) of the Stored Communications Act (18 U.S.C. §§ 2701-2712), also at issue in this case, compels the issuance of an order once certain information is presented to the magistrate. At any rate, it is only a matter of time before the U.S. Supreme Court weighs in on the collection and use of mobile phone geo-location data and applies, as it must, Katz to the facts of the specific case. Further, given the rapidly-evolving technologies and the inherent squishiness of "reasonable expectation of privacy," a bright-line rule seems highly unlikely.

In "The Dead Past" (64 Stan. L. Rev. Online 117), Chief Judge Kozinski, of the Court of Appeals for the Ninth Circuit, questions how much today's society really values its privacy. He surmises:

In a world where you can listen to people shouting lurid descriptions of their gall-bladder operations into their cell phones, it may well be reasonable to ask telephone companies or even doctors for access to their customer records.

In other words, the goalposts keep moving, and even if a court faithfully applies Katz, the application will likely be troubling to a significant minority of us. The application will also likely be quite fact-specific, raising the question of whether, with respect to at least this aspect of Big Data, a legislative solution would be preferable. As a simple example, if the law were to require that historical cell site data be expunged after, e.g., 12 months, there would be no In re Application of the United States of America for Historical Cell Site Data. Currently, there is no limit on how long phone companies can keep cell site data.

Of course, not using a cell phone is not an option, although, quite tellingly, one of the judges on the Fifth Circuit panel that heard In re Application of the United States of America for Historical Cell Site Data asked if the defendants simply could have not used their phones.

The Legal Landscape: Other Issues

There are other issues, such as data security and intellectual property rights, but for the most part, they are more data issues than Big Data issues. In analyzing such data issues, the practitioner should consider the entire data life cycle, which consists of data generation, transfer, use, transformation, storage, archival, and destruction. Consideration should also be given to the nature of the data, gateways (discussed below), business sector-specific laws and regulations, and the location of the consumer or data subject. In short, regardless of whether a certain technology or business practice raises a Big Data issue, the practitioner will still need to consider and analyze data issues raised by the relevant technology or practice.

Gateways, or devices through which data is collected, are critical. For example, earlier this year, California's attorney general announced that apps that collect personal data from California consumers must have a conspicuously posted privacy policy. On September 12, 2012, Representative Edward J. Markey (D-Mass.) introduced the Mobile Device Privacy Act (H.R. 6377), which mandates the disclosure of monitoring software installed on a mobile device or downloadable to such a device, the types of information that may be collected by such software, the recipients of such information, how such information will be used, and procedures to stop further collection of such information. The concept, though, is still notice and consent. Put otherwise, H.R. 6377 will not have any real impact on how much data is collected and for how long it is stored, thus sidestepping some of the major issues raised by Big Data. Further, and perhaps of greater interest (at least to those steeped in information security), the draft bill has rather detailed information security requirements for anyone who receives information collected by monitoring software, including a security policy, the identification of a security officer, and a process for identifying any reasonably foreseeable vulnerabilities in any system containing such information.

More broadly, the nature of the data is, of course, critical. Is the data PII (personally identifiable information), and if so, is it from or relating to a child under 13, thus invoking COPPA? Likewise, does the data constitute PHI (protected health information), thus invoking the HIPAA Privacy Rule and the HIPAA Security Rule? PII would, of course, subsume PHI, and this is relevant, at least in part, because there is no private cause of action under HIPAA. As to the location of the data subject, the EU justice commissioner, Viviane Reding, has said that it should not matter where data is processed if EU privacy laws are violated. Jurisdictionally, this seems problematic, absent an office or data processing equipment in Europe, but, at any rate, if EU consumers are being actively targeted by a non-EU company, such a company should comply with, e.g., the EU Cookie Directive, and, in particular, the specific implementation adopted by the Member States whose consumers are being targeted.

Data generation speaks to the provenance, or source(s), of the data. Issues include the sufficiency of intellectual property rights in the data and whether the data was procured in a way that does not invade a third party's privacy rights or violate a third party's publicity rights. Particular care should be taken by the customer in reviewing API (application programming interface) licenses, to determine any limitations on data distribution. For example, some licenses may permit distribution only if the third-party data is accessed dynamically. While it is debatable whether or not APIs are copyrightable, the provenance analysis should still take into consideration the distribution (and other) limitations of the API license.

If the data will be entrusted to a third-party service provider, care must be taken to address permissible use(s) of customer data by the provider, the ownership of any intellectual property attributable at least in part to use of customer data, whether or not the customer data will be treated as customer confidential information, and the commercially reasonable security practices actually implemented and maintained by the provider. Other topics for consideration include the location of the data center(s) in which the data will be stored, whether or not there are any export controls with respect to the data, data quality, the ease with which the data may be accessed and transferred to another service provider (and any attendant costs), and the retention period for the data. Note that security, export control, data quality, and other issues do not disappear if the data is processed and stored internally.

The Road Ahead

While the legal implications of business models that address Big Data have yet to be analyzed extensively by the courts, with the exception of the cases discussed above and perhaps other disputes dealing with geolocation services, businesses can expect that similar issues will come before the courts soon, as they start to grapple with how people are starting to capitalize on Big Data. Some of the other contexts in which practitioners should expect Big Data issues to play a big part in disputes may include the following:

  • Consumer privacy complaints or other third-party complaints against data analytics services that provide data tied to e-mail addresses;
  • Other analytics services that provide "analytics" from cloud-based data, regarding customer behavior; and
  • Products and services responsible for analyzing, managing, and storing critical and highly complex and sensitive data, such as genomic data.


In closing, while, on the business side, the case for Big Data has been largely fleshed out, the legal ramifications remain to be elucidated. In the short term, geo-location tracking cases will likely provide the most insight into whether the resolution of Big Data issues will be via judicial fiat or legislation. Regardless, the practitioner should be sensitive to Big Data issues and take a data-centric view of his or her client's products and services, as well as the products and services the client hopes to acquire.

John Pavolotsky

John Pavolotsky's practice focuses on technology transactions and other intellectual property matters at Greenberg Traurig, where he is of counsel. All views expressed herein are solely those of the author and should not be attributed to Greenberg Traurig.