January 01, 2016

Security Challenges of the Big Data Ecosystem Require a Laser-Like Focus on Risk

Security and privacy issues are magnified by the volume, variety, and velocity of Big Data. Large-scale cloud infrastructures, diversity of data sources and formats, the streaming nature of data acquisition, and high volume inter-cloud migration all create unique security vulnerabilities.1

Until recently, “big data” referred almost exclusively to data in large, Google-scale data centers. The number of organizations worldwide, both private sector and government, that are now collecting massive amounts of data is growing exponentially. Facebook and Twitter are global, healthcare records are becoming digital, and we are witnessing an explosion of developments with the Internet of Things (IoT), everyday objects equipped with sensors that can record, send, and receive data over the Internet and that are expected to number 25 billion by 2020.

The sensitive personal data being amassed by companies and governments is staggering. Inexpensive storage has enabled companies to collect and store large amounts of data and retain it far longer than they would have if it were on paper. New sources of data from sensors, cameras, and geospatial and other observational technologies are multiplying. With bring your own device (BYOD) policies, big data is now spreading beyond organizations’ own systems to employees’ personal computers and mobile devices.

Uses of Big Data

A White House report focusing on the promise of big data stated that many people believe that big data provides opportunities “to grow our economy, improve health and education, and make our nation safer and more energy efficient.”2 Most agree that “[b]ig data technologies will be transformative in every sphere of life.”3 Big data also is an important tool for continuous monitoring that will inform executives about the threat, vulnerability, and compliance posture of their systems, as well as provide information about incidents that will need to be investigated.

Big Data—A Complex Technology Ecosystem

For executives, lawyers, and judges, the world of big data presents a plethora of terminology and concepts that must be mastered in order to assess the risks and understand the security and privacy issues. “Big data” is a complex ecosystem—terms such as NoSQL (databases designed to store massive amounts of unstructured and semi-structured data across clusters of machines) and Hadoop4 (open source software for distributed processing of large datasets) describe the core elements of the technical environment where complex processing takes place. NoSQL databases were developed to overcome the limitations of scalability and infrastructure cost presented by traditional relational databases. They provide high availability and can scale up to accommodate large datasets, and they are well-suited to run in the cloud computing environment. Hadoop is an ecosystem of applications that include Hive, HBase, ZooKeeper, Oozie, and Job Tracker.

About 150 different NoSQL databases currently are available. They are mostly open source, which means that the source code is publicly available for modification or enhancement by anyone. These databases are categorized into four broad types, named for the approach used to store data, information, and documents: Document, Key-Value, Graph, and Column databases. Many well-known companies such as Adobe, Amazon, Best Buy, Compose (formerly MongoHQ), eBay, Facebook, Google, IBM, LinkedIn, Lots of Words, Mozilla, Netflix, Twitter, and Yahoo! are using NoSQL databases.

The market for NoSQL/Hadoop software and services topped $1 billion in 2013 as measured by vendor revenue. By 2017, this market is expected to grow to nearly $3.3 billion, a compound annual growth rate (CAGR) of 45 percent.5

Big Data Provides a Torrent of Data at Risk

[T]he larger the concentration of sensitive personal data, the more attractive a database is to criminals, both inside and outside a firm. The risk of consumer injury increases as the volume and sensitivity of the data grows.6

Failed security has resulted in thousands of data breaches, which have led to the loss or compromise of millions of personally identifiable records; the theft of classified information, valuable intellectual property, and trade secrets; and the compromise of critical infrastructure. A website that goes by the name “Information Is Beautiful” illustrates hundreds of these major data breaches over the past decade.7

In reported data breaches, corporate and government databases are among the most compromised assets. Their stored customer records and other confidential business data—the heart of any organization—are inviting targets for cyber espionage and hacker attacks.

A sampling of the largest data breaches, spanning the financial, healthcare, retail, and government sectors, illustrates the heightened risk to millions of consumers when large datasets of sensitive personal information are compromised: eBay, 145 million records breached (2014); Heartland, 130 million (2008–09); Target, 110 million (2013); Sony Online Entertainment, 102 million (2011); JPMorgan Chase, 76 million (2014); Anthem BlueCross BlueShield, 69–80 million (2015); Epsilon, 60–250 million (2011); Home Depot, 56 million (2014); LivingSocial, 50 million (2013); TJX, 46 million (2006–07); Office of Personnel Management (OPM), 22.5 million security clearance records, 5 million fingerprints (2015).8

Privacy concerns are heightened because of the lack of transparency. While much of big data analytics involves the analysis of personally identifiable information, most individuals have no idea who has obtained their data, what uses are being made of it, or that their data are being processed and stored in environments that may lack appropriate security.9

The Current State of Big Data Security

Hundreds of millions of consumers whose sensitive data, personal profiles, and “risk scores” are being analyzed, shared, and sold on a daily basis without their knowledge or consent are at risk of fraud and identity theft and possibly other criminal activity because of widespread security vulnerabilities in the big data ecosystem.

Security is only as strong as its weakest link, and this is particularly true of big data. Its distributed architecture presents a plethora of vulnerable points in processing and storage where sensitive and proprietary data can be compromised. Because security was not addressed in the open source development process, the architecture and design of big data present unsolved technology issues that must be addressed. Key security measures are generally not available or are not implemented in the most popular NoSQL and Hadoop systems. The security focus to date has been on securing the perimeter of a system, but what is behind the perimeter is not secure. Insider attacks are a particular threat.

Essential security controls are not provided in the default implementation of NoSQL databases. These controls include centralized security management, authentication (to verify identity of users and systems), authorization (access control policies), audit logging (to record data access and user actions), and encryption or data masking (to protect data in motion and at rest). For example, by default, the NoSQL database installs with no passwords, and created users are given read-only access without restriction, resulting in access to everything stored in the entire database.10 Audit logging is fragmented and lacks the fine-grained auditing needed to identify users who accessed that data and to record what actions they took. Passwords and data are transmitted in the clear—encryption must be provided by third-party security add-ons. Developers put the onus of security on application developers and database owners to run NoSQL and Hadoop in a trusted environment.
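To make the missing controls concrete, the following is a minimal sketch, in Python, of three of them (authentication, authorization, and audit logging) wrapped around a toy in-memory key-value store. It is an illustration only and not the interface of any real NoSQL product: the class name GuardedStore, the prefix-based access policy, and the audit record fields are all assumptions made for this example, and encryption of data in motion and at rest is deliberately left out.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone


class GuardedStore:
    """Toy key-value store: every read or write needs valid credentials."""

    def __init__(self):
        self._data = {}
        self._users = {}     # username -> (salt, password digest, allowed key prefixes)
        self.audit_log = []  # one record per attempt, whether allowed or denied

    @staticmethod
    def _digest(password, salt):
        # Store a salted, slow hash instead of the password itself.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def add_user(self, username, password, prefixes):
        salt = os.urandom(16)
        self._users[username] = (salt, self._digest(password, salt), tuple(prefixes))

    def _authenticate(self, username, password):
        # Authentication: verify the identity of the caller.
        record = self._users.get(username)
        if record is None:
            return False
        salt, digest, _ = record
        return hmac.compare_digest(self._digest(password, salt), digest)

    def _authorize(self, username, key):
        # Authorization (hypothetical policy): only keys under granted prefixes.
        return key.startswith(self._users[username][2])

    def _check(self, username, password, action, key):
        allowed = self._authenticate(username, password) and self._authorize(username, key)
        # Audit logging: record who tried to do what, and whether it was allowed.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": username,
            "action": action,
            "key": key,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{action} on {key!r} denied for {username!r}")

    def put(self, username, password, key, value):
        self._check(username, password, "write", key)
        self._data[key] = value

    def get(self, username, password, key):
        self._check(username, password, "read", key)
        return self._data[key]
```

In this sketch a user created with add_user("analyst", "s3cret", ["public/"]) can read and write keys that begin with "public/" but receives a PermissionError for any other key, and both outcomes leave an entry in audit_log. The point is the contrast with the defaults described above: nothing is reachable without a password, access is limited by policy, and every attempt is attributable to a user.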

Hadoop was designed as a closed system, not for use by data applications across an enterprise. As a result, it has very weak security, and its security flaws are well known. Members of the Hadoop community admit that all they can do is secure the perimeter and that what lies behind the perimeter may be at risk. Lax security mechanisms can be exploited to mount insider attacks that may go unnoticed because of poor logging and analysis. “An attacker who can get into the data center either physically or electronically can steal whatever they want, since the data is unencrypted and there is no authentication required for access.”11

Lack of fine-grained access control is a weakness: current Hadoop access control permissions apply to entire “files” that a multitude of users can access. This scheme is not adequate to protect large volumes of different types of sensitive data. Encryption is not built into, or provided with, the open source software available at hadoop.apache.org.12 Neither file-level encryption nor encryption for data in motion is available.

Members of the Apache Hadoop community, developers, third-party software vendors, and distributors are producing add-on security capabilities that fix some of the security vulnerabilities in Google’s NoSQL BigTable and Apache’s Hadoop, including access control, data masking, and encryption. As one example, Apache Accumulo was designed to address the most significant Hadoop vulnerabilities, including the lack of “fine-grained access control.” It provides BigTable cell-level software controls using a label or tag for each tiny piece of data that can be used to authenticate and authorize user access to the data.13
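The idea of cell-level control can be sketched in a few lines of Python. This is a simplified illustration of the concept, not Accumulo's actual API: the Cell type, the visible_cells function, and the sample labels are hypothetical, and the sketch assumes a reader must hold every label on a cell, whereas Accumulo's own label expressions are richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    required_labels: frozenset  # labels a reader must hold to see this cell


def visible_cells(cells, user_labels):
    """Return only the cells whose labels are all held by the reader."""
    held = frozenset(user_labels)
    return [cell for cell in cells if cell.required_labels <= held]


# Hypothetical patient record: each cell carries its own label set.
record = [
    Cell("patient-17", "name", "J. Doe", frozenset({"clinical"})),
    Cell("patient-17", "diagnosis", "asthma", frozenset({"clinical", "phi"})),
    Cell("patient-17", "invoice", "$120", frozenset({"billing"})),
]

# A billing clerk sees only the invoice cell of the same row.
for cell in visible_cells(record, {"billing"}):
    print(cell.column, cell.value)
```

Compared with file-level permissions, the decision here is made per cell, so a single row can hold data of different sensitivity and each reader sees only the portion that the reader's labels permit.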

NoSQL databases were among the first to employ self-encrypting drives (SEDs) on a large scale to protect data whenever a device is lost, stolen, repurposed, at end-of-life, or in warranty repair. The Drive Trust Alliance was established to facilitate the adoption of SEDs.14 Because disk and solid state drives will be SED-capable by 2017 (most are now), and many organizations are not even aware that the drives they have purchased are already SEDs, this expanded use of encryption should enhance the protection of data at rest. SEDs provide among the best security whenever the stored data leaves the owner’s control.

Security Experts Focus on Database Security Vulnerabilities

A leading database attack methodology called Structured Query Language (SQL) injection (SQLi) was number 11 in the Verizon Top 20 Varieties of Threat Actions in the 2014 Data Breach Investigations Report.15 Although big data NoSQL technology is different from SQL, the same injection points—such as input fields—provide an avenue for attackers to access big data components. The NoSQL architecture is susceptible to various injection attacks that allow backdoor access to the file system for malicious activities.
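A short Python sketch shows how an input field can carry an injection into a document-style query, and one way to block it. It assumes a store whose query filters are nested dictionaries in which a key such as "$ne" (the "not equal" operator used by MongoDB-style query languages) is treated as an operator. The function build_lookup_filter and the field names are hypothetical, and type checking of this kind is only one layer of a defense.

```python
import json


def build_lookup_filter(username, token):
    """Build an equality-only filter for a document-style query."""
    for field, value in (("username", username), ("token", token)):
        # A dict or list here means the caller supplied query structure, not data.
        if not isinstance(value, str):
            raise ValueError(f"{field} must be a plain string, got {type(value).__name__}")
    return {"username": username, "token": token}


# Attacker-controlled JSON body: the token field smuggles in a "$ne" operator,
# which an unchecked query would read as "token is not equal to null".
malicious = json.loads('{"username": "admin", "token": {"$ne": null}}')

try:
    build_lookup_filter(malicious["username"], malicious["token"])
except ValueError as err:
    print("rejected:", err)
```

If the untrusted value were passed straight into the filter, the condition would match any record with a non-null token, which is the NoSQL counterpart of a classic SQL injection through a login form.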

Experts have looked at the database vulnerabilities exploited by hackers over a decade and identified the top 10 threats, as shown in the table above. These threats apply not only to traditional databases, but also to big data technologies.

The Path Forward—The Security Strategy for Executives, Corporate Boards, and Government Officials Includes Risk-Based Assessment

[O]rganizations are exposing their sensitive information to increased risk as they integrate open-source Hadoop into their IT environments. For that reason, companies serious about using big data effectively need to make sure they’re doing so securely, protecting their valuable information and securing private data so that it stays private.16

The increasing use of big data to analyze sensitive personal data and valuable corporate information requires a robust security environment. In light of the number of massive data breaches and well-documented NoSQL and Hadoop vulnerabilities, the need for all private- and public-sector organizations to develop, implement, and maintain an appropriate cybersecurity program is immediate and compelling. In fact, the American Bar Association adopted the following resolution in 2014: “[T]he American Bar Association encourages all private and public sector organizations to develop, implement, and maintain an appropriate cybersecurity program that complies with applicable ethical and legal obligations and is tailored to the nature and scope of the organization and the data and systems to be protected.” Big data systems are particularly at risk.

In many cases, data breaches or other types of cyber incidents could have been prevented or detected early, and the risks of the incident mitigated, if the organization had undertaken proper security planning and implemented appropriate security safeguards. Although cybersecurity challenges may seem daunting, existing frameworks, standards, and best practices provide a road map that public officials and business executives can follow to reduce the risks substantially.

A cybersecurity program comprises a series of activities. These activities include, for example:

  • governance by boards of directors and/or senior management;
  • development of security strategies, plans, policies and procedures, and privacy compliance requirements;
  • creation of inventories of digital assets;
  • selection of security controls;
  • determination of technical configuration settings;
  • performance of annual audits; and
  • delivery of training.

Organizations must be prepared if a cyberattack or data breach occurs or if an event interrupts their operations. Incident response is the practice of detecting a problem, determining its cause, minimizing the damage it causes, resolving the problem, and documenting each step of the response for future reference. Fully developed and tested incident response plans and business continuity/disaster recovery plans are critical components of a security program.

Risk Assessment of the Big Data Ecosystem

Cybersecurity is based on a systematic assessment of risks that are present in a particular operating environment. Risk assessments are undertaken to identify gaps and deficiencies in a cybersecurity program due to operational changes, new compliance requirements, an altered threat environment, or changes in the system architecture and technologies deployed. Assessing risk requires that organizations identify their threats and vulnerabilities, the harm that such threats and vulnerabilities may cause the organization, and the likelihood that adverse events arising from those threats and vulnerabilities may actually occur.
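One common way to turn these three elements into a ranked list is a simple risk register that scores each entry as likelihood multiplied by harm. The Python sketch below assumes a 1-to-5 scale for both factors and uses made-up ratings for three risks of the kind discussed in this article; the scale, the ratings, and the names Risk and rank are illustrative assumptions, not values or methods drawn from any cited framework.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    threat: str
    vulnerability: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible harm) to 5 (severe harm)

    @property
    def score(self):
        # Simple multiplicative score; other weightings are equally defensible.
        return self.likelihood * self.impact


def rank(risks):
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Hypothetical register entries with made-up ratings.
register = [
    Risk("hacker", "injection through input fields", 3, 4),
    Risk("insider", "no authentication behind the perimeter", 4, 5),
    Risk("thief", "drive leaves the data center unencrypted", 2, 3),
]

for risk in rank(register):
    print(risk.score, risk.threat, "/", risk.vulnerability)
```

A ranking like this does not replace judgment, but it gives executives a consistent basis for deciding which gaps to close first and for repeating the assessment when the architecture or threat environment changes.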

Every information system is different, including its design and architecture, hardware and software, and technical implementation. The risk assessment template above provides a starting point—company and government officials must tailor their risk assessments to the data, architecture, and technology of the big data system they own, oversee, or manage. A big data risk assessment should focus on specific characteristics and vulnerabilities of NoSQL and Hadoop technologies and on the environment(s) where the system is implemented.

Implementing Technology with Known Vulnerabilities Is Not “Reasonable Security”

[I]f [organizations] fail to secure the life cycle of their big data environments, then they may face regulatory consequences, in addition to the significant brand damage that data breaches can cause.19

Many data breaches and industrial control system (ICS) incidents involve exploitation of known vulnerabilities and violations of well-accepted security practices. With increasingly specific assessments by government agencies and private sector organizations of threats, risks, and vulnerabilities of big data, cloud computing, ICS, and mobile computing, combined with the publication of best practices for addressing cyber risks, standards of care are beginning to emerge.

Federal law enforcement agencies have brought cases against organizations that failed to employ reasonable security and put sensitive personal data at risk. Businesses have faced regulatory fines and investigations, civil damage actions, administrative proceedings, and criminal indictments. The recently published FTC guidance, Start with Security: A Guide for Business, analyzes lessons learned from more than 50 FTC enforcement actions and describes the security lapses that led to those cases. The penalties for failing to employ reasonable security to protect personal data are onerous, ranging from substantial fines to third-party audits spanning the next 20 years.

In its first security case under the Securities Act of 1933 (Regulation S-P), the U.S. Securities and Exchange Commission (SEC) charged R.T. Jones, an investment adviser, with failing to adopt proper cybersecurity policies and procedures prior to a breach. The SEC said R.T. Jones failed to conduct periodic risk assessments, implement a firewall, encrypt PII stored on its server, or maintain a response plan for cybersecurity incidents.20

Company and government executives should adhere to the following principles:

  • To properly support an organization’s risk-management framework, security must be incorporated into the architecture and design of the organization’s information systems and supporting information technology (IT) assets.
  • An organization must employ a defense-in-depth strategy to address all known vulnerabilities in the big data ecosystem.
  • Do not implement databases, software, or systems with known vulnerabilities; seek accountability by using appropriate cybersecurity procurement language. For example, the Energy Sector Control Systems Working Group’s model Cybersecurity Procurement Language for Energy Delivery Systems21 provides baseline cybersecurity procurement language for use by asset owners, operators, integrators, and suppliers during the procurement process. As emphasized in the guidance document, including cybersecurity in the procurement process can ensure that those purchasing and supplying big data systems consider cybersecurity starting from the design phase of system development. This further ensures that cybersecurity is implemented throughout the development, testing, manufacturing, delivery, installation, and support phases of the product life cycle, improving overall reliability and reducing cybersecurity risks.

Hackers and foreign governments have demonstrated the will, knowledge, capacity, and resources to successfully penetrate information systems and steal the most sensitive data held by private sector and government organizations. The threat is imminent, and immediate action is required to assess the risks and implement appropriate security controls to protect the confidentiality, integrity, and availability of data and systems.

Endnotes

1. Cloud Sec. Alliance, Expanded Top Ten Big Data Security and Privacy Challenges 5 (2013), available at https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf.

2. Exec. Office of the President, Big Data: Seizing Opportunities, Preserving Values, at iii (2014), available at https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf.

3. Id.

4. Hadoop, https://hadoop.apache.org; Derrick Harris, The History of Hadoop: From 4 Nodes to the Future of Data, Gigaom (Mar. 4, 2013), available at https://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data.

5. Jeff Kelly, Hadoop-NoSQL Software and Services Market Forecast, 2013–2017, Wikibon, available at http://wikibon.com/hadoop-nosql-software-and-services-market-forecast-2013-2017.

6. Edith Ramirez, Chair, Fed. Trade Comm’n, The Privacy Challenges of Big Data: A View from the Lifeguard’s Chair, Keynote Address at the Technology Policy Institute Aspen Forum (Aug. 19, 2013), available at https://www.ftc.gov/sites/default/files/documents/public_statements/privacy-challenges-big-data-view-lifeguard%E2%80%99s-chair/130819bigdataaspen.pdf; see also Remijas v. Neiman Marcus Grp., LLC, 794 F.3d 688, 693 (7th Cir. 2015) (“[I]t is plausible to infer that the plaintiffs have shown a substantial risk of harm from the Neiman Marcus data breach. Why else would hackers break into a store’s database and steal consumers’ private information? Presumably, the purpose of the hack is, sooner or later, to make fraudulent charges or assume those consumers’ identities.”).

7. World’s Biggest Data Breaches, Info. Is Beautiful, http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks.

8. Privacy Rights Clearinghouse, Chronology of Data Breaches: Security Breaches 2005–present, available at http://www.privacyrights.org/data-breach.

9. See Fed. Trade Comm’n, Data Brokers: A Call for Transparency and Accountability (2014), available at https://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014/140527databrokerreport.pdf.

10. Hortonworks Security; Alex Woodie, Datanami; Zettaset.

11. Zettaset, The Big Data Security Gap: Protecting the Hadoop Cluster 6 (2014), available at http://www.zettaset.com/wp-content/uploads/2014/04/zettaset_wp_security_0413.pdf.

12. Alex Woodie, Hadoop and the Encryption Mandate, Datanami (Nov. 20, 2013), http://www.datanami.com/2013/11/20/hadoop_and_the_encryption_mandate.

13. See Accumulo, available at https://accumulo.apache.org (last visited Feb. 5, 2016).

14. Drive Trust Alliance, http://www.drivetrust.com.

15. Verizon, 2014 Data Breach Investigations Report 10, 25 (2014), available at http://www.verizonenterprise.com/DBIR/ (follow “Download 2014 DBIR” hyperlink).

16. MIT Tech. Review, Securing the Big Data Life Cycle 7 (2015), available at http://files.technologyreview.com/whitepapers/Oracle-Securing-the-Big-Data-Life-Cycle.pdf.

17. The NIST Guide for Conducting Risk Assessments, Spec. Pub. 800-30 Rev. 1 (2012), provides a step-by-step process for organizations on how to: (1) prepare for risk assessments, (2) conduct risk assessments, (3) communicate risk assessment results to key organizational personnel, and (4) maintain the risk assessments over time. NIST Managing Information Security Risk, Spec. Pub. 800-39 (2011), focuses on organizational risk management. In 2014, NIST published the Framework for Improving Critical Infrastructure Cybersecurity, a set of industry standards and best practices to help organizations manage cybersecurity risks as part of their risk management processes, available at http://www.nist.gov/cyberframework/upload/cybersecurity-framework-021214.pdf.

18. https://www.sans.org/critical-security-controls.

19. MIT Tech. Review, supra note 16, at 2.

20. Press Release, SEC, SEC Charges Investment Adviser with Failing to Adopt Proper Cybersecurity Policies and Procedures Prior to Breach (Sept. 22, 2015), https://www.sec.gov/news/pressrelease/2015-202.html.

21. Available at http://www.energy.gov/sites/prod/files/2014/04/f15/CybersecProcurementLanguage-EnergyDeliverySystems_040714_fin.pdf.

The material in all ABA publications is copyrighted and may be reprinted by permission only.