Summary
- AI developers need to work within evolving data privacy frameworks that increasingly give individuals control over their personal data, but this should not restrict innovation.
At the 2018 World Economic Forum in Davos, IBM CEO Ginni Rometty addressed world leaders and entrepreneurs, stating, “[e]very minute of every day, every action, reaction, decision, event and process is being expressed as data—data that is collectable and yielding knowledge.” Rometty’s comments intersected with a general Davos theme focused on the development of artificial intelligence (AI) systems—including machine learning (ML), the type of AI that currently dominates the business world—and how those AI systems would transform society.
Within AI’s breathtaking possibilities lurk privacy concerns associated with this collectible, knowledge-yielding data. ML development not only relies on data, but it also works best with exactly the personal data that most implicates privacy concerns. Why society or the law should care about how AI may be exploiting personal data is one issue; how to address privacy in this context is another. While AI development seems to be driving us toward a potential privacy crisis, it may also offer solutions to it.
Present-day valuation estimates for the AI market range from $36.8 billion to $1.2 trillion, according to TechEmergence. Accenture projects an $8.3 trillion market value for AI services by 2035. For context, the American Bar Association valued the global legal services market at $593.4 billion in 2015. The automotive industry hit $1.7 trillion the same year. The dollars alone should encourage people to care, but the figures do not tell the full story of what is happening behind the scenes with personal data: the estimates capture sales of AI services while ignoring the hidden costs of the privacy risks that could accompany AI development.
AI can be described generally as the creation of intelligent machines and is already intertwined with our lives. It improves commutes by predicting ride times or mapping fastest routes; filters spam from email; enhances fraud checking and credit decisions; personalizes social media and online shopping experiences; and makes smarter homes, personalized assistance, and chatbots possible. More specifically, ML develops machines that can mimic human learning and action by autonomously processing data. These machines rely on humans for programming and as sources of real-world data. Dan Meyer of TransPerfect describes ML as using algorithms, or sets of rules, that progressively improve themselves by feasting on data. Microsoft’s Manish Prabhu estimates that potentially billions of data sources are available for ML techniques, which combine these sources to determine human behavior; separate data sources provide only limited insights. ML AI is an exponential multiplier in which the whole truly is much greater than the sum of its parts. The ultimate aim is deeper insight into what people actually do—and are likely to do—by melding discrete pieces of data.
Algorithms are designed to capture and use personal data, considering all dimensions of available information. The point of ML is not just to deliver actionable intelligence through AI development—it is to create a personal experience for a targeted individual, both replicating and influencing user preferences. According to CoreiBytes, “personalization” is part of the software development process; algorithms are designed to analyze user data individually. ML development focuses on repetitive behaviors and thereby personalizes outcomes. The need to access and use this truly personalized data is what sets AI on a collision course with privacy.
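To make the mechanism concrete, the following is a minimal sketch, not any vendor’s actual system, of how repeated behavior in an interaction log can be distilled into a per-user preference profile. The event data, field names, and categories are hypothetical assumptions for illustration only.

```python
from collections import Counter

# Hypothetical interaction log: each event ties a user to a content category.
events = [
    {"user": "u1", "category": "running shoes"},
    {"user": "u1", "category": "running shoes"},
    {"user": "u1", "category": "headphones"},
    {"user": "u2", "category": "cookware"},
    {"user": "u2", "category": "cookware"},
]

def preference_profile(events, user):
    """Rank categories by how often this user has interacted with them."""
    counts = Counter(e["category"] for e in events if e["user"] == user)
    return counts.most_common()

# Repetition is the signal: the most repeated category becomes the
# "personalized" outcome for that individual.
print(preference_profile(events, "u1"))  # [('running shoes', 2), ('headphones', 1)]
```

Even this toy version shows why the data must be tied to an identified individual: the whole point is to tailor the outcome to one person’s observed behavior.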
The privacy concerns with AI’s development are not new, but until recently laws did little to prohibit or regulate how AI could use data. The European Union’s General Data Protection Regulation (GDPR), in force since late May 2018, changed this legal landscape. This law regulates the use of EU personal data and restricts the export of covered data outside the EU. As other countries consider changes to their privacy laws to align with the GDPR, AI developers may confront further restrictions on the ways they can lawfully use data. A January 2018 report from Datatilsynet, the Norwegian data protection authority, indicated that developers were adopting AI in a “relatively restrictive manner,” corresponding well to the GDPR’s principle of data minimization. The report simultaneously described the problematic nature of AI’s personal data use, finding, “if people cannot trust that information about them is being handled properly, it may limit their willingness to share information,” which could present challenges to freedom of speech, public trust in authorities, and the commercial use of personal data across industry sectors.
AI’s focus on personal data and personalized outcomes collides with the GDPR and the emerging laws that mimic it, which place more control over personal data in the hands of individuals. For example, there may be direct conflict with requests from individuals exercising their right to demand the erasure of their personal data. Developers maintaining ML programs may not have considered how to fully erase the specific data an AI uses to generate outcomes. Either the erasure of personal data (if it is possible at all) changes the outcomes the AI generates, or the use of that data has become such an inextricable part of those outcomes that an individual’s information can never be fully extricated. AI’s use of personal data is therefore not simply a matter of identifying protected data and restricting its use.
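The erasure problem can be illustrated with a minimal sketch, assuming scikit-learn and a hypothetical toy dataset rather than any developer’s actual system: honoring an erasure request generally means dropping the individual’s record and retraining, and the retrained model’s outputs can shift as a result.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: two features per person (age, purchases per month)
# and a binary outcome the model is meant to predict.
X = np.array([[25, 4], [31, 5], [45, 9], [52, 3], [38, 7]], dtype=float)
y = np.array([0, 0, 1, 0, 1])

model_full = LogisticRegression(max_iter=1000).fit(X, y)

# "Erasing" the person at index 2 means dropping their row and retraining;
# the learned parameters, and thus the predictions, can change as a result.
X_erased, y_erased = np.delete(X, 2, axis=0), np.delete(y, 2)
model_erased = LogisticRegression(max_iter=1000).fit(X_erased, y_erased)

probe = np.array([[40, 6]], dtype=float)
print(model_full.predict_proba(probe))    # outcome influenced by the erased record
print(model_erased.predict_proba(probe))  # outcome after retraining without it
```

The sketch also hints at the second horn of the dilemma: if the model is never retrained, the erased individual’s influence remains baked into its parameters.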
Developers typically cannot simply anonymize personal data before developing AI, because ML works best on personal, unsanitized data. Algorithm designers want to know what people will actually do, and ML learns best from reviewing what people have already done. Additionally, as AI improves, even anonymized data may not mask individual identities. Enough anonymized data points could allow AI to uncover, with reasonable certainty, the individuals behind anonymized data sets, defeating the purpose of anonymization in the first place (the re-identification problem that techniques such as “differential privacy” are meant to counter). Even without AI, combining data points about location, education, shopping habits, age, marital status, and employment can fairly precisely narrow the field of potential real individuals behind an anonymized data set; it just takes more time. Ultimately, AI may be able to take anonymized data and render it merely pseudonymized, and because pseudonymized data is considered personal data under the GDPR, it remains subject to the GDPR’s restrictions.
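The re-identification risk is easy to see in a minimal sketch: a record released without a name can still be matched to a single person once a few auxiliary attributes are combined. The population, attribute names, and values below are hypothetical toy data, not drawn from any real data set.

```python
# Hypothetical public records. The "anonymized" release below carries no name,
# but each quasi-identifier prunes the pool of possible matches.
population = [
    {"name": "Alice", "zip": "02138", "age_band": "30-39", "occupation": "nurse"},
    {"name": "Bob",   "zip": "02138", "age_band": "30-39", "occupation": "teacher"},
    {"name": "Carol", "zip": "02139", "age_band": "30-39", "occupation": "nurse"},
    {"name": "Dave",  "zip": "02138", "age_band": "40-49", "occupation": "nurse"},
]

# An "anonymized" record released with quasi-identifiers only.
anonymized_record = {"zip": "02138", "age_band": "30-39", "occupation": "nurse"}

candidates = [
    person for person in population
    if all(person[key] == value for key, value in anonymized_record.items())
]
print([p["name"] for p in candidates])  # ['Alice'] -- a single match re-identifies the record
```

ML simply automates and scales this narrowing across far more attributes and far larger populations than a human analyst could manage by hand.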
The UK Information Commissioner’s Office wants AI developers to incorporate the GDPR’s data minimization principles. European Data Protection Supervisor Giovanni Buttarelli has called out AI developers for their failure to draw on familiar industry principles, such as (1) “Know Your Customer,” which businesses use to identify and verify customers, and (2) related due diligence, which businesses use to investigate obligations and investments, in developing similar ethical frameworks for AI innovation. Buttarelli wants “the perverse incentive in digital markets to treat people like sources of data . . . to be remedied.” He further explained that while older sectors are well regulated, the technology sector has largely existed outside the regulatory framework, allowing tech companies “to move fast and break things.” In contrast with many who claim privacy laws hinder new technology, Buttarelli presented regulation as providing new market incentives rather than stifling innovation. The traditional engineering habit of asking “can I do this” rather than “may I,” however, is difficult to break.
Several services already use AI in support of data privacy and protection solutions. Informatica, for example, utilizes an AI engine to review data sources for sensitive data with an eye toward responsible use and data protection. AI can also help to cull data sources in a way humans cannot, selecting elements that can be knit together while identifying individual elements that present data privacy concerns and removing those before an ML system is built.
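The following is a minimal, rule-based sketch of that culling step, far simpler than a commercial engine such as Informatica’s and not a description of it: columns whose values look like personal identifiers are flagged and dropped before the data reaches an ML pipeline. The column names and patterns are illustrative assumptions.

```python
import re

# Simple patterns for likely personal identifiers; real tools use learned
# classifiers and far broader coverage than this illustrative rule set.
PII_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_sensitive_columns(rows):
    """Return column names whose values match any personal-identifier pattern."""
    flagged = set()
    for row in rows:
        for column, value in row.items():
            if any(pattern.search(str(value)) for pattern in PII_PATTERNS.values()):
                flagged.add(column)
    return flagged

def strip_sensitive_columns(rows):
    """Drop flagged columns so the remaining data can feed an ML pipeline."""
    flagged = flag_sensitive_columns(rows)
    return [{c: v for c, v in row.items() if c not in flagged} for row in rows]

rows = [
    {"customer_email": "a@example.com", "ssn": "123-45-6789", "purchases": 7},
    {"customer_email": "b@example.com", "ssn": "987-65-4321", "purchases": 2},
]
print(strip_sensitive_columns(rows))  # [{'purchases': 7}, {'purchases': 2}]
```

The design choice is the point: screening happens upstream of model building, so the ML system never ingests the flagged identifiers in the first place.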
Neural networks, the next step in AI development beyond the ML techniques that dominate today, might be able to escape the concern that sanitized, or “dummy,” data yields less useful models. If development is properly focused, some combination of AI-sanitized ML development and emerging AI development techniques could make sanitized data useful for AI, mitigating privacy concerns.
AI developers will necessarily need to find ways to work within evolving data privacy frameworks that increasingly give individuals control over their personal data, but this should not restrict innovation. The European Data Protection Supervisor, hinting at some flexibility in the GDPR, has stated that developers “should be able to take calculated risks with new products, in the light of honest assessments of the likely impact on people, and an analysis of possible unexpected consequences,” noting that “accountability is the biggest challenge when we think of AI.” AI development will not be stopped, but its continued improvement may supply its own fix. Neither solution presented here is complete on its own. Any ultimate fix will likely be a blend of design techniques, regulatory considerations, and advances in technology to protect privacy while retaining effective AI end products.