March 12, 2020

Litigating Data-Hungry Machine Learning

Machine learning in medicine is pushing the boundaries of current privacy and consumer protection laws and regulations.

By Benjamin Yeager and Gary Marchant

In June of 2019, Matt Dinerstein brought suit against Google, the University of Chicago Medical Center, and the University of Chicago, alleging that they violated his privacy by sharing (and receiving) his protected medical records and those of hundreds of thousands of other patients. See Dinerstein v. Google, LLC, Case No. 1:19-cv-04311 (N.D. Ill. June 26, 2019).

The University of Chicago Medical Center had provided Google with “deidentified” medical records from every patient treated by the university medical system from 2009 to 2016. The data sharing was part of a collaboration to develop machine-learning artificial intelligence for detecting and diagnosing health conditions. The core issue in the case is whether the tremendous data-mining capabilities of a large data company like Google render moot the exception under health privacy laws for deidentified data.

HIPAA Concerns

According to Dinerstein, many of the health records provided to Google contained date stamps, creating a prima facie violation of the Health Insurance Portability and Accountability Act (HIPAA). Because HIPAA does not create a private right of action, Dinerstein alleged state law claims, including breach of express and implied contract, tortious interference with contract, consumer fraud and deceptive business practices, intrusion upon seclusion, and unjust enrichment.

In response, the university and Google filed motions to dismiss Dinerstein’s claims, arguing that the arrangement between them is proper under the research exception to the HIPAA Privacy Rule, that Dinerstein lacks standing, and that the suit should be dismissed because Dinerstein’s attorney has a conflict of interest.

Data Privacy Versus Machine Learning

Whether or not the suit proceeds past the motion-to-dismiss stage, Dinerstein’s complaint highlights an important clash between the public’s growing desire for data privacy and the substantial data needs of medical machine-learning development.

As Dinerstein points out in his complaint, machine learning complicates privacy concerns because it enables efficient reidentification of deidentified patient data. A number of studies have shown that machine learning is highly effective at reidentifying individuals using auxiliary data, such as geolocation, physical activity records, and voter rolls. See, e.g., Liangyuan Na et al., Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets from Which Protected Health Information Has Been Removed with Use of Machine Learning, JAMA Network Open 2 (Dec. 21, 2018); Nicholas D. Lane et al., On the Feasibility of User De-Anonymization from Shared Mobile Sensor Data, Ass’n for Computing Machinery (Nov. 6, 2012).

Even data considered deidentified under HIPAA can be reidentified accurately. Under the HIPAA Privacy Rule, protected health information is considered deidentified if 18 specific identifiers are removed from the data and the covered entity has no actual knowledge that the information could be used to identify the individual. This provision of the Privacy Rule creates a safe harbor for health-care providers. See 45 C.F.R. §§ 164.501, 164.508, 164.512(i), 164.514(e), 164.528, 164.532. However, with the power of machine learning, it now may be possible to reidentify patient data that previously would have been classified as deidentified. Dinerstein alleges that Google can combine artificial intelligence with the big data it collects through apps such as Google Maps and Waze and devices such as Android phones and Fitbit products (Google recently acquired Fitbit) to generate geolocation data that can be used to associate the date stamps in patient medical records with specific patient identities.
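The mechanics of such a linkage attack can be sketched in a few lines of code. The toy example below is purely illustrative and not drawn from the case record: every name, date, and location is invented, the `reidentify` function is a hypothetical helper, and real-world reidentification operates at far larger scale with probabilistic rather than exact matching. It shows only the core idea: a date stamp surviving in a “deidentified” record, joined against an auxiliary log of who was where on which day, can narrow a record to a single individual.

```python
# Toy linkage attack: invented data for illustration only.
from datetime import date

# "Deidentified" hospital records: HIPAA's 18 identifiers are removed,
# but service dates remain in the data.
deidentified_records = [
    {"record_id": "r1", "visit_date": date(2015, 3, 14), "diagnosis": "omitted"},
    {"record_id": "r2", "visit_date": date(2015, 6, 2), "diagnosis": "omitted"},
]

# Auxiliary geolocation data (e.g., from a mapping app): which known
# individuals were at which places on which dates.
geolocation_log = [
    {"person": "Alice", "place": "hospital", "date": date(2015, 3, 14)},
    {"person": "Bob", "place": "gym", "date": date(2015, 3, 14)},
    {"person": "Bob", "place": "hospital", "date": date(2015, 6, 2)},
]

def reidentify(records, log):
    """Match each record's visit date against hospital check-ins in the log."""
    matches = {}
    for rec in records:
        candidates = {
            entry["person"]
            for entry in log
            if entry["place"] == "hospital" and entry["date"] == rec["visit_date"]
        }
        if len(candidates) == 1:  # a unique match reidentifies the record
            matches[rec["record_id"]] = candidates.pop()
    return matches

print(reidentify(deidentified_records, geolocation_log))
```

Here each record matches exactly one person who was at the hospital on the recorded date, so both records are reidentified despite the removal of all direct identifiers.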

As knowledge of the possibility of reidentification of patient information spreads, the HIPAA safe harbor provision may offer little protection to health-care providers, who may find themselves in privacy disputes with the Department of Health and Human Services (HHS) Office for Civil Rights.

NCVHS Recommendations

The HHS is aware of the potential HIPAA vulnerabilities raised by big data and last year tasked the National Committee on Vital and Health Statistics (NCVHS) with “identifying ‘privacy, security and access measures to protect individually identifiable health information in an environment of electronic networking and multiple uses of data.’” See Letter from William W. Stead, Chair, Nat’l Comm. on Vital & Health Statistics, to Alex M. Azar II, Sec’y, Dep’t of Health & Human Servs. (June 17, 2019).

The NCVHS has made six recommendations that, if adopted by the HHS, could foster certain privacy claims. The NCVHS recommended federal research on the risks of reidentification of patient information for different methods of deidentification of that information to “inform how HHS can meaningfully update HIPAA de-identification standards going forward.” The NCVHS also recommended establishing “federal health information security and privacy standards for medical device and mobile application manufacturers.” Like Google, these manufacturers are not directly governed by HIPAA. Additionally, the NCVHS recommended an evaluation of “how consumers might exercise their rights of action to seek redress” for violations.

If the HHS adopts these recommendations, anyone working with health information, whether a HIPAA-covered entity or not, could face increased responsibility and increased claims, which could lead to increased litigation.

The NCVHS also emphasized the need to balance consumer interests against the need to support health-care innovation. Id. This comes amid a strong public and political push to improve health care in America: to decrease costs, improve outcomes, increase access to care, and reduce physician burnout. Machine learning in the medical field could improve health care in each of these areas. For example, machine learning tools have succeeded in identifying patients at risk for sepsis, diagnosing diabetic retinopathy, and predicting in-hospital patient mortality. See Ryan J. Delahanty et al., Development and Evaluation of a Machine Learning Model for the Early Identification of Patients at Risk for Sepsis, 73 Annals Emergency Med. 334 (2019); R. Andrew Taylor et al., Prediction of In-Hospital Mortality in Emergency Department Patients with Sepsis: A Local Big-Data-Driven, Machine Learning Approach, 23 Acad. Emergency Med. 269 (2016).


Tasks performed by machines can free physicians to spend more time with patients, which could help drive down health-care costs, improve the quality of care, and increase access to it.

However, developing any algorithm requires massive amounts of data. For medical technologies, that generally means patient health information, which is widely viewed as private, protected personal data. Not surprisingly, given the strong demand for health data, some organizations have cut corners in handling it, breaching individual privacy rights and opening the door to litigation.

Machine learning in medicine is pushing the boundaries of current privacy and consumer protection laws and regulations. As public awareness of the potential impact of machine learning on personal privacy increases, disputes and potential litigation over these issues will increase, too. Because of the need to balance the desire to protect personal privacy with the benefits from the development of lifesaving technologies, it is unclear what the outcomes of these disputes will or should be.

Benjamin Yeager is a JD candidate at the Sandra Day O’Connor College of Law at Arizona State University. Gary Marchant is a Regents Professor and faculty director of the Center for Law, Science and Innovation at Arizona State University.

Copyright © 2020, American Bar Association. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or downloaded or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association, the Section of Litigation, this committee, or the employer(s) of the author(s).