Data Privacy Versus Machine Learning
Whether or not the suit proceeds past the motion-to-dismiss stage, Dinerstein’s complaint highlights an important clash between the public’s growing desire for data privacy and the substantial data needs of medical machine learning development.
As Dinerstein points out in his complaint, machine learning complicates privacy concerns because it enables efficient reidentification of deidentified patient data. A number of studies have shown that machine learning is highly effective at reidentifying individuals using auxiliary data, such as geolocation, physical activity records, and voter rolls. See, e.g., Liangyuan Na et al., Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets from Which Protected Health Information Has Been Removed with Use of Machine Learning, JAMA Network Open 2 (Dec. 21, 2018); Nicholas D. Lane et al., On the Feasibility of User De-Anonymization from Shared Mobile Sensor Data, Ass’n for Computing Machinery (Nov. 6, 2012).
Even data considered deidentified under HIPAA can be reidentified accurately. Under the HIPAA Privacy Rule, protected health information is considered deidentified if 18 specific identifiers are removed from the data and the covered entity has no actual knowledge that the information could be used to identify the individual. This provision of the Privacy Rule creates a safe harbor for health-care providers. See 45 C.F.R. §§ 164.501, 164.508, 164.512(i), 164.514(e), 164.528, 164.532. However, with the power of machine learning, it may now be possible to reidentify patient data that previously would have been classified as deidentified. Dinerstein alleges that Google can use artificial intelligence and the big data it collects through apps such as Google Maps and Waze, as well as through devices such as Android phones and Fitbit products (Fitbit was recently purchased by Google), to obtain geolocation data that can be used to associate the date stamps in patient medical records with specific patient identities.
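The kind of linkage described above can be illustrated with a toy example. The following is a minimal sketch, not the method alleged in the complaint or used in any cited study: it assumes hypothetical deidentified records that retain visit date stamps and hypothetical location data showing the dates on which each named device appeared at the same hospital. All names, dates, and the `link` helper are invented for illustration.

```python
# Hypothetical deidentified records: direct identifiers removed, date stamps kept.
records = [
    {"record_id": "r1", "visits": {"2019-03-04", "2019-03-18", "2019-04-02"}},
    {"record_id": "r2", "visits": {"2019-03-04", "2019-05-11"}},
]

# Hypothetical auxiliary data: dates on which each device was located at the hospital.
hospital_pings = {
    "alice": {"2019-03-04", "2019-03-18", "2019-04-02", "2019-06-01"},
    "bob": {"2019-03-04", "2019-05-11"},
    "carol": {"2019-03-04"},
}


def link(records, pings):
    """Return, for each record, the users present at the hospital on every visit date."""
    matches = {}
    for rec in records:
        # A user is a candidate only if all of the record's visit dates
        # appear among that user's hospital location dates.
        candidates = sorted(
            user for user, days in pings.items() if rec["visits"] <= days
        )
        matches[rec["record_id"]] = candidates
    return matches


if __name__ == "__main__":
    for record_id, candidates in link(records, hospital_pings).items():
        print(record_id, candidates)
```

In this toy data each record matches exactly one user, even though no name, address, or other listed identifier remains in the records. Real data sets are larger and noisier, so actual reidentification rates would depend on the data involved.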
As knowledge of the possibility of reidentification of patient information spreads, the HIPAA safe harbor provision may offer little protection to health-care providers, who may find themselves in privacy disputes with the Department of Health and Human Services (HHS) Office for Civil Rights.
The HHS is aware of the potential HIPAA vulnerabilities raised by big data and last year tasked the National Committee on Vital and Health Statistics (NCVHS) with “identifying ‘privacy, security and access measures to protect individually identifiable health information in an environment of electronic networking and multiple uses of data.’” See Letter from William W. Stead, Chair, Nat’l Comm. on Vital & Health Statistics, to Alex M. Azar II, Sec’y, Dep’t of Health & Human Servs. (June 17, 2019).
The NCVHS has made six recommendations that, if adopted by the HHS, could foster certain privacy claims. The NCVHS recommended federal research on the risks of reidentification of patient information for different methods of deidentification of that information to “inform how HHS can meaningfully update HIPAA de-identification standards going forward.” The NCVHS also recommended establishing “federal health information security and privacy standards for medical device and mobile application manufacturers.” Like Google, these manufacturers are not directly governed by HIPAA. Additionally, the NCVHS recommended an evaluation of “how consumers might exercise their rights of action to seek redress” for violations.
If the HHS adopts these recommendations, anyone working with health information, whether a HIPAA-covered entity or not, could face greater responsibility and more claims, which could lead to more litigation.
The NCVHS also emphasized the need to balance consumer interests against support for necessary health-care innovation. Id. This comes during a strong public and political push to improve health care in America, as well as to decrease costs, improve outcomes, increase access to care, and reduce physician burnout. Machine learning in the medical field could improve health care in each of these areas. For example, machine learning tools have succeeded in identifying patients at risk for sepsis, diagnosing diabetic retinopathy, and predicting in-hospital patient mortality. See Ryan J. Delahanty et al., Development and Evaluation of a Machine Learning Model for the Early Identification of Patients at Risk for Sepsis, 73 Annals Emergency Med. 334 (2019); Richard A. Taylor, Prediction of In-Hospital Mortality in Emergency Department Patients with Sepsis: A Local Big-Data-Driven, Machine Learning Approach, 23 Acad. Emergency Med. 269 (2015).
Tasks performed by machines can free physicians to spend more time with patients, which could help drive down health-care costs, improve the quality of care, and increase access to it.
However, developing any machine learning algorithm requires massive amounts of data. For medical technologies, that generally means patient health information, which is widely viewed as private, protected personal data. Not surprisingly, because of the strong desire to use health data, some have cut corners with it, breaching individual privacy rights and opening the door to litigation.
Machine learning in medicine is pushing the boundaries of current privacy and consumer protection laws and regulations. As public awareness of the potential impact of machine learning on personal privacy increases, disputes and potential litigation over these issues will increase, too. Because of the need to balance the desire to protect personal privacy with the benefits from the development of lifesaving technologies, it is unclear what the outcomes of these disputes will or should be.