Inferred Data as New Data
Proponents of treating inferred data as new data argue that if the source data was covered by privacy laws, then the data derived from it ought to be covered by the same regulations, regardless of its IP designation. They argue that the purpose of data protection and privacy laws is to protect consumers from the misuse or publication of their personal information, and that this purpose applies as much to personal information produced by an analytics process as to personal information obtained directly. In practice, this means that inferred data may need to be evaluated based on the context of its use and how it is generated to determine whether that use triggers the protections that data protection and privacy laws offer.
How Inferred Data Is Used Matters
Inferred data could be used to optimize internal business processes, in which case it may not have any relevance to consumers. But when inferred data is used to profile a person, it may have serious implications for that person. Because inferred data often represents predictions rather than facts, its potential for harm may be greater than that of data provided directly by the person. When individuals are profiled by identifying or predicting sensitive information about them, privacy regulations intended to protect consumers would seem to apply to the inferred data. Similarly, when the inferred predictions concern creditworthiness or the likelihood of flight before trial, other consumer protection regulations would seem to apply as well. It is important to note that there are laws that allow input data, such as reported credit data, to be corrected, but a model could still produce a biased or unfair prediction even from corrected inputs.
Furthermore, predictions made by machine learning models can be difficult to assess for accuracy because the models are trained on, and heavily dependent on, the input dataset used to generate them. These models act like black boxes: it is nearly impossible to understand how the individual variables and weighting factors were derived. As a result, interpreting or correcting a prediction that is false or biased can be very difficult. Worse yet, these mistakes are difficult to litigate because the model cannot be cross-examined in court. Many privacy regulations, including the General Data Protection Regulation (GDPR) and the CCPA (California Consumer Privacy Act), provide for a consumer right to correct data. If inferred data is subject to privacy regulations, this right of correction could be very difficult to apply.
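To make the distinction concrete, here is a minimal sketch in Python, using scikit-learn and entirely invented data and field meanings rather than any real scoring system. It shows how an inferred datum is the output of a trained model rather than a value the consumer supplied, and why “correcting” an input does not straightforwardly correct the inference.

```python
# Hypothetical sketch: inferred data as the *output* of a trained model.
# All field meanings and data are invented for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(seed=0)

# "Provided" data: features a consumer supplied or that were observed directly
# (imagine income, reported late payments, account age).
X_train = rng.normal(size=(500, 3))
# Historical labels used to train the model (imagine "defaulted within 12 months").
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(scale=0.5, size=500)) > 0

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train.astype(int))

# The inferred datum: a prediction about one consumer, not a fact they provided.
consumer = np.array([[0.2, 1.5, -0.3]])
print("inferred risk score:", model.predict_proba(consumer)[0, 1])

# Even if one input field is "corrected" (say, a disputed late payment removed),
# the inference is recomputed from hundreds of learned decision trees -- there is
# no single stored value a consumer could inspect or amend directly.
corrected = np.array([[0.2, 0.0, -0.3]])
print("score after correcting an input:", model.predict_proba(corrected)[0, 1])
```

The point of the sketch is not the particular algorithm but the structure: the prediction exists only as the output of the trained model, so a right of correction aimed at stored records does not map cleanly onto it.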
In California, the attorney general recently issued an opinion on the interpretation of inferred data under the CCPA. Specifically, the attorney general was asked whether a consumer’s right of access under the CCPA encompasses inferred data that a business has created about them. As is so often the case with a legal question, the answer is “it depends.” The attorney general determined that inferred data falls within the definition of “personal information” under the CCPA only if it meets two requirements. First, the inferred data must have been generated from specific categories of data identified in the statute, regardless of whether that information was private or public and regardless of how it was obtained. Second, the inferred data must have been generated for the purpose of creating a profile about a consumer that reflects their “preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes.”
How Inferred Data Is Generated Matters
Inferred data may be subject to privacy rules not only based on how it is used, but also based on how it is generated. For instance, the Federal Trade Commission (FTC) has signaled through recent decisions that inferred data is sufficiently tied to the processing of input source data, even when that data is used only for training, that if the collection or processing is tainted by fraud, the machine learning algorithms and models built from that tainted data are also tainted, along with any inferred data they produce. In one recent decision, Everalbum was accused of collecting input data without proper consent and using it to train a facial recognition algorithm. As part of the decision, the FTC required Everalbum to delete the machine learning model trained on those faces, the algorithm used to create the model, and the output data created by processing new facial images through that tainted model. Thus, inferred data generated through fraud or misrepresentation was treated as the result of misuse and protected by consumer protection laws.
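One way to read the FTC’s reasoning is as a data lineage problem: anything derived from tainted inputs inherits the taint. The sketch below is a hypothetical illustration, with invented names and a toy in-memory registry rather than a real metadata system, of how a deletion obligation could propagate from improperly collected inputs to the model and then to its outputs.

```python
# Hypothetical sketch of data lineage, assuming a simple in-memory registry;
# a real system would use a metadata or provenance store. Names are invented.
from dataclasses import dataclass, field

@dataclass
class Artifact:
    name: str
    derived_from: list = field(default_factory=list)

def is_tainted(artifact: Artifact, tainted_names: set) -> bool:
    """An artifact is tainted if it, or anything it derives from, is tainted."""
    if artifact.name in tainted_names:
        return True
    return any(is_tainted(parent, tainted_names) for parent in artifact.derived_from)

# Lineage: improperly collected photos -> trained model -> inferred outputs.
photos = Artifact("photos_collected_without_consent")
model = Artifact("face_recognition_model", derived_from=[photos])
outputs = Artifact("inferred_face_matches", derived_from=[model])

tainted_sources = {"photos_collected_without_consent"}
for artifact in (photos, model, outputs):
    if is_tainted(artifact, tainted_sources):
        print(f"delete: {artifact.name}")  # the deletion obligation follows the lineage
```

Read this way, the remedy is less about any single dataset and more about the chain of derivation: the model and its outputs are deleted because their lineage traces back to the improperly collected inputs.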
In summary, inferred data is widely agreed to be data that is the output of processing, rather than data provided directly or indirectly by a person. That may be where the agreement ends. The extent to which inferred data is subject to privacy regulations, whether it can be treated as intellectual property, and how automated decision-making based on it should be handled all remain undecided. These issues will, in all likelihood, be the subject of much discussion as the amount and uses of inferred data continue to grow. For companies whose business models depend on their ability to generate and use inferred data, the outcome of these discussions could be critical to their future.