January 17, 2020 Feature

Beyond Data Privacy: Data “Ownership” and Regulation of Data-Driven Business

By Peter Leonard

Global debate continues about the scope and coverage of data privacy laws and whether there is a need for further limitations on secondary uses of data collected through use of digital technologies. That debate has overshadowed discussion of a more fundamental issue: how laws should address ownership and rights of use of data, including valuable data about individuals that does not personally identify those individuals.

Competition authorities and consumer protection regulators in many jurisdictions have principally focused on the range and depth of data collected by global digital platforms about activities, interests, and preferences of users of those platforms. They have criticized global platform providers for lack of transparency to users as to secondary use of data gathered about user activities. They have suggested that the lack of transparency masks unfairness in value exchange, because consumer advocates and users are not able to assess the monetary value of derived data traded for provision of free services.

At the same time as regulators focus on these concerns, there is a growing deficit of trust among citizens and consumers in the handling of digital data by many businesses. Issues and concerns that are raised about diverse digital technology and data-enabled applications include: algorithmically enabled discrimination between individuals or groups of individuals in the terms on which products and services are offered and whether they are offered at all; unanticipated, and sometimes opaque, linkage of multiple data sets relating to individuals and other secondary uses of data; uses of biometrics (including facial recognition technologies), health-related data, and geolocation data; pervasive surveillance; and increasingly granular online targeting. Many of these concerns arise from, or are exacerbated by, our attachment to our mobile devices.

The Latest Bodily Appendage

Smartphones have become an extension of our bodies. Smartphone data is the richest enduring record of how, why, when, and where we act, go, think, see, and feel. Analytics of that data reveals our likes and dislikes, our wellness and our illnesses, what we think about buying and then don’t, what we buy and why we bought it, and how we interact and with whom. Mobile phone data has ceased to be a secret shared only with our mobile service provider and closely regulated by communications privacy laws. Over the decade that our smartphone embedded itself in each of our bodies, it became the clearest window to our souls and our demons. Meanwhile, the device evolved into an applications delivery platform that captures diverse and valuable data and distributes that data across multiple players in ever more complex supply-side application data ecosystems. The digital exhaust of our everyday lives has become a valuable commodity. If our physical trash were that valuable, we might not have a global trash problem. Each smartphone is the authoritative record of some individual’s life, including geolocation and activity state (and therefore sex life), 24x7x365 and subject only to temporary separation for recharging.

As data collection and retention became cheaper over the decade and we moved to cloud services, we fed this new appendage more data, loaded and linked more applications, and placed it next to our pillows to monitor the quality of our sleep. Each year we further opened the digital window to our inner selves. With all this linking of diverse applications and cloud services, it became increasingly unclear who was collecting what data relating to us, and for what purpose. Sometimes it is not even clear what the primary purpose for collection and use of data is, let alone what secondary uses and disclosures of data may be occurring. Many mobile applications collect geolocation data by default, without any disclosed reason to do so. Today our mobile service provider usually knows less about what we are doing, and why we are doing it, than a myriad of mobile applications providers, the mobile operating system provider, and their contractors, many of whom we cannot even name.

Should we worry? After all, this is “our data.” Can’t new data privacy laws bring “our data” back under our control? Unfortunately, it isn’t that easy. In many legal jurisdictions, valuable data about individuals is owned by no one, and legal rights in data privacy are frequently misunderstood.

Diversity in Data Privacy Laws

Framing of data privacy laws around the world increasingly follows a common paradigm. Jurisdictions differ markedly as to whether and how business entities are regulated, as to powers and discretions of regulators, as to penalties and other sanctions for misuses and abuses, and as to formalities for cross-border transfers of regulated data. However, most national data privacy laws adopt a now relatively standard formulation of what information is regulated as personal information (also called personal data or personally identifying information) about natural persons. Broadly, that formulation is to protect both published and nonpublic information or opinion captured in any nontransient form (whether or not a formal record) in relation to a living person who is either identified or might reasonably be identifiable by any entity that has access to that information (taking into account other information reasonably available to that entity, such as other data points and data sources that might be used to infer or confirm the identity of that purportedly nonidentifiable person).

Once data is determined to be personal information, jurisdictions again differ significantly as to exceptions and requirements regarding notice and consent. Generally, across data privacy regulating jurisdictions, a level of transparency about collection and handling of personal information is required, through publication of a privacy policy. A higher level of transparency (such as express and informed consent) is required for collection and handling of more sensitive classes of information, such as information about health status, religion, race, political affiliation, and certain biometrics. Beyond this point, there is significant divergence between data privacy laws as to what manifests consent, ranging from a broad concept of inferred consent through prominent display of notice and subsequent user action, to a click-through “I agree” without forced view of underlying terms, to the European Union’s (EU’s) concept of unambiguous and affirmative express consent after a high level of explanation to a data user and coupled with regular reminders to assure currency of consent.

Biometric information, and biometric identifiers, are now commonly captured on our smartphones. Given the increasingly intimate association of smartphones with our bodies and our inner souls, it might reasonably be argued that all data about us that is captured through our use of smartphones should be treated as automatically within the sensitive class of information for which many privacy regulating jurisdictions require express, affirmative (opt-in) user consent. This in turn leads to another question: Have we correctly framed what we regard as personal information about individuals that is within the scope of data privacy laws?

Profiling Using Nonidentifying Information

There is no generally accepted regulatory paradigm as to whether and how to regulate collection and uses of non-personally identifying information about individuals. However, this information is commercially useful and increasingly valuable. Ad tech and programmatic marketing ecosystems enable association of a device identifier or tracking code with search and browsing behavior on a particular site or across sites, which then allows inferences as to which audience segment the unidentifiable user likely belongs. These inferences, derived and applied in real time, can be used to differentiate between users in whether a product or service is offered to them and as to the price and non-price terms at which a product or service is offered. As algorithms are refined through experience, with more data and through machine learning, these inferences become more likely to be correct, and also less explainable.

Mobile device identifiers, SIM identifiers, and tracking codes (such as cookies or pixels) will be personal information about an individual if and when collected by an entity that is able to associate the device with an identifiable user. However, in many cases an entity collecting a device identifier or tracking code and using it to segment a user of that device for targeted marketing, or for other differential treatment, will not be able to identify the user of that device. In such cases, that entity is not collecting or using personal information about that user, as the entity has no reasonable means to identify the user of the device. However, segmentation of nonidentifiable individuals may lead to significant, systematic, and adverse effects on how some segments of individuals are treated relative to other segments. If I am a provider of airline reservations and I see a particular device repeatedly coming back for quotes, I may elect to increase the quoted price, inferring that no other provider is matching my quote. If I am an insurer and infer that a device is being used in a community where risks are typically higher, I may elect to quote a higher price than that which I offer to users known or inferred to be in lower risk locations. There are increasing calls by consumer advocates for expansions of data privacy laws to address opaqueness (lack of explainability and therefore lack of accountability) and potentially unfair or illegal discrimination between segments of individuals, whether or not the individuals within those segments are identifiable.

One regulatory response has been to expand the range of circumstances in which consent of affected individuals is required. Current EU General Data Protection Regulation (GDPR) restrictions on profiling require unambiguous express opt-in consent wherever automated profiling of individuals uses personal data of a data subject and will have significant legal effects upon how an individual is treated. Debate continues as to whether, in the circumstances described in the last paragraph, there is a relevant use of personal data that activates the restriction on profiling. In any event, many consumer advocates now observe that consumers don’t want more consent decisions forced on them. These advocates contend that what consumers want is for data collectors and custodians to act transparently and responsibly to nurture consumer trust. In our current world of “click through by default,” consumer advocates argue that an organization’s offering “choice” is really the organization “passing the buck” back to the consumer to make a decision that many consumers do not feel competent to make, and which organizations know that many consumers will not bother to make, instead just clicking through. As a result, discussion of regulation of online targeting and other collections of data that may be used for segmentation of audiences is moving from being principally a debate about consent to tracking code toward being a much more nuanced discussion about consumer expectations as to how information about their activities is used, and by whom, and as to “fairness” in value exchange.

Data Value and False Correlations

Authentic engagement by data platforms and other data custodians with consumers as to data value and data fairness is also hampered by the fact that there is not always a close correlation between size of data holdings and potential to derive value from that data or, conversely, to cause harm to data subjects.

Common errors in correlation include overestimation of the value of raw consumer data, as distinct from consumer data as improved through data cleansing and standardization. Transformation of raw data enables linking of diverse data sets and derivation of correlations and then actionable insights. Often data capabilities of global consumer data platforms are automatically assumed to be capabilities of other large data-driven businesses, or capabilities of all entities participating in shared data ecosystems, when this is not the case.

Consumer data is tricky to value. Large volumes of data will often be less valuable than small volumes of the right mix of transformed and correlated data sets. Data value is created by the ability of an entity to transform raw data to make it useful as an input for data analytics; to then durably capture that value (not by ownership, but by keeping it secret, that is, by denying others access to that data); and to do these things in a way that does not excite regulatory intervention that may strip that value. Exclusivity of an entity’s practical control of data can be qualified through regulatory action in a variety of ways. Possible value-depleting regulatory interventions include enforcement of data protection, consumer protection, or competition (antitrust) laws; addressing information asymmetries through new requirements as to transparency of data uses (which may lead to consumer pushback as to what is being used and how); creating new “consumer rights” over data; and facilitating portability of transactional data at the request of data subjects.

Sometimes data derives value through enabling testing and development of code for application on other data. So-called artificial intelligence (AI) didn’t beat grandmasters by being more intelligent. AI wore down the problem by incessantly playing games 24x7x365, generating “training data” to inform machine learning and thereby acquiring knowledge through experience beyond the reach of any human lifespan. And often a large volume of data of uneven quality can yield algorithms of substantial value, which may then make poor data or narrow data sets more valuable. In short, data (through the intermediary of code) can be transformative in value of other data.

Creating Value in Data

Valuation of allegedly “data-rich” businesses is often confused through failure to distinguish between the quantity and range of data sets that a business holds, and the capabilities (or lack thereof) of a business to transform those data sets into actionable insights. Transformational methods, code, and algorithms are often fungible across business sectors, with the result that data-rich businesses concentrated within particular industry sectors may not achieve economies of scope of data analysis that are available to cross-sector service providers. Scarcity of human capital, and in particular experienced data scientists, means that much data that is captured today is not transformed and never achieves its potential value. Human capital remains the key investment in cleansing, transforming, and linking data; in discovering useful correlations; and in creating and applying algorithms to data sets to derive actionable insights. Technology enables, but humans (still) create. What is more, humans are ambitious, fickle, and moveable. A quality people culture will often be the key business differentiator of good data-driven organizations.

To put it another way: the analogy commonly drawn between “control of data” and “ownership of oil” undervalues the value-adding contribution of the processes required to “refine” data effectively. Good insights as outputs are only possible through, first, a good deal of hard work in creation of quality data inputs and, second, development, refining, testing, and deployment of robust algorithms that are the engine of transformation of data into insights. Creation of quality data inputs and robust algorithms is difficult and can be slow. This is one of the principal reasons why many of the more ambitious predictions as to rollout of applications of AI have proven incorrect.

Valuable business insights are often deployed in disrupted product or service sectors that are characterized by increasingly short product life cycles, where returns on investment are highly uncertain. Markets for outputs of data are volatile and unpredictable. Refined (real) oil can be stockpiled, whereas much data is time sensitive and rapidly loses value. Oil is fungible across many industrial, transport, and heating applications, and the movement from fossil fuels to alternative energy is still agonizingly slow. Oil markets may appear to be volatile, but the markets for outputs of data analysis are often substantially more unpredictable.

The Best Asset You Never Owned

You can own oil, but (generally) you can’t own data. Most data sets are not copyright material. The closest simulation of “real” legal ownership of data that is available to a data controller is usually to ensure that “its data” about consumers is defensibly legally protectable as trade secrets or confidential information.

However, usually data sets must be shared to some degree to yield value. Data sharing within multiparty data ecosystems is required to deliver almost all mobile services. Many mobile services require a complex supply-side data-sharing ecosystem of five or more data-holding entities. A mobile user (retail) application service provider is commonly supported by a third-party data analytics services provider, a cloud platform provider, a mobile platform services provider, a geolocation services provider, a data security services provider, and a payments services provider, all sharing user data. That sharing takes place in a world that is today without settled industry standards as to data privacy, data minimization, and management of supply-side data ecosystems.

Accordingly, significant sharing of sensitive and commercially valuable data is required to deliver many mobile services. At the same time, a service provider tries to capture data value by imposing safeguards and controls to ensure and demonstrate that “its data” remains a trade secret. This is a difficult balancing act.

Should Uses and Applications of Data Be Regulated?

Before we can determine whether particular uses of data need to be regulated, we need to apply a nuanced understanding of data and good data governance.

Data can be infinitely reproduced and shared at effectively zero cost. Data does not derive its value through scarcity. Value in data is usually created through investment in “discoverability”: in collecting and transforming raw data to enhance capability to link data to other data and then explore the linked data sets for correlations and insights. Often in data analytics projects, about 70–80 percent of the cost is cleansing and transforming raw data to make it discoverable; the high-end work of then analyzing the transformed data is the smaller part of a program budget.

Discoverability may be created within a privacy-protected data analytics environment. In many cases, substantial data value can be created and commercialized, without particular individuals becoming identifiable. Through pseudonymization of identifiers and deployment of appropriate controls and safeguards protecting data analytics environments, many uses of user data need not be privacy invasive. Of course, it is easier to link disparate data sets using personal identifiers than it is to deploy a properly isolated and safeguarded data analytics environment that uses only pseudonymized data linkage transactor keys. It is also easier to release outputs and insights without taking reliable steps to ensure that the outputs cannot be used to reidentify affected individuals. Good privacy management is exacting. The frameworks, tools, and methodologies for good data governance are immature and therefore not well understood. And good data handling on its own does not create good outputs. Executives of organizations often do not know how to evaluate the quality of their data scientist units and the reliability of data science outputs and insights. The term “data science” carries, as the term “management science” once did, an enticing ring of exactitude. Algorithms, however skillfully derived and applied, may be based on poor data or simply misapplied when used in particular contexts. Often poor data practices are implemented inadvertently, or as a result of cutting corners, rather than through bad intent.
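
By way of illustration only, the following is a minimal sketch, in Python, of how pseudonymized linkage keys might allow two data sets to be joined without either analyst-facing data set carrying the raw identifier. It assumes a keyed hash (HMAC-SHA256) with a secret held only inside the safeguarded analytics environment. The secret value, field names, and sample records are all hypothetical, and this is one possible technique rather than a description of any particular organization’s practice.

```python
import hashlib
import hmac

# Hypothetical secret, assumed to be held only inside the isolated
# analytics environment and never released alongside outputs.
LINKAGE_SECRET = b"example-secret-for-illustration-only"


def linkage_key(identifier: str, secret: bytes = LINKAGE_SECRET) -> str:
    """Derive a pseudonymous linkage key from a raw identifier (HMAC-SHA256)."""
    # Normalize so that trivially different spellings map to the same key.
    normalized = identifier.strip().lower()
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(records, id_field):
    """Return copies of the records with the raw identifier replaced by a key."""
    result = []
    for record in records:
        copy = {k: v for k, v in record.items() if k != id_field}
        copy["link_key"] = linkage_key(record[id_field])
        result.append(copy)
    return result


def link(left, right):
    """Join two pseudonymized data sets on their shared linkage key."""
    index = {row["link_key"]: row for row in right}
    return [
        {**row, **index[row["link_key"]]}
        for row in left
        if row["link_key"] in index
    ]


# Hypothetical sample data held by two different parties.
purchases = [
    {"email": "Alice@example.com", "basket": "hiking boots"},
    {"email": "bob@example.com", "basket": "tent"},
]
locations = [{"email": "alice@example.com ", "region": "NSW"}]

linked = link(pseudonymize(purchases, "email"), pseudonymize(locations, "email"))
```

In this sketch the linked record carries the basket and region but no email address. Replacing identifiers in this way does not by itself prevent reidentification: the remaining attributes may still single out an individual, which is why the controls and safeguards around the analytics environment and its outputs matter as much as the keys.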

Most importantly, we need to take into account that most user data is generated in circumstances where the relevant humans no longer understand or control the “data exhaust” associated with their activities or transactions. Where users are unknowing creators of data exhaust, they are particularly vulnerable to data uses that may be adverse to their interests. A simple example: I don’t choose to be observed by my very smart rental car, but I am. When I drive it out of the parking slot, I don’t reach for the vehicle manual to school up on the car’s data analytics capabilities. Even when I am presented with terms explaining particular data uses, life is too short for me to read and evaluate the terms. The result: I do not knowingly and reflectively give consent to particular uses.

Protecting the Rights of Participants in Multiparty Data Ecosystems

Should recalcitrant consumers (such as me), who don’t read all terms proffered, be punished for our unwillingness to engage with the torrent of privacy disclosures by organizations with which we deal? If I don’t care at all about my data privacy, I might still want to join ranks with many millennials and demand to know who is doing what, and deriving how much value, using data about me. Many millennials do not care about privacy or transparency by right, but sense that value is being derived from data about them, that free services are great, but that no-cost may be less than fair value, and that they are not given enough information about what is going on to force a meaningful negotiation over fair allocation of data value.

Many businesses are reluctant to initiate a discussion as to what is fair to consumers, because they can’t control that discussion, or they simply don’t want to give away value. Some early mover data platform businesses captured the data high ground and since then have engaged in tactical retreats, giving away certain data value if and when required to mitigate particular crises in digital trust of consumers. Many other data-driven organizations, such as some insurers and banks, are more willing to sacrifice short-term data value in order to preserve longer-term certainty and therefore sustainability for data value-adding investments. However, they are concerned that initiating a discussion with consumers as to fair data exchange can lead to unpredictable and uncontrollable outcomes: explanations of many data applications and data value chains are devilishly tricky and can sound self-serving, or just plain creepy. Try explaining to skeptical citizens and consumer advocates how real-time targeted advertising does not require any disclosure of a mobile user’s identity to the advertiser or its media buyer, or explaining how audience segmentation value is allocated at points in the complex and multiparty advertising and publisher supply chain. Emerging expectations of consumers as to transparency and accountability in handling of data about them may lead to an imperative for a provider to restrict data flows within a multi-entity data ecosystem, while at the same time regulators seek to force opening up of supply-side data ecosystems to new data intermediaries.

Leaving aside the desire for greater transparency as to data uses to facilitate consideration of fair value of data, why should a consumer need to engage with a data collector as to whether a particular collection of data is fair, proportionate, and reasonable? Regulators don’t require consumers to take responsibility for determining whether a consumer product is fit for purpose and safe when used for the product’s stated purpose, and unsuitable or unsafe when used for other purposes. Why should data-driven services be any different? Mobile users may not want transparency and responsibility forced upon them, so that they then must make a sensible decision (or just click through). Instead, mobile users may want accountability of the data controller, to ensure that the data controller responsibly and reliably does what is fair. However, fairness is a notoriously normative concept, which is why competition law seeks exactitude of economic theory in evaluating effects on consumer welfare. Beneficence for the majority of consumers also results in less than “fair” treatment of a few—at least as those few perceive their treatment relative to treatment of the majority. It all turns on the particular context.

The “Can” and “Should” in Data Regulation

Critics of data-driven businesses often say that many data businesses do not even try to balance their own and societal interests. They say businesses should stop to ask: Just because we can (use data in a particular way), should we? There is a risk that regulators will succumb to a similar temptation in regulating use of consumer data by businesses. Big data holdings of global data corporations look like clear candidates for competition regulation. In many jurisdictions, data-driven businesses cannot assert legal protections available against takings of property, because the bundle of rights of a holder of trade secrets is not sufficient to be a proprietary interest. In such jurisdictions, rights of protection of trade secrets more readily yield to regulatory interventions than deprivations of interests in property.

Of course, the market capitalization of both “unicorns” and “data giants” demonstrates that public share markets and venture capitalists see value outside traditional classes of property. A single trade secret “asset” can be worth millions, or billions, of dollars. Google emerged out of seemingly nowhere using its trade secret algorithms to dominate the search engine world. Google’s success today depends on protecting the trade secret assets collectively managed under the Google brand. Many trade secrets derive their value through closely guarded central control: the formula for Coca-Cola, the Google search ranking algorithms, and so on. These trade secret “assets” may not appear in the balance sheet as assets, but derive value through being closely guarded; it is this effort to maintain trade secrecy that creates scarcity and value.

Regulators have a broad range of available regulatory tools that may be used to affect activities of data-driven businesses. Available tools include enforcement of data protection, consumer protection, and competition (antitrust) laws, and facilitation of enforcement by individuals of rights of access to, or portability of, transactional data (whether or not personal information about them) as held by data custodians. These tools should be selectively and surgically used to address particular contexts of data use by businesses that warrant regulatory intervention. But protection of consumers, of individuals’ rights of privacy, and of fair competition between entities that operate in a shared data ecosystem over a data platform controlled by one of the parties, are tightly intertwined. Rebalancing the rights and responsibilities of participants in this ecosystem—affected individuals, other consumers, platform operators, and entities that willingly or not contribute relevant data through use of the platform—can have profound implications. There is clearly a role, and a need, for good regulation. But context is critical in dynamic markets. Outcomes of regulatory interventions may be unpredictable and unintended. It is hard to be a good regulator.


Peter Leonard is principal of Data Synergies and professor of practice (IT systems and management and business law) at UNSW Business School in Sydney, Australia.