
Public Contract Law Journal

Public Contract Law Journal Vol. 50, No. 3

Data Rights in Artificial Intelligence: Who Will Own Skynet?

Jared Aden Looper


  • Discusses unique aspects of AI technology and the challenges of applying current DFARS rules to properly protect the underlying intellectual property
  • Argues that the current DFARS data rights provisions cannot solve the problem of how to allocate data rights in machine learning AI between the government and contractor
  • Recommends (i) revisions to the DFARS; (ii) use of OT authority for AI development; and (iii) use of already available DFARS flexibility



In the last few years, the technology known as “machine learning” artificial intelligence (AI) has tremendously accelerated the capacity of computers to solve complex problems that were previously solvable only by humans. This technological development is fueling an AI arms race between the global powers. In furtherance of this arms race, the United States will shortly be entering into contracts for the development of machine learning AIs for military applications. Because of their flexibility and power, these AIs will be tremendously valuable as intellectual property (IP). This article argues that the Federal Acquisition Regulation (FAR) provisions on data rights in software, which were drafted before machine learning AI was even a concept, do not clearly allocate IP rights in machine learning AIs between the contractor and the Government. It further explains why other sources of IP law are likewise inadequate to deal with machine learning AIs. Finally, this article provides three policy suggestions to correct this issue.

“Artificial intelligence is the future, not only for Russia, but for all humankind. It comes with colossal opportunities, but also threats that are difficult to predict. Whoever becomes the leader in this sphere will become the ruler of the world.”

I. Introduction

Until 2017, the open-source software program known as Stockfish reigned supreme as the most powerful chess software in the world, having won the Top Chess Engine Championship four times. But in December 2017, artificial intelligence (AI) toppled Stockfish from its long-held throne. The program that beat Stockfish was an AI program called AlphaZero. Whereas most computer chess programs are endowed by consulting chess grandmasters with deep knowledge of chess strategy, a razor-sharp understanding of tactics, and extensive software libraries of openings and endgames, AlphaZero’s creators programmed it solely with the rules of chess. AlphaZero then taught itself chess strategy and tactics. With no assistance from its human programmers, AlphaZero moved from complete ignorance of chess to beating the greatest chess-playing software in the world. AlphaZero accomplished this task in four hours.

Fueling AlphaZero’s remarkable feat was a revolutionary AI technology known by various names: reinforcement learning, deep learning, neural network deep learning, or simply machine learning. Machine learning is an AI programming technique in which the AI is given a task to accomplish (for example, winning at chess), and is then set loose to try to accomplish the task, iteratively reprogramming itself to get better at accomplishing the task.
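The learning loop described above can be pictured with a toy sketch. It is not AlphaZero's method and is far simpler than any real system: it is an epsilon-greedy learner choosing among three options whose payoffs it does not know in advance. The function name `train`, the payoff probabilities, and the parameter values are all hypothetical, chosen only for illustration.

```python
import random


def train(reward_probs, episodes=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which option pays off most often.

    reward_probs: hypothetical chance of success for each option,
    hidden from the learner and used only to simulate outcomes.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    n = len(reward_probs)
    estimates = [0.0] * n              # the learner's current beliefs
    counts = [0] * n                   # how often each option was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Occasionally explore an option at random.
            action = rng.randrange(n)
        else:
            # Otherwise pick the option currently believed to be best.
            action = max(range(n), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Move the belief toward the observed result (running average).
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


if __name__ == "__main__":
    learned = train([0.2, 0.5, 0.8])
    best = max(range(len(learned)), key=lambda a: learned[a])
    print("option the learner prefers:", best)
```

Note that the text of `train` is identical before and after the run; what differs is the list of learned estimates. That gap between the code as delivered and the system after training is one way to picture the distinction between a kernel and a mature AI that the article draws later.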

Unlike legacy AI systems in which human programmers write the software code that instructs AIs how to accomplish their tasks, machine learning allows an AI to reprogram itself to accomplish whatever task it is assigned, making it versatile and extremely powerful. Machine learning AIs have successfully taught themselves how to dominate in chess, win in the intricate strategy game known as Go, play video games, and even conduct complex legal analysis better and faster than corporate lawyers.

Machine learning AI also has military applications. Weapons systems controlled by AIs, for example, will have superhuman reaction times, virtually ensuring that an AI-controlled weapons system will defeat any human-controlled weapons system. Additionally, AIs will precisely pilot fixed-wing airborne weapons systems that are otherwise difficult to control. For example, the Navy’s Northrop Grumman X-47B, an experimental jet-powered crewless combat aerial vehicle, demonstrated precision carrier landings that no human could match. Finally, machine learning AIs will enable superhuman processing precision and speed in the complex task of signals intelligence (SIGINT) analysis.

Given the military advantages of possessing the emerging technology of machine learning AI, the Department of Defense (DoD) has naturally shown keen interest in acquiring it. That said, and unfortunately for the DoD, acquisition law has not kept pace with this technology’s advances, particularly in the realm of intellectual property (IP).

In Section II below, this article argues that the data rights provisions of the Defense Federal Acquisition Regulation Supplement (DFARS) are an inadequate means of allocating IP rights in machine learning AI between the government and contractors. Section III will then present recommendations for solving this problem.

II. Data Rights and Machine-Learning AI: Why the Status Quo Is Insufficient

This section discusses the current state of the relevant data rights provisions of the DFARS. Subsection A explains the current DFARS rules on government data rights in noncommercial software. Next, Subsections B and C discuss two special types of data rights arrangements: Specially Negotiated Rights and Rights in Special Works. Subsection D then applies the data rights provisions of the DFARS to machine learning AI to show that the current state of data rights law is inadequate to allocate data rights in machine learning AIs between contractors and the government. Finally, Subsection E briefly explains why more traditional forms of IP protection would not help a government contractor that develops a machine learning AI for the DoD.

A. Rights in Noncommercial Software

While many government contracts allocate IP rights between contractors and the government, the Federal Acquisition Regulation (FAR) and the DFARS do not use the term “intellectual property.” Instead, the FAR uses the terms “technical data” and “computer software.” The DFARS defines technical data very broadly, as “recorded information, regardless of the form or method of recording, of a scientific or technical nature . . . .” In DoD contracting, then, technical data is essentially any technical information describing the specifications or workings of the weapons and weapons systems that the DoD purchases.

As to computer software, the DFARS defines it as “computer programs, source code, source code listings, object code listings, design details, algorithms, processes, flow charts, formulae and related material that would enable the software to be reproduced, recreated, or recompiled.”

The DoD carves a distinction between commercial and noncommercial computer software. Commercial software is software that is available for purchase or license by the public. Thus, when the government purchases or licenses commercial software under a government contract, it has no unique bargaining position. Therefore, it typically receives standard license or use rights that are available to every consumer. Whenever the government requires a broader license, it must negotiate with the vendor for these rights.

The DFARS defines noncommercial computer software negatively, as “software that does not qualify as commercial computer software.” In other words, noncommercial computer software is what is not sold to the public—a definition that would include a machine learning AI system designed specifically for a military purpose. Under the DFARS, the government acquires varying levels of rights in noncommercial software based on how much the government funds the software’s development. Generally, the greater the government’s share of development costs, the greater its rights in the developed software will be.

First, the government obtains “unlimited rights” in software that the contractor develops during the performance of a contract exclusively using government funds. Unlimited rights are extensive: they include the right to “use, modify, reproduce, release, perform, display, or disclose computer software or computer software documentation in whole or in part, in any manner and for any purpose whatsoever, and to have or authorize others to do so.”

The next level down from unlimited rights is “government purpose rights.” The government obtains government purpose rights in software developed with “mixed funding,” meaning software partially developed at private expense and partially developed with government funding. Under this regime, the government may (a) freely copy, modify, or use the noncommercial software within the government, and (b) release or disclose the software to individuals and organizations outside the government for their use, copying, or modification, provided that the outside individuals are doing so for United States Government purposes. Government purpose rights typically last for five years after the contract’s execution, after which the government obtains unlimited rights.

Finally, the government obtains only “restricted rights” when the noncommercial software is developed solely at private expense. “Restricted rights” are as narrow as unlimited rights are broad. When the government has restricted rights in noncommercial software, it may only use the software on one computer at a time. Additionally, if the government wants to use the software on a new or different computer, it must “destroy” the copy of the program on the first computer. Finally, although the government may retain a copy on a remote computer backup and allow contractors to access the software when necessary, those contractors are prohibited from decompiling or otherwise trying to reverse engineer the software.
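The default allocation just described is, in effect, a decision rule keyed to the funding source. A minimal sketch of that rule follows, assuming the funding split can be expressed as a single fraction. The names `Rights`, `default_rights`, and `rights_after` are hypothetical, and the five-year default simply mirrors the typical period stated above; a real determination turns on the contract and the applicable clause, not on a function like this.

```python
from enum import Enum


class Rights(Enum):
    UNLIMITED = "unlimited rights"
    GOVERNMENT_PURPOSE = "government purpose rights"
    RESTRICTED = "restricted rights"


def default_rights(government_share: float) -> Rights:
    """Map the government's share of development funding (0.0 to 1.0)
    to the default license category described in the text."""
    if not 0.0 <= government_share <= 1.0:
        raise ValueError("government_share must be between 0.0 and 1.0")
    if government_share == 1.0:
        return Rights.UNLIMITED          # exclusively government funds
    if government_share == 0.0:
        return Rights.RESTRICTED         # solely private expense
    return Rights.GOVERNMENT_PURPOSE     # mixed funding


def rights_after(government_share: float, years_elapsed: float,
                 gpr_period_years: float = 5.0) -> Rights:
    """Apply the typical rule that government purpose rights convert
    to unlimited rights once the period (assumed five years) has run."""
    rights = default_rights(government_share)
    if rights is Rights.GOVERNMENT_PURPOSE and years_elapsed >= gpr_period_years:
        return Rights.UNLIMITED
    return rights


if __name__ == "__main__":
    for share in (0.0, 0.4, 1.0):
        print(share, default_rights(share).value, "->",
              rights_after(share, years_elapsed=6).value)
```

The sketch makes one feature of the regime visible: everything depends on how the funding input is characterized, which is the question the rest of this section takes up.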

B. Specially Negotiated Rights

Although the tripartite rule above is the default rule for allocating rights in noncommercial software, the DFARS allows the government and contractors to negotiate around this rule. The DFARS provides that the government and contractor may “[n]egotiate specific licenses when the parties agree to modify the standard license rights granted to the government or when the government wants to obtain rights in computer software in which it does not have rights.” In considering this course of action, the DFARS instructs the government to consider many factors, including “the planned software maintenance philosophy, anticipated time or user sharing requirements, and other factors which may have relevance for a particular procurement.”

C. Special Works

One final data rights allocation regime is that of “special works.” The government always has unlimited Rights in Special Works. The FAR defines “special works” as any “production or compilation of data (other than limited rights data or restricted computer software) for the government’s use,” including audiovisual works, movies, reports, books, studies, surveys, and government histories. In other words, a special work is any creative or literary work made under a contract with the government for the government’s use. The FAR includes computer software in its definition of “special works,” provided that the software in question may provide “a commercial advantage,” or is “agency mission sensitive, and could prejudice agency mission, programs, or follow-on acquisitions.” For its part, the DFARS provides a non-exhaustive list of the types of works that qualify as special works. While the list includes software documentation and computer databases, it does not include noncommercial computer software.

D. Applying DFARS Data Rights Law to Machine Learning AI

Having defined the contours of current data rights law under the DFARS, it is now possible to consider how they would apply to data rights allocation in a machine learning AI. The traditional data rights regime (i.e., unlimited rights, government purpose rights, and restricted rights) is inadequate. Consider the following scenario: assume that the Air Force requires an advanced AI system capable of autonomously piloting a crewless (unmanned) combat air vehicle (UCAV). The Air Force does not particularly care how the AI is programmed; it requires only that the AI pilot the aircraft autonomously and make autonomous combat decisions in a contested aerial environment. The Air Force publishes a Request for Proposals (RFP) indicating that it will award a contract to the contractor that presents an optimal combination of cost, technical, and other factors. Assume also that the winning proposal for this notional AI pilot is from a company that proposes to provide a pre-written machine learning AI kernel that the Air Force can load into a UCAV simulator (and later, a real UCAV), with the AI slowly teaching itself how to fly and fight. Finally, assume that after the project is proven an overwhelming success, providing superhuman piloting skills like the Navy’s X-47B, the Air Force wishes to use the AI for other piloting applications.

The problem in this scenario is determining who owns what rights in this profoundly successful, highly valuable AI. Considering the cost of training a human pilot to operational competency ($3 million to $11 million, depending on the aircraft) and the cost of replicating software (almost nothing), the answer to this critical question would be worth many millions of dollars. Indeed, if the AI were repurposed, tweaked, and fed enough data to become a general AI, it could become the most valuable intellectual property in world history. The question of ownership here is vital.

Under the DFARS, who owns the intellectual property rights depends on who paid to develop the software. The DFARS defines software as being “developed” when it “has been successfully operated in a computer and tested to the extent sufficient to demonstrate to reasonable persons skilled in the art that the program can reasonably be expected to perform its intended purpose.” If the software is developed solely at private expense, the government will obtain only restricted rights. If the software is developed at government expense (i.e., via costs charged to a government contract), the government will obtain unlimited rights. Finally, if the software’s development is partially done at private expense and partially done at government expense, the government will obtain government purpose rights.

Recall that the contractor provided the government with a simple AI kernel at the start of the contract. The program “knew” only two things: it “wanted” to fly, and it knew how to reprogram itself. It did not know how to fly a UCAV; it only learned how to do that when the government plugged it into a UCAV and let it start learning to fly. Even so, the contractor might argue that for purposes of data rights, the “intended purpose” of its AI was not to fly a UCAV; the “intended purpose” of the AI was to learn how to accomplish whatever task it was programmed with. If the contractor could convince a judge of this, the contractor would get full rights to the new, mature version of the AI that did know how to fly a UCAV, and the government would be left with relatively worthless restricted rights. This is because although the government presumably expended plenty of its own money in teaching the UCAV AI to fly, the “development” of the original program was solely at private expense. Although the government paid for the AI kernel, the contractor would argue that the kernel was already developed before the contract began and was therefore not allocated as a cost to a government contract. Thus, the fully mature (and highly valuable) AI still belonged to the contractor.

On the other hand, the government might argue that such a result would be unreasonable given the total costs of developing the mature version of the AI. The government would argue that the base AI kernel, while perhaps cleverly designed, was comparatively worthless. It was not until the machine learning AI taught itself to fly and fight in a UCAV that it became worth anything. The government would argue that the government, and not the contractor, was the party that fed the necessary data to the AI to teach it to fly government-owned or operated aircraft—and considering the costs of flying, that raw data did not come cheap. The government would therefore argue that it was entitled to unlimited rights in the mature AI.

If the government were to prevail in the above scenario and obtain unlimited rights in the mature AI, it would likely stifle competition. What software company would consider investing millions of dollars into research and development to develop a cutting-edge AI machine learning kernel so that the government could effectively declare the software as “freeware” or “shareware,” providing it to the developer’s competitors at little or no cost? Stated bluntly, if AI developers cannot have assurances that they will receive a reasonable return on their investment, they will not contract with the government at all.

Finally, consider a third possibility. Rather than find solely for the government or the contractor, a judge could find that the AI was developed with “mixed funding,” which entitles the government only to government purpose rights. This, too, is an unsatisfactory outcome for AI developers. As noted above, government purpose rights last for five years, and then the government obtains unlimited rights. This outcome would not encourage software developers to contract with the government, as it only delays the inevitable loss of their competitive posture (their unique IP) to their direct competitors. If AI developers refuse to do business with the government, it will be an enormous loss to national security, since whatever world power dominates in the AI field could very well “become ruler of the world.”

The scenario above highlights the fundamental problem that machine learning AI poses to the DFARS as it is currently written: the DFARS cannot answer the question of how IP is allocated between the government and the contractor when a machine learning AI kernel, written by humans outside a contract using private funding, is licensed to the government, and then starts modifying and updating itself. The reason for this ambiguity is simple: the DFARS assumes that (1) the purpose of software is obvious; and (2) software is developed by humans. The drafters of the DFARS did not anticipate the creation of a new breed of software that would partially write itself. Accordingly, as will be argued in section III.A below, to capture the world-changing reality of machine learning AI, DFARS 252.227-7203 should be rewritten.
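The ambiguity can be made concrete with a small sketch in which identical funding facts yield different default rights depending on how “intended purpose” is read. The dollar figures, the `FundingRecord` type, and the function `rights_under_reading` are hypothetical and serve only to restate the competing arguments above; they do not model how a court or contracting officer would actually decide.

```python
from dataclasses import dataclass


@dataclass
class FundingRecord:
    kernel_private: float       # contractor's pre-contract spending on the kernel
    training_government: float  # government spending on training under the contract


def rights_under_reading(funding: FundingRecord, purpose_is_learning: bool) -> str:
    """Return the default license category under one reading of 'intended purpose'."""
    private = funding.kernel_private
    if purpose_is_learning:
        # Contractor's reading: the kernel could already learn before award,
        # so later government-funded training is not "development".
        government = 0.0
    else:
        # Task reading: the software is not "developed" until it can fly,
        # so government-funded training counts toward development.
        government = funding.training_government
    if government == 0:
        return "restricted rights"
    if private == 0:
        return "unlimited rights"
    return "government purpose rights"


if __name__ == "__main__":
    facts = FundingRecord(kernel_private=2.0, training_government=10.0)
    print(rights_under_reading(facts, purpose_is_learning=True))
    print(rights_under_reading(facts, purpose_is_learning=False))
```

On these assumed facts, the contractor's reading produces restricted rights and the task reading produces government purpose rights. The government's argument for unlimited rights corresponds to treating the kernel's private contribution as negligible, which in this sketch means setting `kernel_private` to zero.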

E. Other Forms of IP Protection

Subsection D showed that, as it is currently written, the DFARS is unclear about how data rights in a machine learning AI are allocated between contractors and the government. Other forms of IP law are equally unclear on this question.

As for patent protection, the Supreme Court has consistently declined to grant patent protection to software, seeing software as “abstract ideas” that are not patentable. While the Supreme Court has not laid down an unequivocal rule that software is never patentable, the federal courts have made software patents hard to obtain following the Supreme Court’s reasoning. Indeed, after the Supreme Court decided Alice Corp., the Federal Circuit has invalidated over ninety percent of software patents that were litigated before it. Finally, while there is no data on the patentability of the emerging technology of machine learning AI at this time, there is no reason to believe that a creator of a machine learning AI would fare better than other types of programmers. In fact, machine learning AI may be less patentable than other types of software. First, the Constitution grants patent protections only to the “inventor” of an invention, and arguably, because it reprograms itself, machine learning AI is its own inventor. Second, while one might successfully argue that a programmer of the original machine learning AI kernel was the inventor of that kernel, the programmer would then most likely lose a bid for a patent on the fully mature AI on the grounds of lack of novelty or obviousness. As patent attorney Robert Sachs observed, “the more advanced the software technology—the more it takes over the cognitive work once done exclusively by humans, the more seamless it becomes in the fabric of our daily lives—the less patent eligible it is deemed to be by the courts and the USPTO.” Because of the difficulty of obtaining software patents, creators of machine learning AI might turn first to copyright protection to defend their IP rights.

Copyright law, however, offers little help or hope to AI developers. Admittedly, the federal courts have consistently held that software is entitled to copyright protection, viewing software as a type of “literary creation.” Even so, the copyright clause of the Constitution grants authors “exclusive right to their respective writings.” But because machine learning AIs reprogram themselves to accomplish their given tasks, a mature machine learning AI is essentially its own author. Furthermore, while it is true that a programmer who created a machine learning AI kernel could easily obtain copyright protection in the lines of software code comprising the kernel, this copyright would not be particularly valuable. After all, with machine learning AI, it is not the “dumb” kernel that is valuable, but the fully developed AI that can solve its given task better and faster than humans. The inadequacy of copyright leaves creators of machine learning AI with only one hope of IP protection: trade secret.

Unfortunately, while trade secrets would protect most creators of machine learning AI, those protections would be annihilated if they contracted with the government. Under trade secret law, the owner of a trade secret will lose IP protection if he publicizes his creation or fails to take reasonable precautions to prevent it from being disclosed. This is problematic for government contractors because of the Bayh-Dole Act, which provides that for “inventions” either created or reduced to practice during a contract with the U.S. government, the inventor will retain title to the invention. Still, the government will have “march-in rights,” which is the power to compel the inventor to grant a license to a “responsible applicant” where the inventor “is not expected to take within a reasonable time, effective steps to achieve practical application of the subject invention in such field of use.” Under Bayh-Dole, an “invention” refers not merely to inventions that are ultimately patentable but “any invention or discovery which is or may be patentable or otherwise protectable under this title.” Bayh-Dole is in Title 35 of the U.S. Code, while copyright is under Title 17. Nevertheless, at least one commentator has argued that “inventions” made during government contracts are subject to Bayh-Dole, so long as they are possibly patentable. As noted above, software is at least sometimes patentable. As a result, it appears likely that since it could qualify for patent protection, a machine learning AI developed for or during a DoD contract would fall under the mandates of Bayh-Dole, meaning that the software developer would retain title to the software, subject to the government’s march-in rights.

While it is true that the government has yet to exercise its march-in rights against anyone, this is likely because few qualified applicants are asking for licenses. If a DoD contractor comes up with an advanced machine learning AI, this trend would likely change. A software developer who invested heavily into creating a unique program would find itself facing down dozens or hundreds of requests for licenses. Even worse, if a contractor would prefer to keep the technology a trade secret, Bayh-Dole would still compel the contractor to seek protection, so long as the technology “may be patentable or otherwise protectable” under Title 35. As a practical matter, then, every government contractor who develops software during a government contract will be required, whether they like it or not, to seek IP protection in such a way as to publicly disclose their creations, destroying any chance of keeping their discovery a trade secret. Thus, trade secret law, like patent and copyright law, will not provide sufficient IP protection to creators of machine learning AIs contracting with the government.

III. Recommendations

This article has demonstrated that as they are currently written, the data rights provisions of the DFARS cannot solve the problem of how to allocate data rights in machine learning AI between the government and contractors. This article also has shown that traditional forms of IP law (patent, copyright, and trade secret) would provide little protection to AI developers contracting with the government. The next question, then, is what is to be done to remedy this situation. Three recommendations are provided below.

A. The Data Rights Provisions of the DFARS Should Be Revised

As with many legal issues of the modern information age, the fundamental problem with the data rights provisions of the DFARS is that the law has yet to catch up with the technology. When the drafters of the DFARS were writing out how to allocate data rights in noncommercial software, they could not have expected that one day programmers would invent software that was programmed to reprogram itself.

The core problem is simple: the DFARS requires an update to what it means to “develop” software. Today, software is considered “developed” once it can accomplish its “intended purpose.” For most types of software, this simple definition is adequate. For example, the intended purpose of database software is to store and manipulate data sets, the intended purpose of a web browser is to browse the internet, and so on. That said, this simple definition of software development is problematic when applied to machine learning AI because there are multiple plausible arguments for what the “intended purpose” of a machine learning AI is. Is the “intended purpose” of the software merely to learn? Or does its “intended purpose” include working at a particular application? The answer is ambiguous, and for the sake of clarity for both the government and contractors (and to prevent extremely high-stakes litigation), this definition needs to be clarified for machine learning AIs.

The language change could be as simple as defining “intended purpose” as “the functional goal as specified in the solicitation.” In this way, the ambiguity over the intended purpose of a machine learning AI is resolved by whatever definition the contracting officer provides in the solicitation. This would simultaneously give clarity to would-be offerors while also leaving flexibility in the DFARS to enable contracting officers to specify whichever purpose they wish for AIs. To use the UCAV hypothetical from above, the solicitation could provide that the “intended purpose” of the machine learning AI would be to pilot the UCAV autonomously. Alternatively, if the government wanted to put offerors on notice that it intended to use the AI to potentially pilot other aircraft in the future (and therefore have offerors bake that assumption into their proposals, leading to a higher price), it could provide that the “intended purpose” of the AI was piloting generally.

B. Contracting Officers Should Employ Other Transaction Authority

Congress has granted the DoD a gift: Other Transaction Authority (OTA). While OTAs were traditionally a tool for NASA to deal with high technology companies, the DoD can now use them. In fact, in recent years, Congress has expressed frustration that the DoD does not use OTAs more often. OTAs are “catnip for technology companies,” in that they provide “the promise of a streamlined contracting process with built-in flexibilities to give them maximum protection of preexisting IP and the benefits of IP produced under the OTA.” OTAs are a perfect contracting tool for emerging technology like machine learning AI: they are not bound by the DFARS’ strict and outdated rules, and, unlike with standard procurement contracts, contracting officers have great flexibility in drafting OTAs. Indeed, although there is some confusion on this point, when using OTAs, the DoD is not even bound by the restrictions of the Technology Investment Agreement (TIA) framework for allocation of IP rights. Instead, contracting officers utilizing OTAs are free to customize the IP rights allocations in their agreements with contractors, subject only to a few best practice suggestions. That said, given this flexibility, contracting officers using OTAs (and their counsel) should carefully consider the likely future direction of any machine learning AI technology when deciding how to split the IP pie with contractors.

C. Contracting Officers Should Use the DFARS Flexibility Already Available

Aside from the flexibility that OTAs offer, contracting officers should use the flexibility granted by the DFARS, such as special works and Specially Negotiated Rights.

As for special works, the DFARS does not include computer software in its list of examples of special works. Still, the DFARS provides that the list of examples is not exhaustive, and the FAR does include software as a type of special work. Furthermore, as explained above, software is copyrightable. This is important because, as with the Air Force UCAV scenario above, it is likely that the DoD will not only have AI development as a part of contracts but may have the provision of AI as the only line item in a contract. If the AI is the “work” that the contractor is providing, the “special works” provision of the DFARS seems appropriate to govern contracts for AI. Like a book or other creative work commissioned specifically for the government, the development of a machine learning AI could be “commissioned” by the government as a special work. This approach would have a drawback: using the special works provision would mean that the government would receive unlimited data rights. However, if contractors were made aware at the solicitation stage that they would lose control over both their AI kernel and the fully mature AI if they contract with the government, they could price that loss into their bids. From the developer’s perspective, in these “bet the company” scenarios, selling their IP is not by itself a bad thing; they just need certainty to know what they are doing when dealing with the government.

As for Specially Negotiated Rights, as noted above, the DFARS gives contracting officers flexibility to specially negotiate data rights in noncommercial software where the government or the contractor does not like the standard provisions of the DFARS. Data rights in AI are a perfect candidate for this kind of creative negotiation. For example, instead of the typical “mixed funding” scenario in which the government has government purpose rights for five years followed by unlimited rights forever (a frightening prospect to a “bet the company” AI contractor), the government and the contractor could negotiate for a longer period of government purpose rights, without a follow-on period of unlimited rights in perpetuity. Assuming the original kernel was sufficiently flexible, an extended period of government purpose rights could prove revolutionary. Instead of the contractor giving the government an AI kernel to fly a UCAV, the contractor could work with the Air Force to first teach the AI to fly a UCAV, then teach it to conduct SIGINT data analysis, then teach it to help Air Force physicians with difficult diagnoses, and so on. With enough data fed to it, such an AI could develop into a superintelligent general AI, far smarter and faster than any human mind. The Air Force could call this AI “Skynet.”

IV. Conclusion

Whether it wants to or not, the United States is now entering into an AI arms race. Machine learning AIs will provide unparalleled supremacy in weapons system control, SIGINT, UAV piloting, and more. The DoD cannot create these systems itself, and it cannot attract the programmers who can create them without ensuring those programmers have some level of certainty as to who will own their creations. Considering these facts, updating the data rights provisions of the DFARS to capture the reality of machine learning AI is not merely a good idea; it is a national security imperative. After all, if one nation will rule the world through AI dominance, it may as well be the United States.