February 20, 2024 Feature

International Verification Standards for Open Source Videos

Alexa Koenig

Sunlight and a faint breeze floated through the windows overlooking Lake Como as eighteen people positioned themselves around a rectangular set of tables in October 2017. The lightness of the morning contrasted with the weight of the subject matter they were there to discuss: how to bridge their various areas of practice to strengthen the future of digital investigations and improve the ability of international lawyers to bring the world’s highest-level perpetrators to account. Even more specifically, they were tasked with determining whether international guidelines were needed to help standardize the rapidly emerging field of digital open source investigations.

Each person in the room had been invited because of their particular perspective on digital evidence, especially video evidence, or their expertise in international criminal legal processes. One was a journalist who, years earlier, had found creative ways to identify evidence of atrocities on Twitter. Another had established an archive to preserve YouTube footage of possible international crimes in Syria. Still others had abundant experience investigating atrocities related to international crimes committed in the former Yugoslavia and Rwanda. NGO representatives included experts in training activists how to use smartphones to capture footage of atrocities. Two of those present had previously helped draft international protocols to aid the collection of other types of evidence. One lawyer who had worked with the International Criminal Court (ICC) was there because she had painstakingly analyzed how ICC judges were approaching digital data, while another—a trial attorney—was there to keep the discussion grounded in the practical realities of how he would need to eventually present any digital evidence in court.

While many of the participants had been working with video and other open source evidence for years, almost all agreed that online information was being created or collected to varying standards and thus varying degrees of usefulness. Although journalists and human rights researchers had been developing creative methods for finding and verifying the accuracy of online videos for close to a decade, the videos often were not being collected in ways that documented their chain of custody, or with notes that explained how an item might have been modified since collection and why. Much of what was being shared with courts was relatively useless for international trials, documenting, say, that a school had been destroyed or a civilian killed, but not the contextual cues, such as a street sign or surrounding architecture, that would be critical to locate the incident in geographic space. These contextual data could also be helpful for linking the incident to an alleged perpetrator, providing lead information by depicting people who could potentially serve as witnesses, or suggesting physical evidence that might still be available. Few had yet articulated the legal issues that could arise from the mass collection of digital data, ranging from the difficulty of meeting disclosure obligations given how hard it would be to know what was in a dataset when entire social media channels were scraped, to identifying who might qualify as a witness for purposes of explaining how an open source investigation had been conducted.

One of the first tasks thrown to those present was to clarify a handful of relevant terms. This seemed important to ensure everyone was talking about the same thing when they used a particular word or phrase. For example, OSINT—an abbreviation for Open Source Intelligence—had become a common reference for digital information pulled from public spaces on the internet. But as several pointed out, much of the information they were interested in would not be used for intelligence purposes and thus for decision-making, but as evidence, or perhaps as lead information. While the organizers had anticipated that it would take an hour or two to clarify definitions, it took nearly eight hours to reach a consensus for just five terms—proof in itself that standards might be needed to bridge the disparate communities beginning to do this work.

By the end of the three-day workshop, the conclusion was clear: A protocol was needed to (1) help the international justice community communicate more effectively within and across areas of practice and (2) strengthen the quality and coordination of information increasingly flowing from social media sites towards courts.

Three years later, in 2020, the United Nations Office of the High Commissioner for Human Rights and the Human Rights Center at the University of California, Berkeley, launched the resulting Berkeley Protocol on Digital Open Source Investigations. Originally, the launch was to have taken place in Nuremberg’s historic courtroom 600, but those plans were derailed by the onset of a global pandemic. The shift to Zoom, however, proved fortuitous, allowing hundreds of additional people to be present as then-High Commissioner for Human Rights Michelle Bachelet announced the protocol’s launch in its advance English version. In 2024, the Berkeley Protocol becomes official with its release in all of the official languages of the UN.

In this article, I briefly describe those guidelines and others that have been developed to shape the handling of digital open source evidence in an international context. I also summarize new international verification standards for such information, with an emphasis on videos.

International Guidelines for Open Source Data

Each new communication technology has allowed for information to flow farther and faster from its source. Such technologies have ranged from wax tablets used to transmit news from household to household in ancient Pompeii, to the printing press in the fifteenth century, photographs and films during the nineteenth century, and radio and television in the twentieth century. The digital revolution made possible by the invention of transistors in the 1940s ushered in a new era: By the end of the twentieth century, internet use was ubiquitous and phones with cameras (first released in Japan in 1999 and internationally popularized with the launch of Apple’s iPhone in 2007) enabled the capture and transmission of visual information by anyone with access to that technology. During the first two decades of the twenty-first century, the scale of digital data online exploded. As just two examples, by 2022 more than 500 hours of video were being uploaded to YouTube every minute, while 34 million videos were being uploaded to TikTok each day.

In the early 2010s, journalists, gamers, activists, and others all began jumping on the justice bandwagon. Alerted to videos of grave crimes and abuses posted to social media in relation to the Arab Spring, they invented new methods for mining the internet for facts about those events and sharing that information with legal investigators or the public. These collective efforts would launch an increased awareness of the potential of digital open source information to strengthen the evidentiary foundations of cases.

According to the Berkeley Protocol, digital open source information is information that is publicly accessible on the internet, by either observation (e.g., you conduct a search and see what results), request (e.g., you submit your email address to get access to a website), or purchase (e.g., you pay a relatively nominal fee to access an article online). The definition excludes information that you need special status to access, such as law enforcement status or some kind of subpoena, or that you acquire by illegal means, such as unauthorized hacking. Information that qualifies can take a wide array of forms, from videos uploaded by members of the public, to the audio of police scanners, photographs shared by major media, PDFs of official reports issued by human rights organizations, and government databases and spreadsheets.

In addition to providing key definitions, the Berkeley Protocol establishes minimum standards for ethically and effectively finding and evaluating open source data. It outlines professional principles (the minimum competencies investigators should have to conduct online research), methodological principles (which focus on how the work should be done; for example, by always engaging in the three-step verification process described below), and ethical principles that outline how to do the work responsibly.

The protocol was drafted to be tool agnostic, meaning that no specific platforms or tools are referenced. This was done to “future proof” the protocol, given the relative speed with which digital tools rise and fall out of favor, or become otherwise obsolete.

Methodologically, the protocol also explains how digital investigations can be incorporated into traditional documentation and investigation life cycles. Perhaps most important are the pre-investigative steps, first of which is conducting a risk assessment—ideally a holistic one that considers the digital, physical, and psychosocial risks that come with online research—and putting a plan in place to mitigate those risks. Second is a digital landscape assessment, which includes identifying who has access to the internet and how they’re communicating online, as well as whose perspectives may be missing. Finally, online investigation planning includes what protocols will be deployed, what tools will be used, and who will do what and when. The protocol provides a series of templates to help investigators map all of this, such as templates for assessing the risks and opportunities of using digital tools, documenting key information identified during the digital landscape analysis, and more.

Finally, the protocol touches on publication possibilities. The open source investigative era has been marked by a wave of creativity in how to communicate digital data to various audiences. The largest potential audience is the general public: Visual explainers, tweet threads, and story maps have all become popular outputs to amplify investigative findings. A more targeted audience is composed of judges. For example, SITU Research, “an unconventional architecture practice” based in New York City, created a digital platform to help judges navigate the visual evidence (both closed and open source) for a case at the ICC. Charges in that case, Prosecutor v. Al Mahdi, centered on the alleged destruction of cultural heritage property in Timbuktu, Mali. By clicking on the name of one of the nine sites where buildings had been destroyed, judges could pull up all of the satellite imagery, photographs, and videos relevant to that location to better understand what had happened. They could also pull up satellite imagery that revealed how the nine locations were positioned in relation to each other. For a later case involving international crimes in Timbuktu, Prosecutor v. Al Hassan, SITU Research took the visual aids even further, digitally reconstructing the city and empowering digital “visitors” to explore the various sites remotely, turning corners to find (and play) the videos of specific incidents at the sites where they had been captured.

International Verification Standards for Open Source Data

The Berkeley Protocol recommends taking a three-step approach to verifying all digital open source data, including video. The first objective is to assess any technical information affiliated with the item. A technical analysis takes advantage of the fact that metadata is typically embedded in a digital item at the time of its creation—whether that item is a photo, video, Word document, PDF, spreadsheet, or otherwise—and often contains critical information about where, when, and via which device that item was created. Several tools are available to easily and quickly assess whether metadata is still attached.

If the metadata has been stripped (for example, if the video was posted to a social media site), the investigator can search for metadata in the contextual information around the item (for example, a date or timestamp from the social media platform to which it has been posted, or details provided in the comments, tags, or text accompanying the item). Of course, these technical data are not irrefutable; metadata can be spoofed and contextual information can be deliberately or unintentionally misleading. But these technical data can provide a possible hypothesis for where and when a video was created or can be triangulated with other information as a check on its veracity.
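For readers curious what such a technical check involves at the lowest level, the following Python sketch illustrates one small piece of it: detecting whether a JPEG byte stream still carries an embedded Exif metadata segment (the segment that typically stores capture time, device model, and, when enabled, GPS coordinates). This is an illustrative sketch only; the Berkeley Protocol is deliberately tool agnostic, and practitioners would normally rely on established utilities rather than hand-rolled parsing.

```python
def has_exif(jpeg_bytes: bytes) -> bool:
    """Return True if a JPEG byte stream contains an Exif APP1 segment."""
    if not jpeg_bytes.startswith(b"\xff\xd8"):  # SOI marker: confirm it is a JPEG
        raise ValueError("not a JPEG stream")
    i = 2
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:  # every segment starts with 0xFF
            break
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:  # start-of-scan: image data follows, no more metadata
            break
        length = int.from_bytes(jpeg_bytes[i + 2 : i + 4], "big")
        # APP1 (0xE1) segments holding Exif data begin with the "Exif\0\0" tag
        if marker == 0xE1 and jpeg_bytes[i + 4 : i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length  # skip to the next segment
    return False

# Two minimal JPEG-like streams: one with an Exif APP1 segment, one without.
with_exif = b"\xff\xd8" + b"\xff\xe1" + (8).to_bytes(2, "big") + b"Exif\x00\x00" + b"\xff\xd9"
without = b"\xff\xd8\xff\xd9"
print(has_exif(with_exif), has_exif(without))  # prints: True False
```

Detecting that the segment is present is, of course, only the first step; extracting and interpreting the fields inside it is the job of dedicated metadata tools, and—as noted above—even intact metadata can be spoofed and must be triangulated against other evidence.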

The second step is content analysis: looking within the item’s “four corners.” Is what you’ve been told about a video consistent with what you can see? Methods for testing the content include processes like reverse image searching stills pulled from a video to see if an image has appeared on the internet previously. Another would be geolocation. For example, if the video is allegedly of events that took place in San Francisco, California, does satellite imagery or drone footage or other videos or photos of the same alleged location show similar built or natural markers, like the same pattern of buildings, mountains, or trees? Is the flora or fauna what you’d expect to see? How about the clothes people are wearing? If there is audio, are accents or patterns of speech consistent with that region and the alleged identities of the people depicted? What was the weather that day in that part of the world? Are people dressed consistent with that weather?

Content analysis is where the majority of open source verification methods are aimed but can also be the most uncertain and/or dangerous step in the verification process, given its dependence on visual comparisons. Social science research has repeatedly shown that people’s ability to visually analyze information and their confidence that they have accurately assessed that information do not perfectly correlate: People are often less accurate than they think. A few different tactics can help minimize risk. For example, people can develop multiple working hypotheses about the item they’re investigating and then test those hypotheses against the facts they know and the other evidence they’ve collected. They can also use peer review to try to offset any biases or mistakes. Finally, it is important to have some sort of protocol for avoiding common visual traps, such as overanalyzing lighter parts of an image, information in the foreground, or faces.

The third step is source analysis: establishing who first posted the video and whether they are a reliable source for the information shared. A related issue is identifying whether the source is human or machine; today, a significant portion of online activity is generated by automated accounts known as “bots.”

All three strategies—technical, content, and source analysis—are critical to offset human and machine biases, as discussed in this issue by Devon LaBat and Jeff Kukucka, and to check for misinformation and disinformation, including inauthentic digital data, as discussed by Raquel Vazquez Llorente.

Once an item has been thoroughly analyzed—and ideally either verified or debunked—questions may arise as to admissibility. At the ICC, the admissibility standard is relatively permissive: Anything that is relevant will likely be admitted. The primary barrier arises when the method of obtaining the item violated a provision of the Rome Statute or an internationally recognized human right; such evidence is inadmissible if the violation casts substantial doubt on its reliability or if admitting it “would be antithetical to and . . . seriously damage the integrity of the proceedings.”

Legal Tools for Open Source Investigators

Digital technologies have radically affected how international investigations are conducted. This evolution is particularly notable for encouraging cross-institutional and cross-disciplinary collaborations. Tools like the Berkeley Protocol and the guidelines following in its wake are designed to help legal actors become better able to communicate with others, but also to help nonlegal actors understand how to make sometimes subtle shifts in the ways they work that can increase the value of what they produce, collect, or analyze for court purposes.

Ultimately, such guidance is critical to ensuring that new individuals can be introduced into the fact-finding ecosystem who have the skills needed to make this work as effective, efficient, and ethical as possible. These may include computer scientists, who can help investigators bring the deluge of digital data down to human scale by automating various aspects of the review process; specialists in artificial intelligence, who can deploy their talents to do everything from helping to detect synthetic videos (“deepfakes”) to automatically identifying and blurring graphic material to minimize investigators’ exposure to potentially harmful content; and architects like those at SITU Research who are pioneering new ways to introduce visual materials into court.

Now that the Berkeley Protocol is available to facilitate collaboration and communication across organizations and sectors, several other tools are being developed to further support such investigations. Three are particularly noteworthy. The first consists of materials for judges that discuss the reliability of evidence acquired through open source investigations, being pulled together by the University of Essex with a consortium of civil society actors. Second are guidelines, building off the Murad Code, to help open source investigators adopt a victim-centered approach to documenting systematic and conflict-related sexual violence; these are being developed by UC Berkeley’s Human Rights Center and the Institute for International Criminal Investigations, with guidance from an expert working group. The third is a template for creating a psychosocial security plan for online investigators.

On the horizon are new frontiers for legal practice that rely heavily on visual materials. These include virtual courtrooms, such as one recently held in the metaverse; virtual visits to alleged crime scenes; and the potential to use augmented reality when physically visiting the site of an atrocity. All of these possibilities raise new opportunities and new risks for justice and will require an ever-widening and more diverse community of practice.

Consistency and Training in Open Source Video Evidence

Numerous scholars have identified the strengths and vulnerabilities of using digital open source information—especially video content—as evidence. This includes a relative lack of consistency in how courts treat such evidence. Hopefully, guidelines, standards, and training will strengthen consistency. As law professor Jennifer Mnookin pointed out in an interview with media studies professor Dr. Sandra Ristovska, “law school, legal education, and the legal profession can be very focused on words. We should work harder to ensure that our students gain meaningful exposure to working with and understanding both numbers and images, both the interpretive challenges they raise and how to use them as effective tools for advocacy.” Given the growing diversity of individuals and organizations now involved in the documentation, preservation, analysis, and presentation of open source videos, that training must include, but also extend beyond, those in the legal profession in order to strengthen the foundations of justice.

    The material in all ABA publications is copyrighted and may be reprinted by permission only.

    Alexa Koenig

    University of California, Berkeley School of Law

    Alexa Koenig, JD, PhD, is an adjunct professor at the University of California, Berkeley School of Law, where she is also co-faculty director of the Human Rights Center and co-founder of the center’s Investigations Lab (Berkeley). She is co-editor of Digital Witness: Using Open Source Information for Human Rights Investigation, Documentation and Accountability and co-author of Graphic: Trauma and Meaning in our Online Lives.