Among current e-discovery trends and topics, big data analytics, machine learning, and cloud computing hold the industry spotlight. Metadata, on the other hand, strikes many as mundane. However, the “data about data” that metadata provides deserves a fresh look. This information, which lies beneath the viewable content of an electronic file, offers a wealth of useful material. Proper extraction and handling of metadata play a critical role in conducting a speedy document review, satisfying strict regulatory agency production guidelines, and avoiding spoliation sanctions.
For e-discovery purposes, the term metadata generally refers to information about an electronic file (email message, MS Office document, audio/video file, etc.) that is stored within the file itself or recorded by the system that holds it. While some of this information may appear on the face of a document, such as the file date or file name, there can be hundreds of additional metadata values that are not readily accessible without technology to extract them. Some metadata values are easily updated by a document’s custodian, such as the author or file name of an MS Word document. Other values, such as a file’s date last modified, are generated automatically by the computer and are not entered or edited by the user in the normal course of work.
Metadata: To Serve and Protect
Metadata will protect and serve you if you collect and preserve it.
The very first step in using metadata is ensuring, up front, that documents are collected in a way that preserves the integrity of their metadata. Suppose, for example, that you need a targeted collection from a custodian’s laptop. A forensically sound collection performed with the appropriate hardware and software tools will safeguard metadata from spoliation. By contrast, if you ask the custodian to “self-collect,” the very act of moving and opening the files is likely to alter the underlying metadata, notably the date created or date last modified. Self-collectors also typically move files out of their original folder locations, which discards the original file path and makes it harder to trace where relevant documents were stored once the collection has taken place.
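Forensic collection tools are the right way to collect, but the underlying idea of recording each file’s hash and timestamps at the moment of collection can be sketched in a few lines of Python. This sketch is illustrative only and is not a forensically sound collection: the function names (`md5_of_file`, `build_manifest`, `verify_manifest`) and the manifest layout are assumptions, and only the file-system date last modified is captured because “date created” is reported inconsistently across operating systems. The script itself also reads every file, which on some systems can update the date last accessed; that side effect is one reason forensically sound collections rely on specialized tools.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list:
    """Hypothetical manifest: relative path, size, last-modified time (UTC), and MD5 per file."""
    manifest = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        info = path.stat()  # capture file-system timestamps before the file is read for hashing
        manifest.append({
            "file_path": path.relative_to(root).as_posix(),
            "file_size": info.st_size,
            "date_last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
            "md5": md5_of_file(path),
        })
    return manifest


def verify_manifest(root: Path, manifest: list) -> list:
    """Return the paths whose file is now missing or whose MD5 no longer matches the manifest."""
    return [
        entry["file_path"]
        for entry in manifest
        if not (root / entry["file_path"]).is_file()
        or md5_of_file(root / entry["file_path"]) != entry["md5"]
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "memo.txt").write_text("hello")
        manifest = build_manifest(root)
        print(manifest[0]["md5"])               # 5d41402abc4b2a76b9719d911017c592
        (root / "memo.txt").write_text("hello, edited")
        print(verify_manifest(root, manifest))  # ['memo.txt']
```

Recomputing the hashes later and comparing them with the manifest is the same check described under document integrity: if a file’s MD5 value has changed, its contents have changed.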
Properly collected metadata provides downstream benefits at the document review and production stages. First and foremost, in early negotiations between parties, the side with complete, well-preserved metadata has the upper hand when asking the opposing party to produce the same in return. Anecdotally, parties have in the past chosen to withhold metadata, delivering paper or image files (e.g., TIFF, PDF) devoid of any metadata. This tactic is generally no longer acceptable: most documents exchanged today originally existed in native electronic format, and e-discovery software that extracts and produces metadata is widely available. (Should you find yourself in a scenario where the producing party refuses to supply metadata, you may still be able to use optical character recognition (OCR) to extract information from a document’s text for searching and sorting.)
Generally speaking, there are about two dozen common metadata fields that you are likely familiar with, including email to/from/cc/bcc; email date/time sent; email subject; email message ID; author; file name; file size; file path; date created; date last modified; application; and hash value. Many more fields are available for extraction, such as date last printed or date last accessed. You can work with an experienced e-discovery expert on a case-by-case basis to determine which fields beyond the standard set are useful or necessary to extract.
Making Metadata Work for You
The extracted metadata is typically loaded into a document review platform and displayed alongside the document from which it was sourced. From there, you can put metadata to work in several ways:
- Searching. Metadata fields are fully searchable and typically return results faster than a full-text search. Occasionally, a relevant keyword appears in the metadata (e.g., the file path) but not in the full text of the document itself. To cover your bases, apply search terms across both the full text and the metadata.
- Sorting. Date and time metadata let you sort documents chronologically. After extraction, the various date fields (e.g., email date sent and file date last modified) can typically be merged into one custom metadata field, which allows you to sort loose documents together with emails.
- Email threading. Email metadata allows messages to be organized by thread, which can cut document review time while also keeping coding consistent.
- Deduplication. Metadata, particularly hash values, is necessary for industry-standard deduplication. Deduplicating electronic data is essential to cutting data volume and redundancy.
- Document integrity. Metadata can be the first red flag, or the first line of defense, when questions of data spoliation arise. A document’s MD5 hash value will not change between collection and production unless the contents of the native file have been altered. A savvy attorney can also spot inconsistencies between the metadata and the contents of a document that may warrant further investigation.
- Communication analysis. Many document review platforms have built-in visualization features to help you better understand your document universe. One example is a communication analysis tool: using email header metadata, the software displays dynamic “heat maps” showing timelines and patterns of frequent communication among custodians.
- Privilege log automation. Most document review platforms make it easy to export metadata fields to a spreadsheet. As a result, most privilege log fields, such as email author, recipients, subject, and date/time, can be populated directly from the export without manual cutting and pasting.
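Several of these benefits reduce to simple operations on the extracted fields. The following Python sketch assumes a hypothetical `Document` record (document ID, MD5 hash, extracted text, and a dictionary of metadata fields with made-up names such as `date_sent` and `date_last_modified`, stored as ISO 8601 strings without time zones). It shows searching across both full text and metadata, a merged sort date, global deduplication by MD5, and a spreadsheet-style export for a privilege log. Review platforms implement these features in their own ways; for example, email is often deduplicated on a hash of selected fields rather than the whole file, and deduplication is often run within each custodian rather than globally.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Document:
    """Hypothetical review-platform record: extracted text plus named metadata fields."""
    doc_id: str
    md5: str
    text: str
    metadata: dict = field(default_factory=dict)


def search(docs, term):
    """Case-insensitive search of full text AND metadata; returns {doc_id: [fields hit]}."""
    needle = term.lower()
    hits = {}
    for doc in docs:
        fields = [name for name, value in doc.metadata.items() if needle in str(value).lower()]
        if needle in doc.text.lower():
            fields.insert(0, "full_text")
        if fields:
            hits[doc.doc_id] = fields
    return hits


def sort_date(doc):
    """Merged sort field: the email date sent if present, else the file date last modified."""
    raw = doc.metadata.get("date_sent") or doc.metadata["date_last_modified"]
    return datetime.fromisoformat(raw)


def deduplicate(docs):
    """Global deduplication: keep the first document per MD5 and map each duplicate to it."""
    kept, duplicates, first_by_hash = [], {}, {}
    for doc in docs:
        if doc.md5 in first_by_hash:
            duplicates[doc.doc_id] = first_by_hash[doc.md5]
        else:
            first_by_hash[doc.md5] = doc.doc_id
            kept.append(doc)
    return kept, duplicates


def privilege_log(docs, columns):
    """Export the chosen metadata columns as CSV text, a starting point for a privilege log."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["doc_id", *columns])
    writer.writeheader()
    for doc in docs:
        writer.writerow({"doc_id": doc.doc_id, **{c: doc.metadata.get(c, "") for c in columns}})
    return buffer.getvalue()


# Hypothetical sample data ("md5-A" etc. are placeholders, not real hashes).
# DOC-003 is a second custodian's copy of DOC-001.
EMAIL_FIELDS = {"email_from": "pat@example.com", "email_subject": "Pricing",
                "date_sent": "2016-03-02T09:15:00"}
SAMPLE_DOCS = [
    Document("DOC-001", "md5-A", "Call notes on Q3 pricing", dict(EMAIL_FIELDS)),
    Document("DOC-002", "md5-B", "Draft agreement",
             {"file_path": "Legal/Acme merger/draft.docx",
              "date_last_modified": "2016-02-28T17:40:00"}),
    Document("DOC-003", "md5-A", "Call notes on Q3 pricing", dict(EMAIL_FIELDS)),
]

if __name__ == "__main__":
    print(search(SAMPLE_DOCS, "merger"))                      # {'DOC-002': ['file_path']}
    kept, dupes = deduplicate(SAMPLE_DOCS)
    print(dupes)                                              # {'DOC-003': 'DOC-001'}
    print([d.doc_id for d in sorted(kept, key=sort_date)])    # ['DOC-002', 'DOC-001']
```

With the sample data, a search for “merger” hits DOC-002 only through its file path, DOC-003 is set aside as a duplicate of DOC-001, and sorting the remaining documents by the merged date places the loose file ahead of the email.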
Before You Produce . . .
During document production, there are some special considerations involving metadata. Presumably, both parties have agreed upon the fields to be exchanged and the format of the metadata files that will be provided (e.g., ASCII vs. Unicode, .dat vs. .csv). Regulatory agencies are particularly stringent in their requirements for production of metadata. An example of the U.S. Department of Justice’s requirements can be found in an attachment to a joint case management statement and (proposed) order for United States v. eBay, No. 12-cv-05869-EJD (N.D. Cal. 2013), with the relevant metadata section on pages 9–11.
Surprisingly, metadata is often overlooked when producing documents that contain redactions. It is critical to ensure that any language redacted from a document’s image and its OCR text is also redacted from the metadata file; otherwise, a phrase redacted from an email, for example, could still be disclosed through the email subject field.
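A mechanical cross-check can catch redacted language that survives in the metadata. The sketch below assumes a hypothetical per-document list of redacted phrases keyed by a `BegBates` field, uses made-up field names (`Subject`, `FileName`) and a `[REDACTED]` placeholder, and writes the metadata file as CSV with Python’s `csv` module; an actual production should follow whatever format and fields the parties agreed on. Exact-phrase matching only catches verbatim occurrences, so it supplements rather than replaces a reviewer’s quality-control pass.

```python
import csv
import io
import re


def redact_metadata(row, phrases, placeholder="[REDACTED]"):
    """Return a copy of one metadata row with each redacted phrase replaced, ignoring case."""
    phrases = [p for p in phrases if p]  # an empty phrase would match everywhere
    if not phrases:
        return dict(row)
    # Longer phrases go first in the alternation so they win over shorter overlapping ones.
    pattern = re.compile(
        "|".join(re.escape(p) for p in sorted(phrases, key=len, reverse=True)),
        re.IGNORECASE,
    )
    return {name: pattern.sub(placeholder, str(value)) for name, value in row.items()}


def leftover_hits(rows, redactions, id_field="BegBates"):
    """QC check: (document ID, field) pairs where a redacted phrase still appears verbatim."""
    leaks = []
    for row in rows:
        for phrase in redactions.get(row[id_field], []):
            for name, value in row.items():
                if phrase.lower() in str(value).lower():
                    leaks.append((row[id_field], name))
    return leaks


def write_metadata_file(rows, redactions, fieldnames, id_field="BegBates"):
    """Return CSV metadata-file text, scrubbing each row's redacted phrases before writing."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(redact_metadata(row, redactions.get(row[id_field], [])))
    return buffer.getvalue()


# Hypothetical production rows and per-document redactions.
SAMPLE_ROWS = [
    {"BegBates": "ABC000001", "Subject": "Re: Project Falcon settlement terms",
     "FileName": "terms.msg"},
    {"BegBates": "ABC000002", "Subject": "Lunch", "FileName": "lunch.msg"},
]
SAMPLE_REDACTIONS = {"ABC000001": ["Project Falcon"]}

if __name__ == "__main__":
    print(leftover_hits(SAMPLE_ROWS, SAMPLE_REDACTIONS))  # [('ABC000001', 'Subject')]
    print(write_metadata_file(SAMPLE_ROWS, SAMPLE_REDACTIONS, ["BegBates", "Subject", "FileName"]))
```

When writing the result to disk, opening the file with `encoding="utf-8"` and `newline=""` keeps the output Unicode and lets the `csv` module control line endings.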
The widespread use of new and emerging technology in daily life, especially the “Internet of Things” (IoT), is already affecting the types of information subject to discovery. As technology evolves, metadata clearly deserves special consideration. To fully understand the story your data collection tells, remember that there is more than meets the eye.
Diane Quick is a director in the Legal Technology Solutions section of the Disputes and Investigations Practice of Navigant in New York.
Navigant Consulting is the Litigation Advisory Services Sponsor of the ABA Section of Litigation. This article should not be construed as an endorsement by the ABA or ABA Entities.
Copyright © 2017, American Bar Association. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or downloaded or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association, the Section of Litigation, this committee, or the employer(s) of the author(s).