Metadata: The Good, the Bad, and the Misunderstood

Vol. 30 No. 2

By Donna Payne

Donna Payne (donnapayne@payneconsulting.com) is CEO of PayneGroup, Inc., a consultancy providing software, training, and publications to governments and corporations.

Metadata headlines have made a resurgence in recent months. One headline read, “General Petraeus Brought Down by Metadata.” And, of course, there was the article about the software king turned fugitive, John McAfee, whose location in Central America was easily pinpointed after he sent pictures of himself while on the lam.

Although metadata is now a common word in our vocabulary, it is arguably overhyped. Metadata was branded with a scarlet letter primarily because we all heard about people inadvertently leaving tracked changes or comments in their documents. Another reason, according to a judge with whom I had a conversation over lunch, is that the issue garnered so much attention that the courts (and bar associations) were forced to issue rulings without fully understanding the subject.

Ultimately, metadata is not bad—it’s just misunderstood. Metadata is absolutely essential in making a file operable; without it, we would not know where on our computer a file is stored, its filename, or other necessary information.

Metadata can be found in virtually all software applications. Although this article focuses on Microsoft Office products, metadata is also found in WordPerfect and PDF files and in the image and video files you create with a GPS-enabled device such as a smartphone, to name just a few.

What Is Electronic Metadata?

You’ve probably at one time or another heard the definition of metadata: “data about data.” Could anything be more vague or generic? Well, I’ll let you in on a dirty little secret. Back in the early days of working with metadata when I didn’t really understand it to the degree I do today, I fell back on using that definition, too. No shame in that—it is factual. But it’s an incomplete explanation.

Metadata is information stored on your computer, your network, or in documents and other types of files. Sometimes it is buried so deep that you need a forensics tool to uncover it. Other times finding metadata can be as simple as looking at the file properties to see system- and user-generated data: when the file was created and last accessed, whether it was printed, how much time was spent editing it, and, in some cases, what document management system or template package was used with it. Metadata is an electronic fingerprint that can be presented as evidence. It can embarrass a document author or unearth secret e-mail correspondence. In rare instances it has been used to catch serial killers. More often, it has been the catalyst for change and improvements in technology.

Your Personal Metadata

One of the easiest types of metadata to understand is user-added metadata—for example, those pesky tracked changes and comments that have made headlines and caused extensive embarrassment. To see it for yourself, just create a new document and type in a paragraph of text (or copy the text from an existing document). Now turn on tracked changes. To do this in Word 2007 and later versions, just select the Review tab and click the Track Changes button. Now make some changes to the text: a few additions and deletions, maybe some formatting changes.

All your changes are readily apparent—additions underlined, deletions struck through, etc.—so you might be wondering where’s the potential for danger. But what if you modify how your edits are displayed on the screen? To do this, go back to the Review tab and select Final from the Display for Review drop-down list. This option displays the document content as if all changes had been accepted (meaning you will no longer see what you had marked for deletion). Alternatively, you can change the Display for Review option to Original, which displays the original document as if no changes had been made (meaning you will no longer see any text you had added). Now imagine sending this document to someone who is unaware of the display change and who decides to send the file to someone outside your organization “as is.” All your hidden edits—the metadata—are just waiting to be found by the recipient.

If you really want a sense of the dangers this can create for unwary lawyers, try adding a Comment that reads, “The client is asking for five million but will settle for as low as two.” Believe it or not, someone actually typed this Comment into a document and then sent the document outside the firm to opposing counsel.

Such comments and tracked changes also contain an additional piece of metadata that can lead to problems: user name. In Microsoft Word, it’s possible to change the author name associated with tracked edits. Want to try it out? Go back to your sample document and choose File>Options>General (in Word 2007, click the Office Button and choose Word Options>Popular; in Word 2003 or earlier, choose Tools>Options and select the User Information tab). Now change the user name to that of your co-worker. Recipients of this document now will see that the tracked changes were suggested by your colleague. Or opposing counsel. Or Barack Obama. Or anyone else you choose. Obviously, I am not recommending this, but I want to emphasize the possible danger.
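To make the point concrete, here is a minimal sketch in Python of how a recipient could list tracked edits and comments without ever opening Word. It relies on the fact that a .docx file (Word 2007 and later) is a ZIP archive of XML parts, and that insertions, deletions, and comments are stored as `w:ins`, `w:del`, and `w:comment` elements carrying a `w:author` attribute, whatever the Display for Review setting shows on the screen. The function names and the sample text are invented for illustration; this is not how any commercial metadata tool works.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _text(elem, tag):
    """Join the text of every <w:tag> descendant of elem."""
    return "".join(t.text or "" for t in elem.iter(f"{{{W}}}{tag}"))


def find_revisions(docx_file):
    """Return (kind, author, text) tuples for tracked edits and comments."""
    found = []
    with zipfile.ZipFile(docx_file) as z:
        body = ET.fromstring(z.read("word/document.xml"))
        for ins in body.iter(f"{{{W}}}ins"):
            found.append(("insertion", ins.get(f"{{{W}}}author", ""), _text(ins, "t")))
        for dele in body.iter(f"{{{W}}}del"):
            # Deleted text is kept in <w:delText>, not <w:t>
            found.append(("deletion", dele.get(f"{{{W}}}author", ""), _text(dele, "delText")))
        if "word/comments.xml" in z.namelist():
            comments = ET.fromstring(z.read("word/comments.xml"))
            for c in comments.iter(f"{{{W}}}comment"):
                found.append(("comment", c.get(f"{{{W}}}author", ""), _text(c, "t")))
    return found


def sample_docx():
    """Build a tiny in-memory .docx with made-up edits, for demonstration only."""
    doc = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        '<w:r><w:t>Will settle for </w:t></w:r>'
        '<w:del w:id="1" w:author="A. Lawyer"><w:r><w:delText>five</w:delText></w:r></w:del>'
        '<w:ins w:id="2" w:author="A. Lawyer"><w:r><w:t>two</w:t></w:r></w:ins>'
        '</w:p></w:body></w:document>'
    )
    notes = (
        f'<w:comments xmlns:w="{W}">'
        '<w:comment w:id="0" w:author="B. Partner">'
        '<w:p><w:r><w:t>Check this figure</w:t></w:r></w:p>'
        '</w:comment></w:comments>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", doc)
        z.writestr("word/comments.xml", notes)
    buf.seek(0)
    return buf


for kind, author, text in find_revisions(sample_docx()):
    print(f"{kind:9} by {author}: {text}")
```

To check a real file, pass its path instead of the sample, for example `find_revisions("draft.docx")` (a hypothetical filename). An empty result suggests no tracked edits or comments remain, though this sketch does not look at headers, footers, or footnotes.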

Tracked changes are just one area where user-generated metadata can hide. And keep in mind that each new version of software may add new dangers. One example is the new “collapse headings” setting in Word 2013. You can collapse or expand headings, and a collapsed heading hides the text beneath it from view. Fortunately, Microsoft has a setting that forces these headings to expand when the file is opened; however, you must have activated this feature before sending the file. Click File and choose Options. Select Advanced in the left pane, and in the Show Document Content section check the option to Expand all headings when opening a document. You need to familiarize yourself with the unique challenges that each release of a new version of software presents before sharing any files created with the program.

Application or File-Level Metadata

Metadata that is added by the application is called application metadata. More often than not, this information travels with the file but may not always be displayed on the screen. Let me give you some examples.

Unless you’ve set a group policy so this information is not collected, the dates when a file was created, accessed, printed, and modified are automatically tracked and accessible to anyone who views the file. The same goes for the amount of time spent editing a document. These pieces of metadata are called “file properties.” And you don’t need any special software tools to view them. To see how it’s done, go to your sample document again and choose File>Info (or, in Word 2003 or earlier, choose File>Properties). The file properties will appear on the right side of the screen. If you use a document management system, you might not have access to the file properties—in which case, you can’t see what others are seeing when you send a file to them. But the information is there.
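For readers who want to see where these properties live, the following is a rough sketch in Python, assuming a .docx file from Word 2007 or later. In that format the properties are stored in two XML parts inside the ZIP archive, `docProps/core.xml` and `docProps/app.xml`; `TotalTime` is the editing time in minutes. The function names and sample values are invented for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the two property parts of an Office Open XML file
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Which elements to read from which part
FIELDS = {
    "docProps/core.xml": ["dc:creator", "cp:lastModifiedBy", "dcterms:created",
                          "dcterms:modified", "cp:lastPrinted", "cp:revision"],
    "docProps/app.xml": ["ep:TotalTime", "ep:Application", "ep:Template"],
}


def read_properties(docx_file):
    """Return a dict of the file properties that are present in the archive."""
    props = {}
    with zipfile.ZipFile(docx_file) as z:
        names = set(z.namelist())
        for part, tags in FIELDS.items():
            if part not in names:
                continue
            root = ET.fromstring(z.read(part))
            for tag in tags:
                node = root.find(tag, NS)
                if node is not None and node.text:
                    props[tag.split(":")[1]] = node.text
    return props


def sample_docx():
    """Build a tiny in-memory archive with made-up property values."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}" '
        f'xmlns:dcterms="{NS["dcterms"]}">'
        '<dc:creator>A. Lawyer</dc:creator>'
        '<cp:lastModifiedBy>B. Partner</cp:lastModifiedBy>'
        '<cp:revision>7</cp:revision>'
        '<dcterms:created>2013-01-02T09:00:00Z</dcterms:created>'
        '</cp:coreProperties>'
    )
    app = (
        f'<Properties xmlns="{NS["ep"]}">'
        '<TotalTime>42</TotalTime>'
        '<Application>Microsoft Office Word</Application>'
        '</Properties>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("docProps/core.xml", core)
        z.writestr("docProps/app.xml", app)
    buf.seek(0)
    return buf


for name, value in read_properties(sample_docx()).items():
    print(f"{name}: {value}")
```

Anyone who receives the file can run something like this, which is why the properties matter even when your document management system hides them from you.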

System Metadata

If you’re a litigator, you’ll want to be aware of the metadata in the Windows Registry. The Registry is important and extensive. Countless entries exist there about what files you’ve accessed and even viewed from the Internet and what software is currently or has ever been installed. Some of the information is viewable, but other values are encrypted. (Craig Ball wrote an excellent, comprehensive article on metadata from a litigator’s perspective entitled “MetadataGuide2011.pdf,” which is available from www.craigball.com.)

You could take the time to familiarize yourself with the Windows Registry or, better yet, work with someone who understands it. I advise working with a high-end Registry specialist; if the Registry is edited improperly, it could render your computer inoperable.

Protecting Yourself

The first step in protecting yourself from the dangers of metadata is being aware that this type of information exists. You need to take extra care when working with confidential files that will be sent to external parties. You should also invest in software to remove metadata from your files. This software is typically under $100, so it’s definitely affordable. A number of products are available, such as Workshare Protect (www.workshare.com) and my own company’s Metadata Assistant (which was the first such product on the market; www.payneconsulting.com). For even more options, use an Internet search engine to look for metadata removal software. You should evaluate the products listed to find the one that’s right for you. Also, once you purchase a product, check back with the company to see if there are any new versions available. As new metadata types are added, these products are updated.

Using the program you chose, clean the metadata from the native file (such as Word, Excel, PowerPoint, etc.). Then convert the file to PDF. Although PDF files have some metadata, there is significantly less. You can view PDF properties from Adobe Acrobat by choosing File>Document Properties. Two pieces of metadata relevant to lawyers are PDF Producer and PDF Version. The PDF Producer field identifies the software or device that created the PDF. When that device is a scanner, the field can show which scanner in an organization was used to convert a file, which may allow you to track who accessed the scanner at the time the file was made.
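As a rough illustration of how exposed these PDF fields can be, the Python sketch below scans the raw bytes of a file for simple entries such as `/Producer (...)`. It is deliberately naive: it assumes the document information is stored uncompressed as plain literal strings, which is not true of every PDF, and it leaves escape sequences undecoded. The function name and the sample values are invented for illustration; a proper PDF library or Acrobat itself is the reliable way to read these properties.

```python
import re

# Matches entries like "/Producer (Some Text)" where the value is a simple
# literal string. Backslash escapes are tolerated; nested parentheses,
# hex strings, and compressed objects are not handled by this sketch.
INFO_FIELD = re.compile(
    rb"/(Producer|Creator|Author|Title)\s*\(((?:\\.|[^\\()])*)\)"
)


def scan_pdf_info(data):
    """Collect the first occurrence of each simple Info entry in raw PDF bytes."""
    found = {}
    for key, value in INFO_FIELD.findall(data):
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found


# Made-up fragment of a PDF information dictionary, for demonstration only
sample = b"1 0 obj\n<< /Producer (Acme ScanStation 3000) /Creator (Word) >>\nendobj\n"

for name, value in scan_pdf_info(sample).items():
    print(f"{name}: {value}")
```

For a real file, read it in binary mode first, for example `scan_pdf_info(open("brief.pdf", "rb").read())` (a hypothetical filename). An empty result from this sketch does not mean the PDF has no metadata.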

If you use an iPhone or another portable device that has location services and GPS for pictures, turn off this feature; otherwise, each picture you take and share with others can be traced back to the exact location where it was taken. And if you have teenagers or young children, please share this piece of advice with them as well.
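How precise is that location? Photo files typically record GPS position in their EXIF data as degrees, minutes, and seconds plus a hemisphere letter (N, S, E, or W). The short Python sketch below converts such a reading into the decimal coordinates that any mapping site accepts. It assumes the values have already been extracted from the picture by an EXIF reader; the function name and the coordinates are made up for illustration.

```python
def to_decimal_degrees(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter to decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention
    return -value if ref in ("S", "W") else value


# Made-up reading, for demonstration only
latitude = to_decimal_degrees((15, 30, 0), "N")
longitude = to_decimal_degrees((88, 45, 0), "W")
print(latitude, longitude)
```

Because seconds are recorded as well, the resulting position can be far more precise than a city or neighborhood.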

If you’ve redacted text in a document, ensure that it’s not visible when the file is scanned. I recently had to send a copy of my driver’s license for proof of identity. To ensure privacy, I used a black marker to redact all information except for my name and the state where the license was issued. When I reviewed the scanned document, I was able to see through the redaction marks to the underlying data. Keep in mind that modern scanners make redaction much more of a challenge.

I also recommend staying current on your local bar association rules for working with metadata. Remember that if a file is likely to become part of litigation, you’ll need to take precautions in order to avoid spoliation. The discovery process may also dictate whether or not metadata needs to remain in files supplied for document production.

The important thing to remember is that metadata is not a dirty word. Metadata is a necessary part of every file and computer system. It provides benefits, but along with the benefits there are risks involved. Ultimately, it’s your responsibility to know those risks and how to protect yourself from accidental disclosure of confidential information.
