Metadata Management in Microsoft Office: How Firms Can Protect Themselves against Unintentional Disclosure and Misuse of Metadata
By Randy Farrar and Susan McClellan
The steady growth of electronic document exchange has made it widely known that Microsoft Office files carry metadata beyond their printable content. Unintentional disclosure of that metadata can be embarrassing and can even raise malpractice concerns. Metadata has been used to identify, classify, and manage documents in the legal environment for many years, yet most lawyers still lack a good understanding of metadata management. This article is not a comprehensive “how to” guide on metadata, but you will come away with a better sense of what metadata is, how it can be misused and overlooked, and what you can do to proactively control and manage it.
Microsoft Word includes many automated features to aid in document production and collaboration. Unfortunately, these features can embed electronic information that reveals who edited the document (revision authors); the time, date, and frequency of edits (track changes and revisions); commentary (inserted comments); the document template (a unique firm identifier); and other data used to control the document’s text and format. There is even an option in Microsoft Word called “Fast Save” that, if selected, appends new text to the file rather than rewriting it, so deleted text can remain in the document’s electronic file history. These are just a few of the hidden elements and document information found in a Microsoft Office document that make up the document’s metadata.
Many users dupe-and-revise (open an existing document and use Save As) to save time. When this occurs, the original author information, document properties, document variables, forgotten hidden text, and last print date stay with the new document. Much of this metadata can be seen by looking at the document properties or by opening the document in a text editor. If the document is being prepared for a client who is paying for its creation, it is even more important to remove all the metadata before the document is shared with that client.
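To make the point about inherited document properties concrete, here is a minimal sketch in Python. It assumes the file is in the newer Office Open XML (.docx) format, which is a ZIP package of XML parts and which this article does not discuss; older binary .doc files need a different approach. The function name and the set of fields reported are illustrative choices, not part of any product.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the docProps/core.xml part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Properties that typically survive a "save as" (illustrative selection).
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "last_printed": "cp:lastPrinted",
}


def read_core_properties(docx):
    """Return the core properties found in a .docx file.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Run against a document produced by dupe-and-revise, a check like this would be expected to show the original author and last print date even though the visible text has been completely rewritten.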
Tracked changes left in a document are a common occurrence, and they are often what first alerts people to the dangers of metadata. When a document has been edited using track changes, a powerful collaboration feature in Microsoft Word, the changes stay with the document, even if they are not visible on screen, until they have been accepted or rejected. Turning off the track changes feature does not eliminate the changes the program has already tracked. If you send the document to another user, whether a cooperator or an adversary, the recipient simply has to display the markup to see all the revisions in that document.
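The persistence of tracked changes can be checked directly. The following Python sketch assumes a .docx file (Office Open XML, a format this article does not cover), in which tracked insertions and deletions are stored in the main document part as w:ins and w:del elements carrying the reviser's name. The function name is hypothetical, and the sketch looks only at the document body, not at headers, footers, or footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_tracked_changes(docx):
    """Return (kind, author, text) tuples for tracked insertions and deletions.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    changes = []
    for element in root.iter():
        if element.tag == f"{{{W}}}ins":
            kind, text_tag = "insertion", f"{{{W}}}t"
        elif element.tag == f"{{{W}}}del":
            # Deleted wording is kept in w:delText until the change is resolved.
            kind, text_tag = "deletion", f"{{{W}}}delText"
        else:
            continue
        text = "".join(node.text or "" for node in element.iter(text_tag))
        changes.append((kind, element.get(f"{{{W}}}author"), text))
    return changes
```

An empty result suggests the body holds no unresolved tracked changes; anything else lists wording and reviser names that a recipient could also recover.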
Comments, like tracked changes, remain with a document unless they are deleted. When the “Reviewing” display is set to “Final” rather than “Final Showing Markup,” comments are not visible on screen. If the document is then shared outside the firm, the recipient can still view the comments, which may contain embarrassing or even damaging information.
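A similar check works for comments. This Python sketch again assumes a .docx file and assumes the comments are stored in the conventional word/comments.xml part of the package (strictly, the location is set by the package relationships, so a robust tool would look it up rather than hard-code it). The function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/comments.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_comments(docx):
    """Return (author, text) tuples for every comment stored in a .docx file.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        if "word/comments.xml" not in package.namelist():
            return []  # no comments part in this package
        root = ET.fromstring(package.read("word/comments.xml"))
    comments = []
    for comment in root.iter(f"{{{W}}}comment"):
        text = "".join(node.text or "" for node in comment.iter(f"{{{W}}}t"))
        comments.append((comment.get(f"{{{W}}}author"), text))
    return comments
```

The result does not depend on how the document is displayed, which is the point: hiding comments on screen leaves them in the file.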
Metadata referred to here as “identifier metadata” can reveal the originator because it is unique to the user, the firm, or both. Identifier metadata includes uniquely named styles, bookmarks, hidden document variables, and custom document properties. Identifier metadata, although not necessarily considered high risk, should be managed if the originator needs to remain anonymous or if the metadata trail would reveal document creation strategy.
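The four kinds of identifier metadata named above can be inventoried with a short Python sketch. It assumes a .docx file (a format this article does not discuss) with the conventional part names word/styles.xml, word/document.xml, word/settings.xml, and docProps/custom.xml. The function names are hypothetical, and deciding which of the reported names actually identify a firm is left to the reader.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
CUSTOM = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"


def _names(package, part, tag, attribute, keep=lambda element: True):
    """Collect one attribute from every matching element in a package part."""
    if part not in package.namelist():
        return []
    root = ET.fromstring(package.read(part))
    return [e.get(attribute) for e in root.iter(tag) if keep(e)]


def find_identifier_metadata(docx):
    """List custom styles, bookmarks, document variables, and custom properties."""
    with zipfile.ZipFile(docx) as package:
        return {
            # Styles flagged as user-defined rather than built in.
            "custom_styles": _names(
                package, "word/styles.xml", W + "style", W + "styleId",
                keep=lambda e: e.get(W + "customStyle") in ("1", "true", "on"),
            ),
            "bookmarks": _names(
                package, "word/document.xml", W + "bookmarkStart", W + "name"
            ),
            "document_variables": _names(
                package, "word/settings.xml", W + "docVar", W + "name"
            ),
            "custom_properties": _names(
                package, "docProps/custom.xml", CUSTOM + "property", "name"
            ),
        }
```

A style, variable, or property name that embeds a firm's initials or a template product name is the kind of trail this inventory is meant to surface.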
Metadata “mismanagement” stories abound. Case in point: in 2004, a Microsoft Word document, produced as part of a lawsuit filed by SCO against DaimlerChrysler and AutoZone, revealed that SCO’s lawyers had also prepared a complaint against Bank of America. The document identified Bank of America as the defendant instead of the automaker. This revision and others in the document could clearly be seen through tracked changes. In another metadata disclosure blunder, the British government published a dossier on Iraq’s security and intelligence services without removing the related metadata. It was later discovered that much of the text was plagiarized directly from a U.S. researcher whose work was published on the Internet. To add insult to injury, the report also revealed a list of the dossier’s last ten authors and their edits and commentary.
Key Strategies for Metadata Control
As the legal community becomes more aware of metadata and the damage unintentional disclosure of document information can cause, the need to establish metadata control strategies and parameters becomes increasingly evident. Here are three recommended approaches worth considering:
1. Educate your staff about metadata concerns. Make sure they understand the features that embed metadata (e.g., track changes), how to control those features, and what their ramifications are. You can eliminate much of the metadata inherited from the “dupe and revise” practice by using firm templates to create new clean documents that have minimal metadata. You can also find powerful template and automation packages on the market now that, in many instances, work better and faster than the standard dupe-revise approach. These packages also provide tools that help you copy text from one document into another without the inclusion of hidden metadata in the “copied” text.
2. Control and manage metadata via third-party metadata scrubbing and management software. Microsoft provides a metadata removal tool for Microsoft Word. Unfortunately, its rudimentary approach doesn’t catch outgoing email attachments and scrubs only a limited number of metadata elements. A more powerful third-party metadata application not only scrubs metadata but also allows a firm to manage the metadata at a very detailed level. For instance, you may want to keep track changes in a document but eliminate the author and editing time information. Or a firm may want to ensure that the user’s name is never left in a document and that the firm’s name is used instead. Always use a metadata removal application that publishes a clean copy of the document before it is shared electronically outside the firm.
3. Establish a firm-wide metadata scrubbing and management standard. Establishing metadata-related policies and procedures eliminates the need for individual users at your firm to decide what metadata gets scrubbed and makes the scrubbing process more efficient. This step is very important and should involve key users, especially lawyers. The firm metadata standard can be set up in levels that reflect what metadata gets scrubbed or changed (managed). For instance, a cooperator-level scrub might remove most document properties but leave author information for collaboration purposes. An adversary-level scrub might remove all metadata, turn all field codes into text, and then convert the scrubbed document into a PDF for added metadata protection.
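One way to picture a leveled firm standard is as a small lookup table that software, rather than each user, consults. The Python sketch below is purely illustrative: the element names, the exact contents of each level, and the plan_scrub helper are assumptions made for the example, loosely following the cooperator and adversary levels described in item 3, and do not describe any particular product.

```python
from dataclasses import dataclass

# Hypothetical inventory of metadata elements a firm standard might cover.
ALL_ELEMENTS = frozenset({
    "author",
    "document_properties",
    "tracked_changes",
    "comments",
    "hidden_text",
    "document_variables",
})


@dataclass(frozen=True)
class ScrubLevel:
    """One level of a firm-wide scrubbing standard (illustrative only)."""
    name: str
    remove: frozenset
    convert_fields_to_text: bool = False
    publish_as_pdf: bool = False


LEVELS = {
    # Assumed example: keeps author, tracked changes, and comments for collaboration.
    "cooperator": ScrubLevel(
        name="cooperator",
        remove=frozenset({"document_properties", "hidden_text", "document_variables"}),
    ),
    # Assumed example: removes everything, flattens field codes, publishes a PDF.
    "adversary": ScrubLevel(
        name="adversary",
        remove=ALL_ELEMENTS,
        convert_fields_to_text=True,
        publish_as_pdf=True,
    ),
}


def plan_scrub(level_name, found):
    """Return, sorted, the metadata elements found in a document that the level removes."""
    level = LEVELS[level_name]
    return sorted(set(found) & level.remove)
```

Keeping the levels in one shared table means a change in firm policy is made once, and every outgoing document is treated the same way regardless of who sends it.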
In conclusion, metadata in Microsoft Office documents can pose serious business and ethical risks if left unmanaged or ignored. It is important to educate law firm users about metadata elements and risks and to articulate a metadata strategy built on firm-wide metadata standards or best practices.
As president/CEO and chief software architect at Esquire Innovations, Inc., Randall Farrar has pioneered the development and marketing of several software applications geared toward the legal market. His training background has given him a thorough working knowledge of the specific problems faced by the legal industry when it comes to document production. He has extensive knowledge of many Microsoft products and has been the project lead and developer on more than 100 legal migrations and upgrades.
Susan McClellan, Esquire Innovations’ director of marketing and operations, has been working in legal technology for seven years. She is responsible for marketing strategy and execution, as well as Esquire’s sales efforts, new business initiatives, and oversight of the day-to-day office operations.