Volume 4, Number 1
November 2005

Metadata: What You Don’t Know About Your Documents Can Hurt You

We are all guilty of it. You take an old document, cut and paste a few paragraphs, type in some new thoughts, save it with a new title, and email it out to a dozen people for their edits and suggestions. Pretty soon you are buried in multiple versions filled with hidden changes and comments from different authors. Those hidden edits and comments present significant security risks that many companies are only beginning to realize.

Although companies have taken huge steps to increase the security of their information in recent years, hidden document information is often overlooked. Every time a Microsoft Word, PowerPoint, or Excel document is created and amended, invisible data tracking the author, document changes, editing time, and other document properties, known as “metadata,” is added to the document. For example, that original document you saved over and sent out for edits could have contained the name of the original author, along with the date it was first created. New edits are also added to this metadata and, if not properly stripped, can lead to embarrassing, or potentially damaging, information leaking outside the office walls. Other metadata, such as Universal Naming Convention (UNC) paths, records the servers and shared folders where a document was stored and can give hackers a partial blueprint of your corporate network!
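
To make the UNC path risk concrete: a UNC path has the form \\server\share\folder\file, so any text pulled out of a document can be scanned for that shape. The following Python sketch is an illustration only. The pattern is a simplification rather than the full Windows naming rules, and the server and file names in the sample are invented.

```python
import re

# Two backslashes, a server name, a backslash, a share name, then optional
# subfolders. Names are approximated as "anything except backslash or
# whitespace", which is looser than the real Windows rules.
UNC_PATTERN = re.compile(r"\\\\[^\\\s]+\\[^\\\s]+(?:\\[^\\\s]+)*")


def find_unc_paths(text: str) -> list[str]:
    """Return every substring of text that looks like a UNC path."""
    return UNC_PATTERN.findall(text)


# Invented sample text standing in for strings extracted from a document.
sample = r"Template: \\fileserver01\legal\templates\contract.dot saved by jdoe"
print(find_unc_paths(sample))
```

A single match like this already tells an outsider the name of a file server, a shared folder, and part of the folder layout behind the firewall.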

Understand the Different Types of Document Metadata

Microsoft Word’s collaboration features, such as Comments and Track Changes, result in a significant amount of metadata being included in documents. Originally conceived to shed light on data, document metadata categorizes information to make it easier to track and find. When used properly, this metadata can be extremely helpful. It offers qualified readers the ability to view comments and previous edits that help build the most comprehensive document. But when used carelessly, metadata makes it easy for other people to find out details about the document and other privileged information that could harm the business, the individual handling the document, or the parties included in the document.

Metadata can be found in every Microsoft Office program, including PowerPoint, Excel, and Word. In all, there are about 20 different types of document metadata, including:

  • Track changes: Inserted or deleted text you thought was gone
  • Speaker notes
  • Hidden cells
  • Comments
  • Your name
  • Your initials
  • Your email address
  • Your company or organization’s name
  • The name of your computer
  • The name of the network server or hard disk on which you saved the document
  • Other file properties and summary information
  • The names of previous document authors
  • Document revisions
  • Document versions
  • Template information
  • Hidden text
  • Macros
  • Hyperlinks
  • Routing information
  • Nonvisible portions of embedded Object Linking and Embedding (OLE) objects
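
Several of the properties listed above are easy to read programmatically. As a minimal sketch, assume a document saved in the newer ZIP-based Office format (.docx, .xlsx, .pptx), which stores summary properties in a part named docProps/core.xml. The older binary formats store the same kind of data differently and are not handled here. The sample package and the names in it are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of a ZIP-based Office file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly label -> element to look up. Only a handful of properties are shown.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "revision": "cp:revision",
}


def read_core_properties(package_bytes: bytes) -> dict[str, str]:
    """Return the summary properties found in docProps/core.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found


def make_sample_package() -> bytes:
    """Build a tiny, invented package so the sketch runs without a real file."""
    core_xml = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}" '
        f'xmlns:dcterms="{NS["dcterms"]}">'
        "<dc:creator>Original Author</dc:creator>"
        "<cp:lastModifiedBy>Second Editor</cp:lastModifiedBy>"
        "<dcterms:created>2005-11-01T09:00:00Z</dcterms:created>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    return buffer.getvalue()


for label, value in read_core_properties(make_sample_package()).items():
    print(f"{label}: {value}")
```

Anyone who receives the file can read these values just as easily, without ever opening the document in Word.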

Why Remove Document Metadata?

Every time a document is created, metadata is automatically added to it. Some of the information stored in the document may also be confidential (e.g., previous versions or information that may have been rejected or accepted) and may expose businesses and individuals to hidden risks when it is emailed to people outside the company. The problem is not that metadata is added to a document, but rather that it is often difficult to remove once it has been added. And because this type of information travels with the document every time it is emailed to others, sensitive or confidential information may be transmitted unknowingly.

It is critical for every Microsoft Office user to understand how to eliminate the risk of document metadata. The following steps can ensure that documents that you send or share with others remain secure and confidential:

  • Establish an enterprisewide metadata policy and deploy an enterprisewide metadata removal application
  • Cleanse metadata before converting to PDF
  • Distribute final published documents in a secure, metadata-cleansed PDF format
  • Consider sending documents in zip format with a zip password
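
As a rough illustration of what cleansing involves, the Python sketch below copies a ZIP-based Office package (for example, a .docx file) and blanks the author fields in docProps/core.xml. It is not a substitute for a dedicated removal tool: it assumes the ZIP-based format, touches only two properties, and leaves tracked changes, comments, and hidden text in place. The function name and sample data are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
# Only these two properties are blanked; everything else is copied unchanged.
PERSONAL_TAGS = {f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy"}

# Keep readable prefixes when the XML is written back out.
ET.register_namespace("dc", DC)
ET.register_namespace("cp", CP)


def blank_author_fields(package_bytes: bytes) -> bytes:
    """Copy the package, emptying the author fields in docProps/core.xml."""
    cleaned = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as source, \
         zipfile.ZipFile(cleaned, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for element in root.iter():
                    if element.tag in PERSONAL_TAGS:
                        element.text = ""
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            target.writestr(item.filename, data)
    return cleaned.getvalue()


# Invented two-part package standing in for a real document.
CORE_XML = (
    f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
    "<dc:creator>Original Author</dc:creator>"
    "<cp:lastModifiedBy>Second Editor</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as package:
    package.writestr("docProps/core.xml", CORE_XML)
    package.writestr("word/document.xml", "<document>Visible text</document>")

cleaned_bytes = blank_author_fields(sample.getvalue())
with zipfile.ZipFile(io.BytesIO(cleaned_bytes)) as package:
    print(package.read("docProps/core.xml").decode("utf-8"))
```

Working on a copy rather than editing in place keeps the original available internally, where the metadata may still be useful.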

Workshare, the leading provider of document integrity applications, has also developed products that enable users to get control of their documents’ metadata to protect themselves and their businesses. Workshare’s TRACE! product is a free document security tool that provides personal protection against information privacy and compliance violations in all documents. TRACE! can be run against any Microsoft Office document on a personal computer, company network, or on internal or external websites and can identify metadata risks. TRACE! is available for a free download at http://www.workshare.com/products/trace/.

Workshare also offers its Protect product, the industry-leading document security application for Microsoft Office documents. Protect ensures organizations are secure from the embarrassment, financial risk, and liability of confidential information leaks, including metadata. Protect tracks hidden information in documents and ensures users can “clean” their documents before they email them, while the documents reside in a repository, or before they go up on a website.

The program prevents users from accidentally sending out a document with hidden text, metadata, or tracked changes and comments embedded by completely removing this information from the document. In addition, Protect provides conversion to PDF (and content filtering of sensitive and regulated visible information, but that’s a topic for a later date!).

Ultimately, corporations today face risks both from situations under their direct control and from conditions they might not even be aware of. However, by taking the proper steps and applying the most comprehensive technology, they can eliminate these risks.

For more information about metadata and how to prevent document risks, log on to www.metadatarisk.org.

Ken Rutsky is the executive vice president of worldwide marketing for Workshare, the industry-leading provider of document integrity software applications for professionals. Ken has more than 20 years of experience in engineering, sales, and marketing roles at IBM, Intel, Netscape, and several software start-ups.
