What and Where Is My “Data”?

David Michael

Data this and data that. We hear the word used in many contexts. What exactly is data? Techies everywhere are often preoccupied with digital data. But data can be analog, too. Data can also be duplicated and redundant. Data can expire. Data can be corrupted, stolen, breached, and lost. If all this can happen to data, we should know what data is, where to find it, and how to store it.

Lawyers, in particular, must understand data concepts, terms, and best practices to advise and protect their clients—and themselves. There are three reasons to know what and where your data is: first, to structure it correctly; second, to be able to quickly and completely retrieve it; and third, to know when to destroy it. These reasons are equally good for your clients and your law firm.

What Data Is

The essence of “data” is that it is recorded. A live production, deposition, or conversation is not data unless it is recorded. If recorded, it then can take many forms: It can be audio, visual, alphabetic, or numeric. Most things we call data are combinations of visual alphanumeric content that communicates or entertains. Some data is “inanimate”—it does not change. Imagine a text document file; even if it is translated from a Microsoft Word document into a portable document format (PDF) file, its content is fixed. Other data, such as the balance on a client trust account, is subject to change. These figures should nevertheless have some data reliability and should not change without a known source and predictable impact. As a “record” of something, data has characteristics, value, and context, and it needs to be managed according to rules and laws.

Types of Data

Content data vs. metadata. Content data is the recorded communication itself, whereas metadata is the information about that content. Metadata might include the author, date of creation, location where the file is stored, and who owns it. Documents that are “inherited” from other documents, such as a letter based on the word-processing file of a letter you previously had drafted for a different client, also inherit some metadata. In this way you could expose the name of one client to another. Data may be “tagged” with a variety of metadata tags that will increase your ability to categorize and organize the content without looking at the content.

Structured vs. unstructured. In a small business context, the term “structure” is used to describe documents and other data that are managed rather than unmanaged. When word-processing documents, PDFs, e-mails, and other digital data are stored in folders that are appropriately labeled, indexed, and searchable, I call this structured data. On the other hand, if your e-mail in-box has several hundred e-mails and there is no distinction between advertising messages, listserv threads, and client e-mails, then your e-mail is unstructured. Structured data is also the data in practice management and billing programs that list clients, matters, documents, phone calls, notes, billing items, invoices, payments, and accounting details. If your day-to-day documents are stored on your desktop or in the “My Documents” folder, then, in my opinion, that data is unstructured. I suggest that client documents be structured in file folders according to client name, matter number, and document type. Matter names are not always unique, and clients may have more than one matter; this structure accommodates those possibilities.
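As a rough illustration of the client name, matter number, and document type layout, here is a minimal Python sketch. The function names, the "Clients" root folder, and the sample client and matter are all hypothetical; the sketch only builds the folder path and does not create anything on disk.

```python
from pathlib import Path
import re


def safe_name(text: str) -> str:
    """Replace runs of characters that are awkward in folder names with one underscore."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", text.strip()).strip("_")


def matter_folder(root: Path, client: str, matter_number: str, doc_type: str) -> Path:
    """Build root/client/matter_number/doc_type without touching the disk."""
    return root / safe_name(client) / safe_name(matter_number) / safe_name(doc_type)


# Hypothetical example values, for illustration only.
folder = matter_folder(Path("Clients"), "Smith, Jane", "2016-0042", "Correspondence")
print(folder.as_posix())  # Clients/Smith_Jane/2016-0042/Correspondence
```

Keying the middle level on the matter number rather than the matter name is what keeps two similarly named matters for the same client apart.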

Digital vs. analog. I’ll admit, this distinction is a bit of a misnomer. When you think “analog,” you may imagine a broadcast signal from the days of TV antennae. Here I am distinguishing between printed and electronic documents. Analog data is stored on paper; digital data is not. The electronic data may change formats, from a Microsoft Word file to a PDF file or from the figures tracked in your billing system to a PDF report, but this data is still electronic. Digital data could once have been on paper, then scanned and stored in digital form. If you have digital data, then generally you need not keep the original, unless it is a legal document such as a will or a deed, where the original signatures, raised seals, notary stamps, and other characteristics make the hard copy more legitimate than a scanned document file. The desire to “go paperless” drives an increase in digital data and thereby increases the need for managing that data. It is fine to scan in all your firm bills, but you must store this data in a clearly labeled folder (e.g., “2016 Firm Expenses”). Within this folder you might have separate folders for fixed costs and variable costs or some other breakdown depending on your tolerance or need for more control. Digital data should be rendered searchable. You would think this is a given, but scanned PDF documents are generally not searchable until someone has run them through optical character recognition (OCR). Analog data must also be managed; whenever practical, printed documents should be stored in physical folders in file sections defined by year. A file section grouping together all firm expenses for 2010, for example, may typically be destroyed on the same retention schedule as your taxes for that year. (Consult your tax professional for specifics.)

Primary vs. secondary. It is important not to mix primary and secondary data, as telling them apart can be difficult. Simply put, secondary data is not the original. Think of relevant financial paragraphs copied and pasted from a long letter and saved as a separate document. If primary sources of data are well secured, you can work with secondary files; when you are done, the secondary data may be destroyed on the same time frame as the primary data. I often focus on process analysis. I will hold up a piece of paper and ask, “Where did this come from, and where is it going?” Ask yourself these questions when you are working with either client or firm data. Context matters, particularly for secondary data. An e-mail copied and pasted from the middle of a thread and then saved as a Word document has lost the context for the remarks made within that e-mail thread. It also has lost any credibility as an original or trustworthy source of data, as it is no longer in the original format. This data is probably not admissible as evidence. To ensure your client data maintains admissibility, consult a forensic computer specialist. A specialist will make a static copy of all data on a workstation and be able to “prove” that everything has been maintained in its original format.

Where to Find Data

There are three categories of where to find data for a client or firm: on premises, on computers, and in the cloud.

On premises. Your firm or client will have some file cabinets or shelves with file folders. This is a good place to conduct a complete record inventory. If you are responsible for all this data, no location should be ignored. Open every drawer, look in every box, write it all down, and put it in an Excel spreadsheet. If clients come in for their file and you give them only one box out of three, there will be a complaint to answer. Your own firm data is equally important. Articles of incorporation, leases, promissory notes, and whatever other legal documents apply to your firm must be easy to find. While you are at it, consider scanning all these originals and storing them in structured folders on a removable hard drive off-site.
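If you would rather capture the inventory as you walk the office, a short Python sketch like the one below can write it as CSV, which Excel opens directly. The column names and the two sample rows are hypothetical; use whatever columns fit your premises.

```python
import csv
import io

# Hypothetical inventory columns; adjust to suit your own premises.
FIELDS = ["location", "container", "description"]


def inventory_to_csv(rows, out):
    """Write inventory rows (dicts keyed by FIELDS) as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Sample rows, invented for illustration.
rows = [
    {"location": "File room", "container": "Cabinet 1, drawer 2",
     "description": "Closed client folders"},
    {"location": "Storage closet", "container": "Box 3",
     "description": "Firm lease and promissory notes"},
]

buffer = io.StringIO()
inventory_to_csv(rows, buffer)
print(buffer.getvalue())
```

To save to disk instead of printing, pass a file opened with `open("inventory.csv", "w", newline="")` in place of the in-memory buffer.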

Client folders, if stored by year and file number, are easier to find and file than when shelved by client or matter name. If the outside of the client folder also includes a file type, then you can easily identify when it may be destroyed—a traffic case has a different retention schedule than an estate-planning case.

On computers. Digital data resides on computers or in the cloud. A record destruction schedule will help you keep the volume of data in the cloud down to the minimum, saving you money. Look for your data on your computer or server using Windows Explorer, or, if you use Macs, the Finder. See if you have C:, D:, E:, G:, etc., drives. Each of these drives on the server probably has different data. Your IT firm may have set aside D: for SQL data and E: for backups, etc. This is a good practice even for solos or small firms. These other drives might be external hard drives, usually used for backups, or they might be partitions of the root drive C:.

On a server, C: is rarely for data; it should hold just programs. On a desktop, however, the C: drive can be the source for many digital documents. In Windows, you can search for all the .pdf or .docx documents on a particular computer by typing *.pdf or *.docx in the search field of Windows Explorer.
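The same kind of search can be scripted. The following is a minimal sketch, not a replacement for the Windows search box: the function name is hypothetical, and the demonstration runs against a throwaway temporary folder rather than a real drive.

```python
import tempfile
from pathlib import Path


def find_documents(root, extensions=(".pdf", ".docx")):
    """Return files under root whose extension matches, ignoring letter case."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )


# Demonstration on a temporary folder with invented file names.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "Scans").mkdir()
    (base / "engagement_letter.docx").write_text("placeholder")
    (base / "Scans" / "deed.PDF").write_text("placeholder")
    (base / "notes.txt").write_text("placeholder")
    for path in find_documents(base):
        print(path.relative_to(base))
```

Pointing `find_documents` at a drive root walks every folder beneath it, so expect it to be slow on a large drive and to need read permission on the folders it visits.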

If more than one person signs on to the same Windows computer, each will have a folder in a path something like C:\Users\dmichael, and each user will have a desktop folder, downloads folder, and documents folder. You can get to all these folders from any login, unless the administrator has restricted the access.
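To see which of these per-user folders actually exist, a sketch along the following lines may help. It assumes the profiles sit side by side under one parent folder (on Windows, typically C:\Users) and that the login running it may read them; the function name and the sample profile names are hypothetical.

```python
import tempfile
from pathlib import Path

STANDARD_FOLDERS = ("Desktop", "Downloads", "Documents")


def user_folders(users_root):
    """Map each profile folder under users_root to the standard folders it contains."""
    result = {}
    for profile in sorted(Path(users_root).iterdir()):
        if profile.is_dir():
            result[profile.name] = [
                profile / name
                for name in STANDARD_FOLDERS
                if (profile / name).is_dir()
            ]
    return result


# Demonstration on a temporary stand-in for the Users folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dmichael" / "Desktop").mkdir(parents=True)
    (root / "dmichael" / "Documents").mkdir()
    (root / "frontdesk" / "Downloads").mkdir(parents=True)
    for user, folders in user_folders(root).items():
        print(user, [folder.name for folder in folders])
```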

In the cloud. You can find data for a client or firm in many cloud locations. Like an on-premises solution, a cloud solution will have programs, folders, and unstructured data. One common source of data for a firm or client might be Dropbox. If Dropbox is installed on a desktop, you will find these documents in C:\Users\UserName\Dropbox. The problem with Dropbox is that you have different documents for different users, and many are shared. In theory they are kept in sync, but the individual documents are in sync, not necessarily the file folders. If you must get your data or a client’s data out of the cloud, the provider will copy everything to an external hard drive and send it to you. If the data in the cloud was stored in a program, then you must reinstall the program to access the data or extract it from the source files.

What to Do with Data You Find

If you are working with a client, you must first protect the data the client has and find someone to create a forensic copy of all digital data. This data might be on every desktop, every phone or tablet, and on the server. Analysis and e-discovery are then performed on the copy so that the original remains unadulterated.
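One common way to check that a copy still matches its original is to compare cryptographic hashes of the two files. The Python sketch below shows the idea with SHA-256. It is an illustration only, with hypothetical function and file names; it is not a forensic procedure and does not replace the specialist or the specialist's tools.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True when both files have the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(copy)


# Demonstration with invented files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "letter.txt"
    copy = Path(tmp) / "letter_copy.txt"
    original.write_bytes(b"Original client letter")
    shutil.copyfile(original, copy)
    print(copies_match(original, copy))   # True: byte-for-byte copy
    copy.write_bytes(b"Edited client letter")
    print(copies_match(original, copy))   # False: the copy was altered
```

A matching digest shows only that the bytes are identical at the moment of comparison; who handled the files, and when, still has to be documented separately.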

No client documents should be destroyed without having a retention schedule in place. If data is destroyed after a client is engaged in a lawsuit, you know the client is in big trouble. You will want to consider the record retention rules for your state and then draft a record destruction schedule for your own firm. Your clients should do this now, before they are sued. Formalize that policy and delete everything that already meets the schedule. Then, every year going forward, delete the data that can be deleted. (View a sample law firm records retention schedule.)
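To make the yearly review concrete, here is a minimal Python sketch of checking closed files against a destruction schedule. Everything specific in it is an assumption: the retention periods are made up, the field names (including the "legal_hold" flag for files tied to a lawsuit) are hypothetical, and the real numbers must come from your state's rules.

```python
from datetime import date

# Hypothetical retention periods in years; NOT legal guidance.
RETENTION_YEARS = {"traffic": 3, "estate planning": 10}


def eligible_for_destruction(files, today, schedule=RETENTION_YEARS):
    """Return files whose retention period has fully elapsed as of today.

    Files with an unknown type or a legal hold are never returned.
    """
    eligible = []
    for f in files:
        years = schedule.get(f["type"])
        if years is None or f.get("legal_hold"):
            continue
        # Conservative rule: wait until the year after the period ends.
        if today.year - f["closed_year"] > years:
            eligible.append(f)
    return eligible


# Invented sample files, for illustration only.
files = [
    {"id": "2010-0007", "type": "traffic", "closed_year": 2010},
    {"id": "2010-0012", "type": "estate planning", "closed_year": 2010},
    {"id": "2011-0003", "type": "traffic", "closed_year": 2011, "legal_hold": True},
]
for f in eligible_for_destruction(files, date(2016, 1, 1)):
    print(f["id"])
```

The sketch deliberately skips anything it cannot classify, so an unlabeled file is kept rather than destroyed by default.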

Getting Structured, Staying Structured

I hope this short conversation about data has inspired you to structure your files. No more haphazard storage in the “My Documents” folder or the desktop for your data, no sir. From now on, your data will reside in nice folders, well described, logically stored and organized. Not everything can be “paperless,” but you can scan and index almost everything and make it searchable and findable. Name everything well, and save this digital data in well-named folders. For more ideas on data and organizational development, read my blog at


David Michael owns and operates Michael Matters, Inc., providers of online and on-site support for technology solutions that increase clarity, responsibility, and productivity in law firms and legal agencies. He is also a TEDx speaker and the principal philanthropist at the OMI-Network.