For example, consider a hypothetical wage-and-hour case involving employees who are called back to work during their meal periods. Counsel has asked that the length of each meal period be analyzed to determine how many “short” meal periods exist. Consider three ways to disseminate information relating to this analysis: (1) summary paragraph, (2) summary table, and (3) data visualization.
(1) Summary Paragraph
I analyzed 550,000 unique employee shifts and calculated the length of each meal period. I found that the meal period was 29 minutes in 14.9 percent of shifts, 28 minutes in 13.6 percent of shifts, 27 minutes in 2.7 percent of shifts, 26 minutes in 1.8 percent of shifts, and less than 26 minutes in 0.9 percent of shifts. Overall, I found that about one-third of the meal periods were less than 30 minutes.
Although we can determine what is happening in this analysis, it is difficult to interpret the information or make comparisons between categories in this form. Furthermore, this analysis does not show us or tell us what is happening with the remaining two-thirds of the meal periods.
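As a quick arithmetic check on the summary paragraph, the sketch below totals the five "short" categories and computes what is left over. It assumes each shift has exactly one meal period and that every meal period not listed is 30 minutes or longer; the variable names are illustrative only.

```python
from decimal import Decimal

# Shares of shifts by meal period length, as stated in the summary paragraph.
# Decimal is used so the tenths of a percent add up exactly.
short_meal_shares = {
    "29 minutes": Decimal("14.9"),
    "28 minutes": Decimal("13.6"),
    "27 minutes": Decimal("2.7"),
    "26 minutes": Decimal("1.8"),
    "less than 26 minutes": Decimal("0.9"),
}

# Total share of shifts with a meal period under 30 minutes.
short_total = sum(short_meal_shares.values(), Decimal("0"))

# Assumption: everything else is a meal period of 30 minutes or longer.
remaining = Decimal("100") - short_total

print(f"Meal periods under 30 minutes: {short_total}%")
print(f"Meal periods of 30 minutes or longer: {remaining}%")
```

The listed categories account for roughly one-third of the shifts, so the paragraph leaves roughly two-thirds of the meal periods undescribed.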
(2) Summary Table
Summary Table Example
The categories are much easier to compare in a short table, but the table still requires some explanation and makes it hard to spot trends. It is also cumbersome to compare the magnitude of the differences among the categories.
(3) Data Visualization
Data Visualization Example
The graph immediately shows the difference between the categories in an easy-to-see manner. Furthermore, the graph provides all of the same information as the paragraph and the table but allows for a very simple and straightforward comparison among the categories. It also uses color to highlight the pattern of the records. The title tells the story of the data.
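To illustrate the idea of encoding each category as a length, here is a minimal sketch that draws a text-only bar chart from the shares in the summary paragraph. The "30+ min" value is derived as the remainder of 100 percent rather than stated in the example, and the function name, title, and bar width are hypothetical choices, not the method used to produce the figure above.

```python
# Shares of shifts by meal period length. The first five values come from the
# summary paragraph; "30+ min" is an assumption derived as 100 minus their sum.
shares = [
    ("Under 26 min", 0.9),
    ("26 min", 1.8),
    ("27 min", 2.7),
    ("28 min", 13.6),
    ("29 min", 14.9),
    ("30+ min", 66.1),
]


def text_bar_chart(rows, width=50):
    """Return one line per category, with bar length proportional to its value."""
    largest = max(value for _, value in rows)
    label_width = max(len(label) for label, _ in rows)
    lines = []
    for label, value in rows:
        # Scale each bar against the largest category so lengths are comparable.
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value:.1f}%")
    return lines


# A title that states the finding, followed by the bars.
print("About one-third of meal periods were shorter than 30 minutes")
for line in text_bar_chart(shares):
    print(line)
```

Even in this crude form, the jump between the 27-minute and 28-minute categories is visible at a glance, which is harder to see in a sentence or a table.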
Qualities of a Good Data Visualization
There are four key qualities of a good data visualization:
- Observable: The graphic contains a fact or a trend that a layperson can see. Visuals should speak for themselves and tell a story that anyone can follow. If you need to spend extra time explaining the graph, reconsider it.
- Objective: The graphic does not attempt to hide a fact or a trend, nor does it attempt to create one. Misrepresentations may be made through misleading or erroneous titles or through the scale of the graph.
- Original: The graphic contains cited, verifiable data sources and should stand on its own.
- Open: The graphic is clear and concise. Complicated graphs will be confusing, difficult to explain, and difficult to interpret. Simple is best!
Graphs: Not All Are Created Equal
There are many types of charts that can be used to visualize data, including bar charts, line graphs, scatterplots, and pie charts.
Bar charts have a variety of uses but are typically used to describe and summarize data from a table. They are primarily based upon univariate (single-variable) data and typically sum up a categorical variable or show a percentage distribution. They are best suited to simple comparisons among different categories and to single-variable trends over time.
Line graphs are useful for continuous data displayed over time. They can also be used to show how different categories relate to each other over time.
Scatterplots are excellent for showing relationships between two continuous variables. Scatterplots can be used to highlight unseen patterns or relationships that exist between variables.
Some people prefer pie charts. Unfortunately, pie charts are not very useful. They are difficult to interpret and difficult to compare. Your eyes are very good at comparing lengths (as in a bar chart) but very bad at comparing angles and areas (as in a pie chart). A bar chart is vastly superior for displaying anything that might otherwise be displayed in a pie chart.
Better Bar Charts
Bar Charts Example
Bar charts are most effective if a couple of common mistakes are avoided. Consider the four graphs below, which depict the same underlying data.
Note that the way a graph is constructed can add to or take away from its interpretation. Figure 1 is cluttered with dark grid lines that distract the eye, and the reader must read text in two directions. Figure 2 rotates the text so that it reads in one direction, which makes the data clearer to the viewer. In Figure 3, we bring the bars closer together so that the distribution of the data is easier to interpret, and we remove (or lighten) the grid lines. This shows the scale of the impact across the categories. Figure 4 adds color, which is impactful and draws the viewer’s eye to the trend. It also adds data to the graph in the form of a percentage and adds a meaningful title.
Scatterplot Example
Now suppose that we wanted to analyze the relationship between the length of the meal period and the length of the shift that the employee worked. There are certainly summary statistics that could be generated to compare the variables. However, a scatterplot is a natural way to compare two numeric variables and show whether they may be related.
Let us take our prior example and compare meal period lengths against the length of the shifts. In the following graph, we added a trend line to show how meal period length and shift length relate to each other. The trend line is positive, suggesting that there may be some relationship between the length of the meal period and the length of an employee’s shift.
Scatterplot Example
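For readers curious how a trend line like this is computed, the following is a minimal sketch of an ordinary least-squares fit. The five data points are made up purely for illustration and are not drawn from the analysis above; the function name is also hypothetical. Whether the figure above used this exact method is an assumption.

```python
from statistics import mean


def fit_trend_line(x_values, y_values):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar = mean(x_values)
    y_bar = mean(y_values)
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical records: shift length in hours and meal period length in minutes.
shift_hours = [4, 6, 8, 10, 12]
meal_minutes = [26, 28, 29, 31, 31]

slope, intercept = fit_trend_line(shift_hours, meal_minutes)
print(f"Trend line: meal minutes = {intercept:.2f} + {slope:.2f} * shift hours")
```

A positive slope in a fit like this suggests that longer shifts tend to go with longer meal periods. It does not, on its own, establish that one causes the other.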
Identifying Misrepresentations
It is very easy to create graphs that overstate, understate, or misrepresent fact patterns and trends in the data. Some of the most common methods for doing this involve changing the scale on a graph, graphing only part of the data, and transforming the graph or data using inappropriate methods.
Consider the following when reviewing or designing a graph:
- Is the scale of the y-axis appropriate for the data being displayed?
- Is all of the data being shown, or is the data being truncated in a way that hides a pattern or trend?
- Has the data been transformed or manipulated in some way that hides a pattern or trend?
- Has the graph been properly sourced and cited, and can it stand alone, away from any presentation or writing that is next to it?
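The first question on this list can be made concrete with a little arithmetic. The sketch below compares how much taller one bar looks than another when the y-axis starts at zero versus at a higher value. The two values are the 29-minute and 28-minute shares from the earlier example; the truncated axis minimum of 13 percent and the function name are hypothetical.

```python
def apparent_ratio(larger, smaller, axis_min=0.0):
    """Ratio of drawn bar heights when the y-axis starts at axis_min."""
    if axis_min >= smaller:
        raise ValueError("axis_min must be below the smaller value")
    # A bar is drawn from the axis minimum up to its value, so truncating
    # the axis shrinks both bars by the same amount and inflates the ratio.
    return (larger - axis_min) / (smaller - axis_min)


share_29_min = 14.9  # percent of shifts, from the earlier example
share_28_min = 13.6  # percent of shifts, from the earlier example

true_ratio = apparent_ratio(share_29_min, share_28_min)
truncated_ratio = apparent_ratio(share_29_min, share_28_min, axis_min=13.0)

print(f"Axis starting at 0:  bar looks {true_ratio:.2f} times as tall")
print(f"Axis starting at 13: bar looks {truncated_ratio:.2f} times as tall")
```

With the axis starting at zero, the two bars look nearly the same height. With the axis starting at 13 percent, the same data makes one bar look more than three times as tall as the other, which is the kind of scale choice a reviewer should question.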
Beyond helping the reader or viewer interpret the data correctly, the accuracy of visualizations in a litigation context could be the difference between their admission into evidence and their exclusion. In re Air Crash Disaster at John F. Kennedy Int’l Airport on June 24, 1975, 635 F.2d 67, 73 (2d Cir. 1980). Rule 1006 of the Federal Rules of Evidence provides that summary evidence and visualizations may be properly admitted when the following conditions are met:
- the charts “fairly summarize” voluminous trial evidence;
- they assist the jury in “understanding the testimony or evidence already introduced”; and
- “the witness who prepared the [visualization] is [available for] cross-examination with all documents used to prepare the summary.”
United States v. Green, 428 F.3d 1131, 1134 (8th Cir. 2005) (quoting United States v. King, 616 F.2d 1034, 1041 (8th Cir. 1980)); see also United States v. Boesen, 541 F.3d 838 (8th Cir. 2008).
There has been some debate in other cases about whether all underlying documents must be in evidence or whether the visualization alone may be admitted as evidence. United States v. Janati, 374 F.3d 263, 198 A.L.R. Fed. 811 (4th Cir. 2004); United States v. Jones, 664 F.3d 966 (5th Cir. 2011). Regardless, in all of these cases, the courts have ruled that the visualizations must be an accurate summarization of the underlying documents, and the underlying documents must be admissible.
Conclusion
Data visualizations can powerfully impact how someone views and understands data. However, visualizations need to be carefully crafted not only to ensure accuracy in their representation of underlying facts but also to have maximum impact and interpretability for the reader or viewer.
Jeremy Guinta is a senior director at Ankura in Los Angeles, California, and a lecturer at California State University, Los Angeles. Angela Sabbe is a managing director at Ankura in Los Angeles.