December 07, 2019

Big Data as a Big Source of Healthcare Fraud Prosecutions – and Defenses

By Jason Mehta, Esq., Bradley Arant Boult Cummings, LLP, Tampa, FL

As most healthcare providers know, hardly a day goes by without the Department of Justice (DOJ) announcing a new flashy healthcare fraud settlement or conviction.  And many of these splashy press releases explicitly cite the use of data and data mining as a way of identifying and prosecuting the alleged wrongdoer.1

In fact, even when it comes to the healthcare crisis du jour — the opioid crisis — the DOJ proudly boasts of its data-mining efforts.  As former Attorney General Jeff Sessions barnstormed across the country promoting the DOJ’s opioid initiative last year, he developed a standard stump speech.  In that often-repeated speech, he proudly promoted a new DOJ initiative known as the Opioid Fraud and Abuse Detection Unit, a new data analytics program that focused on opioid-related healthcare fraud. In his words, this unit was using data and data-mining techniques in order to “tell us important information – who is prescribing the most drugs, who is dispensing the most drugs, and whose patients are dying of overdoses.”2  

Whether it’s opioids or any other healthcare investigation, the government is increasingly using data to identify cases, develop strategies for investigation and prosecution, and tell the story to jurors.3 Attorneys, both within and outside of government, increasingly understand the importance of data and are developing strategies to use healthcare data as both a sword and a shield.

This article analyzes the background of data mining in healthcare fraud prosecutions, highlights several practical high-profile fraud cases built on data, and offers practical advice for healthcare providers on using data as both a proactive compliance tool and a powerful defense after an investigation has commenced. 

Background Regarding Data Mining in Government

The concept of data mining is not new. For years, private industry has been using sophisticated algorithms and software to mine large sources of data to identify patterns and highlight trends.  From education to construction to non-profits, data mining has been utilized to make sense of vast arrays of information.4  Now, the government is catching up — particularly in the healthcare fraud enforcement space.

With respect to healthcare, the United States government has a vast array of data available at its fingertips. Whenever a healthcare provider submits a claim to the Medicare program, for example, the government receives dozens of pieces of information, including the patient’s name, date of birth, place of service, date of service, current procedural terminology (CPT) code and diagnosis code. These are just a few examples of data sets collected by healthcare regulators; vast systems for Electronic Health Record (EHR) data have the capacity to collect much more. The government is increasingly surveying this data to learn about trends and locate potential outliers, as discussed in further detail below.5

This assessment about the burgeoning role of data is reflected in the government’s own characterization of data mining.  The Department of Health and Human Services (HHS) Office of Inspector General (OIG), the federal agency most directly tasked with overseeing the Medicare program, prides itself on its data analytics team.  In its own words, “OIG uses Data Driven Decision Making to produce outcome focused results.”6  Further, OIG notes that it “leverages sophisticated data analysis to identify and target potential fraud schemes and areas of program waste and abuse.”7 In fact, OIG created a Chief Data Officer position in 2015 to facilitate the creation of internal tools that increase OIG’s access to and utilization of data.8

In the words of OIG’s Chief Data Officer, data mining is increasingly useful in developing possible fraud cases:

So what it really means is having high quality lead-generation for either our investigators, our auditors, our evaluators or for compliance oversight. One of two things can happen with our advanced analytics. Either the data can lead us to somebody that is potentially committing fraudulent activity or our investigators can have a hotline call where they can have a witness or a whistleblower come tell them that they suspect criminal activities happening, and we can bounce that against the data.  So it's a really a combination of the data analytics and the data scientists and our statisticians and computer programmers with that field intelligence of our law enforcement agents working in the field -- that combination is very powerful.9

Given this “very powerful” tool, healthcare providers and their counsel would be well-served by understanding the nuances of data analysis — and also understanding data’s limitations.  Fortunately, HHS has elaborated on its use of data mining and has shared some of the key concepts that it focuses on when mining data. 

Practical Examples and Metrics of Data Mining

Over the years, both HHS/OIG and the DOJ have articulated some of their more heavily utilized data-mining metrics.  Understanding these metrics is a useful way of identifying the trends and statistics that matter to the government.  Some of these metrics include:10

  • Trend Tool. Broadly defined, this tool looks at trends over time to see how providers’ prescribing and billing habits fluctuate and whether certain spikes emerge. This temporal analysis is often very critical in understanding how and when patterns change.  The government frequently uses this trend analysis to see whether certain events (such as a new executive coming on board, a new financial arrangement, or other externality) affected healthcare claims. 
  • Peer Comparison Generator. The government very often compares providers to their peers to assess how their claims compare to others.  This analysis is, of course, not dispositive, but it is highly relevant.  If one doctor stands out from her peers by several orders of magnitude, this physician might find herself on a government investigation list. For example, government agents may search for physicians who prescribe higher amounts of opioids than their peers.
  • Link Analysis.  This tool examines relationships from one entity to another.  For instance, the analysis might examine one physician’s relationship with a pharmacy and whether that relationship might be tainted by improper kickbacks. In this example, the government would look at both the physician and the pharmacist to determine any linkages in patients, billing patterns, and other indicators of a possibly nefarious relationship under the Anti-Kickback Statute, which prohibits the exchange of anything valuable for the referral of services payable by the government.
  • Payments by Geographic Area.  The government assesses different regions of the country individually to determine whether there are any specific geographical spikes related to billing.  Often fraud schemes are prolific in one particular geographic area.  For example, certain durable medical equipment (DME) fraud might be prevalent in one region of the country but not another.11  By being able to focus on geographic-specific trends, the government is able to determine where best to apply resources.  
  • Dashboards.  Finally, as a catch-all tool, the government views a variety of metrics together on one dashboard.  The government is able to compare Medicare claims, billing data, prescriber employment records, and the like to build a holistic “dashboard” view of a provider’s practice.
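
The peer comparison metric described above can be approximated with very little code. The following is a minimal sketch only: the function name, the threshold, and the sample prescription counts are all hypothetical, and nothing here is drawn from the government's actual tools. It uses a median-based (modified z-score) measure, which is one common way to rank outliers, not necessarily the one regulators use.

```python
from statistics import median

def flag_peer_outliers(totals, threshold=3.5):
    """Return providers whose totals sit far above the peer median.

    Uses the modified z-score, built on the median absolute deviation (MAD),
    so that an extreme provider does not distort the baseline it is measured
    against. The 3.5 threshold is a conventional rule of thumb, not a legal
    or regulatory standard.
    """
    values = list(totals.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return {}  # no spread among peers, so nothing can be ranked
    flagged = {}
    for provider, value in totals.items():
        score = 0.6745 * (value - med) / mad
        if score > threshold:
            flagged[provider] = round(score, 2)
    return flagged

# Hypothetical annual opioid prescription counts for six peer physicians.
prescriptions = {
    "Dr. A": 110, "Dr. B": 95, "Dr. C": 120,
    "Dr. D": 105, "Dr. E": 100, "Dr. F": 900,
}
outliers = flag_peer_outliers(prescriptions)
print(outliers)  # only "Dr. F" is flagged in this sample
```

As the article notes, a high score is not dispositive. A provider with an unusual patient mix or subspecialty may have a perfectly legitimate explanation, so the output is best treated as a list of questions to ask.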

These metrics are just illustrative examples of how the government mines data to develop possible cases. Depending on the subject matter, the government might use more sophisticated or nuanced data.  For example, in the opioid-related investigations mentioned above, the government has developed sophisticated metrics involving specific doses of drugs, and the government has compared how quickly — and how often — certain providers prescribe opioids.12   

As data mining becomes more prevalent, one can expect that government agents and prosecutors will become increasingly adept at manipulating and mining data.  Accordingly, some of the tools that seem robust today will likely seem primitive in the near future. Nonetheless, understanding these tools, and the government’s use of them, is an important way for healthcare providers and their attorneys to stay off the government’s radar screen.

Practical Examples of Data Mining

It is useful to consider several practical examples of the government using data mining to develop specific prosecutions. These examples are meant to be illustrative of the government’s increasing sophistication with respect to data analysis.

One of the most far-reaching examples of the use of data mining was in 2015, when the government targeted an entire industry: compound pharmacies. Based on government reporting, the industry was selected for review due to an atypical and aberrant spike in billing to the TRICARE program.13 The government used a panoply of data tools to identify pharmacies that stood out relative to their peers. These tools included looking at trend analyses, top billing pharmacies, and pharmacies with a relatively small number of prescribers.

Another example of data mining was the recent high-profile prosecution of Salomon Melgen, M.D., a well-known ophthalmologist in South Florida and, at one point, one of the three highest Medicare billers in the country. The case against Dr. Melgen went to trial in 2017. The government’s case was predominantly based on data analysis and data mining.  Evidence adduced at trial showed Dr. Melgen to be a significant outlier relative to his peers, including performing dozens of tests that his peers never conducted. This type of peer analysis was critical in obtaining a guilty verdict and the subsequent imposition, in 2018, of a 17-year prison sentence.14

Practical Compliance Tips for Providers

In light of the government’s focus on data analysis, healthcare providers and counsel would be well-served by adapting their practices to reflect this new source of law enforcement referrals.  Below are 10 practical tips to get ahead of the curve.

First, as a threshold matter, understand the emergence of data-driven analysis.  Healthcare counsel need to recognize that regulators are increasingly harnessing and using the power of data to identify outliers.  By understanding this focus, providers can begin the process of undertaking proactive steps to ensure maximum compliance and reduce risk.

Second, to the extent it is not done already, start to collect and store relevant data.  While most healthcare providers are already collecting some data, it is a best practice to ensure that clients have a system in place to capture as much relevant data as possible.  Information is power.  And, in collecting data, it is important to be thoughtful about how data is collected and what data is actually being tracked.  For example, in a recent study at an ophthalmology clinic at the University of Michigan, EHR data matched patient-reported data in just 23.5 percent of records.15 In that study, when patients reported having three or more eye health symptoms, the EHR record was inaccurate, as the database did not capture tertiary diagnoses and symptoms. Therefore, it is advisable to ensure that the EHR system being used is up to date, the work-flow process is practical and efficient, and users are accurately inputting data into the systems.

Third, educate others within the provider about the importance of data collection and data analysis. One of the most critical pieces to harnessing and leveraging the power of data is to educate employees about the importance of accurate data collection. This means teaching physicians, for example, to accurately collect data from patient encounters.  It means teaching billers and coders about including all relevant fields, even if those fields might not ultimately be billed.  Most practices start, with good reason, at proper collection of claims information.  But a best practice is to collect not just claims information but also relevant fields from patients’ clinical records (e.g., medications, imaging studies, lab reports), as well as other external data (e.g., prescriptions and financial information).

Fourth, understand the importance of data cleanliness.  Just as most clinicians understand the importance of cleanliness in the operating room, so too must healthcare providers understand the importance of cleanliness in data.  Remember the adage of “garbage in, garbage out.”  Unless the healthcare data is accurate when it is entered, it cannot be relied upon afterwards.  Therefore, providers must constantly clean or scrub data to ensure that it is accurate, correct, consistent, relevant, and not corrupted.  One idea is to consider the use of a data steward or an outside vendor if this cannot be accomplished internally.
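
To make the idea of scrubbing data concrete, the sketch below runs a few basic checks over claim records: missing fields, malformed procedure codes, future dates of service, and possible duplicates. The field names, the sample records, and the simplified code pattern (four digits followed by one digit or capital letter) are assumptions for illustration; a real practice would tailor the checks to its own systems.

```python
import re
from datetime import date

# Assumed, simplified shape of a procedure code: five characters,
# four digits followed by a digit or capital letter.
CPT_PATTERN = re.compile(r"^\d{4}[0-9A-Z]$")
# Hypothetical field names; real claim exports will differ.
REQUIRED = ("claim_id", "patient_id", "cpt_code", "date_of_service")

def audit_claims(claims, today):
    """Return (claim_id, problem) pairs for records that fail basic checks."""
    problems = []
    seen = set()
    for row in claims:
        cid = row.get("claim_id") or "<missing id>"
        missing = [f for f in REQUIRED if not row.get(f)]
        if missing:
            problems.append((cid, "missing: " + ", ".join(missing)))
            continue  # remaining checks need every required field
        if not CPT_PATTERN.match(row["cpt_code"]):
            problems.append((cid, "malformed CPT code"))
        if row["date_of_service"] > today:
            problems.append((cid, "date of service in the future"))
        key = (row["patient_id"], row["cpt_code"], row["date_of_service"])
        if key in seen:
            problems.append((cid, "possible duplicate"))
        seen.add(key)
    return problems

# Hypothetical sample records.
claims = [
    {"claim_id": "C1", "patient_id": "P1", "cpt_code": "99213", "date_of_service": date(2019, 3, 1)},
    {"claim_id": "C2", "patient_id": "P1", "cpt_code": "99213", "date_of_service": date(2019, 3, 1)},
    {"claim_id": "C3", "patient_id": "P2", "cpt_code": "9921", "date_of_service": date(2019, 3, 2)},
    {"claim_id": "C4", "patient_id": "", "cpt_code": "92014", "date_of_service": date(2019, 3, 3)},
]
problems = audit_claims(claims, today=date(2019, 12, 7))
for claim_id, problem in problems:
    print(claim_id, problem)
```

A "possible duplicate" here means only that the same patient, code, and date appear twice; whether that reflects a data-entry error or two legitimate services is a question for a human reviewer.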

Fifth, compliance counsel using data to build a case for their clients must always remember that the data is only as good as the query.  To get a meaningful understanding of data and build a successful defense, compliance counsel need access to the right data and need to query this data correctly. Looking at a million fields of data doesn’t mean much; data means something only in context.  Thus, a best practice is to start at the end: ask what information is ultimately wanted.  If counsel want to know, for example, which providers are billing the most procedures, they would need to focus on billing data.  If counsel are interested in possible suspect kickback arrangements, they would need to review billing data in concert with financial data.

Sixth, compliance counsel would be well-served by looking for trends in their client’s own data — preferably before the government does so.  Some compliance counsel look at, for example, the top CPT codes being billed by clients.  Looking at the utilization of these codes and how this utilization has changed over the past few years can provide valuable leads and can likely lead to fruitful conversations with clients.  For example, compliance counsel should identify top outlier physicians and ask clients, such as hospitals or employers, why certain outlier physicians are so far ahead of their peers.  Likewise, counsel should study top referrers and help their clients make sure they can explain why certain referrers stand out.  Ultimately, counsel must be able to explain changes or variances because when they are significant, the government is likely to ask questions.
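
One way to carry out this kind of self-review is to count how often each CPT code is billed per year and rank the codes by year-over-year change. The sketch below assumes claims are available as simple (year, code) pairs; the function names and the sample volumes are hypothetical.

```python
from collections import Counter

def utilization_by_year(claims):
    """Count how many times each code was billed in each year."""
    counts = {}
    for year, code in claims:
        counts.setdefault(year, Counter())[code] += 1
    return counts

def biggest_changes(counts, prior_year, current_year, min_prior=1):
    """Rank codes by relative change in volume between two years.

    Codes billed fewer than min_prior times in the prior year are skipped,
    because a percentage change from (near) zero is not meaningful.
    """
    changes = []
    for code, now in counts.get(current_year, {}).items():
        before = counts.get(prior_year, {}).get(code, 0)
        if before >= min_prior:
            changes.append((code, before, now, (now - before) / before))
    return sorted(changes, key=lambda c: c[3], reverse=True)

# Hypothetical claim volumes for two office-visit codes.
claims = (
    [(2018, "99213")] * 40 + [(2018, "99214")] * 10
    + [(2019, "99213")] * 42 + [(2019, "99214")] * 30
)
changes = biggest_changes(utilization_by_year(claims), 2018, 2019)
for code, before, now, change in changes:
    print(f"{code}: {before} -> {now} ({change:+.0%})")
```

In this sample, billing for the second code triples while the first is nearly flat. That kind of shift is exactly what counsel would want the client to be able to explain before anyone else asks.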

Seventh, many providers are beginning to gather not only their own data, but also their peers’ data, and to run their own comparisons.  As mentioned above, the government is increasingly comparing providers to their peers.  While providers typically do not have access to others’ data to do this type of analysis, there are ways to use open-source data to approximate this type of comparison.16  Thus, where appropriate, it is useful to examine open sources for data and to aggregate this data to develop a complete picture of where providers and individuals stand relative to others.

Eighth, it is a best practice to combine data sets to get a full picture. Looking at data in a vacuum is unhelpful.  Looking at, for example, the total reimbursement of a client and ignoring other data, such as the client’s patient population mix or fair market payments to/from other providers, does not reveal much.  Those providers that are most effectively harnessing data as a sword and shield are looking at the complete picture of data available.  This means, as a practical matter, synthesizing and visualizing data across different data sets.  A best practice is to catalogue all available data sets and then determine how these databases can overlap with one another.
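
As an illustration of combining data sets, the sketch below joins referral data with financial data to find physician and pharmacy pairs that appear in both. The record layouts, the function name, and the sample figures are hypothetical, and an overlap is only a prompt for review (for example, of whether payments reflect fair market value), not evidence of wrongdoing.

```python
from collections import defaultdict

def link_referrals_to_payments(referrals, payments):
    """Find (physician, pharmacy) pairs present in both data sets.

    referrals: iterable of (physician, pharmacy) pairs, one per referral
    payments:  iterable of (payer, payee, amount) tuples
    Returns rows of (physician, pharmacy, referral_count, total_paid),
    ordered by total paid, largest first.
    """
    referral_counts = defaultdict(int)
    for physician, pharmacy in referrals:
        referral_counts[(physician, pharmacy)] += 1

    payment_totals = defaultdict(float)
    for payer, payee, amount in payments:
        # Key by (payee, payer) so it lines up with (physician, pharmacy).
        payment_totals[(payee, payer)] += amount

    overlap = referral_counts.keys() & payment_totals.keys()
    rows = [
        (phys, pharm, referral_counts[(phys, pharm)], payment_totals[(phys, pharm)])
        for phys, pharm in overlap
    ]
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Hypothetical referral and payment records.
referrals = [("Dr. A", "Pharmacy X")] * 3 + [("Dr. B", "Pharmacy X")]
payments = [("Pharmacy X", "Dr. A", 5000.0), ("Pharmacy Y", "Dr. B", 1200.0)]
links = link_referrals_to_payments(referrals, payments)
print(links)
```

Neither data set alone shows anything unusual here; only the combination surfaces the one relationship in which referrals and money flow between the same two parties.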

Ninth, it is vitally important to update a provider’s data periodically.  Data in healthcare, like all of healthcare, is not static — it constantly changes.  For instance, a patient may update his/her address or may update his/her prescription medications.  Therefore, understand what data requires updating and schedule ticklers to ensure that data is being updated.
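
A tickler of this kind can be as simple as a periodic check for records that have not been verified recently. The sketch below assumes each record carries a hypothetical `last_verified` date and uses an arbitrary one-year window; both are illustrative choices rather than any standard.

```python
from datetime import date, timedelta

def stale_records(records, today, max_age_days=365):
    """Return the IDs of records not verified within the allowed window."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["patient_id"] for r in records if r["last_verified"] < cutoff]

# Hypothetical patient records with the date each was last confirmed.
records = [
    {"patient_id": "P1", "last_verified": date(2019, 10, 1)},
    {"patient_id": "P2", "last_verified": date(2017, 5, 15)},
]
stale = stale_records(records, today=date(2019, 12, 7))
print(stale)  # IDs that are due for re-verification
```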

Tenth, always remember the Health Insurance Portability and Accountability Act (HIPAA).17  HIPAA’s protections and mandates apply to aggregated data, just as they apply to individual files. Therefore, follow the HIPAA security requirements, such as authentication protocols and control over access to protect the data.18 One best practice is to consider housing a de-identified data set.  The benefit is that such a data set has the patient identifiers removed and, therefore, might be exempt from HIPAA’s mandates. This might allow for easier access when manipulating and analyzing the data.
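
The following sketch illustrates the general idea of stripping identifiers from a record before analysis. It is deliberately incomplete: the field names are hypothetical, the list of identifiers is far shorter than what HIPAA's de-identification standard covers, and running code like this does not by itself establish that a data set is de-identified for regulatory purposes. That determination should be made with counsel.

```python
from datetime import date

# Hypothetical field names for direct identifiers. This set is illustrative
# only and is NOT a complete list of identifiers under HIPAA.
DIRECT_IDENTIFIERS = {
    "name", "address", "phone", "email", "ssn", "medical_record_number",
}

def strip_identifiers(record):
    """Return a copy of the record with direct identifiers dropped
    and exact dates reduced to the year."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    dob = cleaned.pop("date_of_birth", None)
    if dob is not None:
        cleaned["birth_year"] = dob.year
    dos = cleaned.pop("date_of_service", None)
    if dos is not None:
        cleaned["service_year"] = dos.year
    return cleaned

# Hypothetical patient encounter record.
record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "date_of_birth": date(1960, 4, 2),
    "date_of_service": date(2019, 3, 1),
    "cpt_code": "99213",
}
analysis_row = strip_identifiers(record)
print(analysis_row)
```

The original record is left unchanged, so the identified data can remain in a more tightly controlled system while analysts work from the reduced copy.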

Using Data When the Government Asks Questions

Inevitably, no matter how much proactive compliance is done, many healthcare providers will face the scrutiny of government regulators.  Most do not need a reminder that healthcare is an incredibly regulated industry and that regulators will likely ask questions.

Therefore, healthcare providers would be well-served by not panicking when the inevitable government subpoena (or request for information, such as a Civil Investigative Demand) arrives but taking the scrutiny seriously. When the government asks questions, a few basic pointers are critical: (a) preserve all documents and all data; (b) ensure proper internal reporting; and (c) try to determine the focus of the government investigation from the subpoena or otherwise.  These concepts are, admittedly, self-evident for most healthcare providers, but they are nonetheless critical threshold steps.

While these reminders are somewhat axiomatic, thinking critically about data on the front end in responding to government inquiries is less obvious.  Data can be a powerful response to a government investigation. Remember that data is not just helpful with respect to proactive compliance; it is also helpful in rebutting a governmental inquiry.

Increasingly, providers recognize that, once the government’s focus becomes clear, looking at the target’s data is critical.  Using this data — and presenting it in a favorable light — can be very helpful to negate the government’s assumptions.  For example, data can be used to negate scienter by showing that an anomaly was just that — a one-off, rather than a broader practice.  And, for those cases that cannot be resolved before litigation is initiated, data is a very helpful tool to illustrate a defendant’s perspective at trial. 


Conclusion

The emergence of data analytics is changing business as usual across all industries, and the government healthcare enforcement space is no exception.  By understanding the government’s focus on data analytics, and by implementing practical suggestions to use data as both a proactive compliance tool and a reactive defense, healthcare clients can better protect themselves from government scrutiny.  An ounce of prevention is worth a pound of cure.  Investing in good data analysis techniques today can prevent or mitigate a series of inquiries later.

  1. See, e.g., United States Settles False Claims Act Allegations Against Jacksonville-Based Fertility, available at (“This case was developed by proactively mining healthcare reimbursement data. In mining through this data, the Center was identified as a top biller of fertility related treatments.  In addition, through this data mining, government investigators were able to determine that the Center had billed for services allegedly rendered by Dr. Fox – the owner of the practice – even when he was out of the country.”); see also Four Area Hospitals Pay Millions to Resolve Ambulance Swapping Allegations, available at (“Among the tools instrumental to the settlement were those provided by HHS-OIG’s Chief Data Office, Consolidated Data Analysis Center (CDAC). CDAC provides HHS-OIG and its law enforcement partners with best practices, consultancy and skills development in data mining, predictive analytics and data management and modeling in support of fraud prevention and recovery.”)
  2. See Attorney General Sessions Delivers Remarks Regarding Trump Administration’s Response to Opioid Epidemic, available at
  3. See, e.g., Collecting and Using Data for Prosecutorial Decisionmaking, available at
  4. See, e.g., Educational Data Mining, available at; How Big Data and Analytics Are Transforming the Construction Industry, available at; How Nonprofits Are Using Data to Make a Difference, available at
  5.  While this article details the federal government’s data mining in healthcare, data mining is not unique to federal regulators.  In fact, increasingly state regulators and private insurers are doing their own data mining to identify suspicious claims and trends.  And increasingly the federal government is using these data analytics from third parties as investigative leads. See Claims Data and Health Care Fraud: The Controversy Continues, available at (noting increased use of data mining and partnerships between the federal government and private payors). 
  6. HHS OIG Justification of Estimates for Appropriations Committees for Fiscal Year 2019, available at
  7. Id.
  8. See Data-Driven Government: The Role of Chief Data Officers, available at
  9.  Transcript for audio podcast: “What Role Does Data Play in Fighting Healthcare Fraud, Waste and Abuse?” available at
  10. One of the best presentations demonstrating HHS’ data analytics tool is available at This presentation, entitled “Using Analytics to Reduce Healthcare Fraud, Waste, and Abuse,” provides in-depth information about HHS’ current efforts.
  11. By way of an illustrative example, South Florida has long had a unique issue with various forms of alleged durable medical equipment fraud and the DOJ has, accordingly, devoted resources specifically to this problem.  The resources devoted to the Southern District of Florida are different from those devoted to other geographic regions where a HEAT strike force team is deployed. 
  12. See Bloomberg Law, Opioid Fraud Crackdowns Get Help From Data Mining, available at
  13. See United States Settles False Claims Act Allegations Against Compound Pharmacy Owner For $4.25 Million, available at (noting “This case was developed through an initiative to track and prosecute compound pharmacies that submitted millions of dollars in improper claims to the TRICARE program.  The government estimates that up to $2 billion of tainted and unnecessary compound prescriptions had been submitted to and paid by the government. In the Middle District of Florida, the government has recovered almost $70 million in fines and penalties over the past 18 months.”)
  14. See The New York Times, Doctor Linked to Senator Menendez’s Corruption Case is Convicted of Fraud, available at
  15. See Completeness of Electronic Dental Records in a Student Clinic: Retrospective Analysis, Seth Aaron Levitin, BSc; John T Grbic, DMD; Joseph Finkelstein, MD, PhD, JMIR Med Inform 2019;7(1):e13008, doi:10.2196/13008, available at
  16. For example, providers can access open-source data from the following sources: (1); (2); and (3)
  17. The Health Insurance Portability and Accountability Act of 1996 (HIPAA), P.L. No. 104-191, 110 Stat. 1938 (1996).
  18. 45 C.F.R. Parts 160, 164.

About the Author

Jason Mehta is a partner at Bradley Arant Boult Cummings, LLP in Tampa, Florida.  He formerly was a federal prosecutor focusing on healthcare fraud.  During his five years at the Department of Justice, he recovered more than a quarter of a billion dollars and prosecuted dozens of white-collar executives.  He now advises individuals and corporations in both civil and criminal DOJ inquiries and investigations.  He may be reached at [email protected].