Have you ever gone to a used book store that has those giant bins of $5 books available to sift through? They’re usually categorized very broadly, using descriptors like “Fiction” or “Cookbooks” or “We’re Not Exactly Sure So Have Fun Digging Around.”

Chances are, if you’re looking specifically for anything written by Sir Arthur Conan Doyle, most of the books in these broadly marked bins won’t be useful to you. There’s also likely not much quality control when new batches of books get added to each bin, so it wouldn’t surprise you to find copies of the same book in several bins by mistake. It’s unlikely that Doyle ever wrote a cookbook, though, so ideally you’d skip that bin entirely and split your time between the fiction and miscellaneous bins.

Sorting through payer rate data is a similar experience. Alongside the crucial rates data we know and love, payer machine-readable files (MRFs) sometimes include contracted rates with providers for services those providers would never perform. Imagine being required to look at every option in the cookbook bin before moving to the fiction bin, even though you know Doyle didn’t have much interest in perfecting his take on chicken tetrazzini.

So how do we find and eliminate common scenarios where finding what you need takes longer than it should? Let’s break it down.

About 60% of rates are clinically implausible

You may think the scenario laid out above is the exception rather than the norm within payer data files, but based on our number crunching, it’s actually pretty prevalent. About 60% of rates are clinically implausible, and often even impossible (think of a rate for a psychiatrist performing a knee replacement). We talk about that more in depth here. You may have heard about these rates before; they’re usually given cute names, like zombie rates, ghost rates, or finding-sugar-free-desserts-that-you-actually-enjoy-eating rates. What, you hadn’t seen that last one?

A good place to start is how these rates came into existence in the first place. There are a few underlying reasons:

  1. Stock contract templates: Payers have boilerplate templates that range from a single fee schedule to complex inpatient hospital agreements with numerous rate types. When providers go in-network with a payer, they often sign a contract that includes rates for all billable services, even though they may bill only a subset of them.
  2. Schema design: In the current CMS-mandated schema, payers associate rates with all providers at a facility, which links physicians to services they may never perform. Turquoise CTO Adam Geitgey wrote about this issue earlier, and he included a visual that shows the magnitude of the repetition caused by the schema design.
  3. Errors in the data: Hannah Montana said it and we believe it: nobody’s perfect. Given the sheer volume involved in posting all items and services, it’s inevitable that payers make mistakes in the process of gathering and preparing their MRFs.

Okay, but is that really that bad?

Even though they’re fairly common, are zombie rates really a big deal? In short: yes. At their core, they pose two major problems:

  1. They’re the root cause of massive database bloat, pushing an already enormous rates database toward borderline unusable.
  2. They result in nonsensical, unusable data. We’ve seen this reported constantly: in media coverage questioning the data’s usefulness, in price transparency vendor blogs, and even in Congressional feedback on the state of price transparency. When users encounter data they know isn’t right, two poor results follow:
      - A subpar user experience
      - Distrust, both in a data product designed to showcase payer MRF data and, more broadly, in the concept of price transparency and its long-term impact on patients, providers, and insurance payers.

Knowing that most users (and even casual industry observers) have a fundamental distrust of MRF data is extremely motivating for the entire Turquoise team, because we believe accurate, clean, and usable price transparency data is foundational to minimizing the financial complexity of healthcare for all stakeholders. We firmly believe there is good data available. The trick is finding a solution that showcases good data and good data alone.

Excluding these zombie rates is our first step toward providing a delightful experience to our data product users and toward creating more palatable products. As we started working toward eliminating ghost rates, it was paramount that we kept the flip side of the task in mind. If our end result excludes too many rates, we run right back into a different flavor of the same problem: a user expects to see data but doesn’t, because a faulty process excluded valid data. It’s important that we strike the right balance: exclude enough to make a meaningful difference without throwing away valid rates.

Setting the Scene

We are taking a claims-first approach to cleansing zombie rates from our payer database. That process begins with a limited set of manually defined rules. Our methodology involves making decisions about services and providers in aggregate, which we approach through a few different processes:

  • Taxonomy Rollup: Rolling individual physicians or facilities up to statistically relevant cohorts. Our current approach rolls up providers based on their unique combination of taxonomy groupings, classifications, and specializations, where applicable.
  • CPT Rollup: Rolling all CPTs up to their sections or anatomy sections, using the most granular level of CPT categorization, for example “Incision Procedures on the Foot and Toes” as defined by the AAPC.
  • Claims Analysis: Using a claims database to cross-check that we correctly associate taxonomy rollups with the CPT service categories they perform. For example, we use claims data to confirm that our process categorizes podiatrists as the type of clinician who may perform incision procedures on the foot and toes.
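
To make the two rollups concrete, here is a minimal Python sketch. It assumes each provider arrives with a list of NUCC-style taxonomy records (grouping, classification, specialization) and that CPT categories can be looked up from a table of numeric code ranges. The `Taxonomy` type, the `CPT_RANGES` table, and both function names are hypothetical; the two ranges shown are illustrative only, and real category boundaries should come from the licensed CPT code set rather than this example.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Taxonomy(NamedTuple):
    grouping: str
    classification: str
    specialization: Optional[str] = None


def taxonomy_rollup_key(taxonomies):
    """Cohort key: the provider's unique combination of taxonomy levels (hypothetical)."""
    # Dedupe and sort so the same set of taxonomies always produces the same key,
    # and treat a missing specialization the same as an empty one.
    return tuple(sorted({(t.grouping, t.classification, t.specialization or "")
                         for t in taxonomies}))


# Illustrative (start, end, category) ranges only; verify against the CPT code set.
CPT_RANGES = sorted([
    (28001, 28035, "Incision Procedures on the Foot and Toes"),
    (99202, 99215, "Office or Other Outpatient Services"),
])
_RANGE_STARTS = [start for start, _, _ in CPT_RANGES]


def cpt_rollup(code: str) -> Optional[str]:
    """Map a five-digit numeric CPT code to its category, or None if unmapped."""
    if not (len(code) == 5 and code.isdigit()):
        return None  # e.g. Category III codes ending in "T" or HCPCS Level II codes
    n = int(code)
    i = bisect_right(_RANGE_STARTS, n) - 1  # last range that starts at or before n
    if i >= 0 and n <= CPT_RANGES[i][1]:
        return CPT_RANGES[i][2]
    return None
```

Sorting the deduplicated taxonomy tuples means two providers listing the same taxonomies in a different order still land in the same cohort.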

In addition, we defined a set of service categories that any type of provider may bill, such as general services like Hydration or E&M CPTs. Taking that one step further, we flagged all hospitals as likely to bill all services, so none of their rates get excluded.
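
Those two exemptions amount to a simple pre-check that runs before any claims-based filtering. A minimal sketch, assuming each rate already carries its rolled-up service category and a hospital flag; the `Rate` type, the `UNIVERSAL_CATEGORIES` labels, and `skip_claims_filter` are all hypothetical names, not our production schema.

```python
from dataclasses import dataclass

# Hypothetical labels for categories that any provider type may bill.
UNIVERSAL_CATEGORIES = frozenset({"Hydration", "Evaluation and Management"})


@dataclass(frozen=True)
class Rate:
    provider_group: str
    service_category: str
    is_hospital: bool


def skip_claims_filter(rate: Rate) -> bool:
    """True if the rate should never be excluded, whatever the claims volume says."""
    # Hospitals are assumed to bill everything; universal categories apply to everyone.
    return rate.is_hospital or rate.service_category in UNIVERSAL_CATEGORIES
```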

Claims-Based Associations

For all other services, we use a claims database to identify the top 99% of services (by volume) that each provider group performs. As a result, we cut out the bottom 1% of services performed by a provider group, including services we never see billed by those providers at all (0% volume). To get there, we use claims data and work through a series of transformation steps:

  1. Associate outpatient claims with a primary HCP (physician) NPI and HCO (organization / facility / location) NPI for the services billed
  2. Apply the taxonomy rollup logic to those NPIs, rolling providers up to broader groups with more claims
  3. Roll up individually billed CPT or HCPCS codes according to the logic defined above
  4. For each provider group, compute the ratio of services billed in each service category to the total number of services billed by that group
  5. Flag zombie rates based on those ratios: any service category outside the top 99% of a provider group’s billed services is flagged within that taxonomy bucket
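
Steps 4 and 5 boil down to a cumulative-volume cutoff per provider group. The sketch below assumes steps 1 through 3 have already produced one `(provider_group, service_category)` pair per billed service line; the function names, the `threshold_pct` parameter, and the tie-breaking behavior are our own illustrative choices, not a description of the production pipeline.

```python
from collections import Counter, defaultdict


def kept_categories(claims, threshold_pct=99):
    """Return {provider_group: categories making up its top threshold_pct% of volume}.

    `claims` is an iterable of (provider_group, service_category) pairs,
    one per billed service line (the output of steps 1-3).
    """
    volume = defaultdict(Counter)
    for group, category in claims:
        volume[group][category] += 1

    kept = {}
    for group, counter in volume.items():
        total = sum(counter.values())
        cumulative = 0
        keep = set()
        # Walk categories from highest to lowest volume (step 4's ratio is n / total).
        for category, n in counter.most_common():
            # Integer comparison avoids float rounding at the boundary.
            if cumulative * 100 >= threshold_pct * total:
                break
            keep.add(category)
            cumulative += n
        kept[group] = keep
    return kept


def flag_zombie_rates(rates, kept):
    """Step 5: True for each (provider_group, service_category) rate outside the kept set.

    Categories a group never billed (0% volume) are never kept, so they are flagged.
    """
    return [category not in kept.get(group, set()) for group, category in rates]
```

Comparing integer counts rather than summing float ratios keeps the 99% boundary exact. The category that crosses the threshold is kept, so every group retains at least its highest-volume category.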

Finally, like any good engineering team, we call in some quality assurance. No one approach will yield perfect results, so we check for:

  1. Cases where the process incorrectly excluded rates
  2. Cases where zombie rates still appeared in results

In our review, most of the incorrect results stem from improper or vague taxonomy associations, such as an endoscopy clinic grouped under a generic Multi-Specialty or Surgery taxonomy. All in all, this review loop is how we work toward exterminating zombie rates without sacrificing valid ones.
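
One way to frame those two checks is against a manually reviewed sample of rates. In this sketch, `ReviewedRate`, its `reviewer_says_zombie` label, and `qa_report` are hypothetical; it counts both error types and tallies which provider groups produce them, which can point to problem groupings such as overly generic taxonomies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedRate:
    provider_group: str
    service_category: str
    flagged: bool               # the automated process marked this rate as a zombie
    reviewer_says_zombie: bool  # hypothetical label from manual QA review


def qa_report(sample):
    """Summarize both QA failure modes over a reviewed sample of rates."""
    wrongly_excluded = [r for r in sample if r.flagged and not r.reviewer_says_zombie]
    missed_zombies = [r for r in sample if not r.flagged and r.reviewer_says_zombie]
    # Which provider groups produce the most errors of either kind?
    errors_by_group = Counter(r.provider_group for r in wrongly_excluded + missed_zombies)
    return {
        "wrongly_excluded": len(wrongly_excluded),
        "missed_zombies": len(missed_zombies),
        "errors_by_group": errors_by_group,
    }
```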

Looking Ahead

As we iterate and improve our process, we’re guessing more manual rules are on the horizon. To provide coverage where providers’ taxonomy rollups are too vague, we plan to build rules. For example, if a clinic has ‘eye’ in its name, we can reasonably assume it isn’t performing generalized lower-body surgeries and exclude those rates.
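
A rule like that could be expressed as a small pattern table checked alongside the claims-based flags. This is a hypothetical sketch: the `NAME_RULES` table, the “Lower Body Surgery” category label, and `excluded_by_name` are illustrative, not rules we’ve shipped.

```python
import re

# Hypothetical rule table: a name pattern and the service categories it rules out.
NAME_RULES = [
    (re.compile(r"\beye", re.IGNORECASE), frozenset({"Lower Body Surgery"})),
]


def excluded_by_name(org_name: str, service_category: str) -> bool:
    """True if any name rule says this organization shouldn't bill this category."""
    return any(
        pattern.search(org_name) is not None and service_category in categories
        for pattern, categories in NAME_RULES
    )
```

Matching `eye` only at the start of a word, rather than anywhere in the name, avoids false hits such as “Keyes Orthopedics” while still catching names like “EyeCare Partners.”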

We have yet to tackle DRGs, since they contribute far less to data bloat, but we’re keeping an eye out should that change.

How does this affect each of our data products?

We’ve finally reached our Oprah moment! All users of our data product benefit from the lack of zombie rates. You get a cleaner data set! You get shorter query times! You get less noise in your results! Everybody gets rate exclusioooooooooooooooon!

That matters for users and for patients who are waiting to reap the benefits of price transparency data. The future of automated, timely, and accurate estimates and revenue cycle workflow products looks a lot brighter when the underlying data is lean and useful. Plus, we’ve seen an uptick in compliance and enforcement in the world of hospital MRFs, and we anticipate payer MRF compliance will soon follow. That’s all the more reason to chart a path toward simpler, cleaner MRFs.


Interested in taking a tour through data that’s gone through the processes we just laid out? Dig into Rate Sense to see for yourself!