One of the most significant ways we’ve enriched our price transparency data to make it more usable and all-around friendly is network mapping. A lot of hours have gone into ensuring that our network mapping is pristine. Specifically, a task near and dear to my to-do list: string cleaning.
Wait, what’s network mapping?
Network mapping is as it sounds: we take the raw data from each hospital machine-readable file (MRF) and “map” the raw plan names to payers, payer classes, and products in one easy-to-understand table.
Each hospital MRF we ingest can have hundreds of individual plans associated with different rates. These raw plan names come to us in a variety of forms that vary in clarity and the level of detail they provide (Read: it’s a lot of mumbo-jumbo).
We’ve implemented systems to make sense of these plan names, so we can categorize them correctly and efficiently under payers, payer classes, and products. So when you’re searching through price transparency data, you can easily look at the differences between plan rates per payer, payer class (think the differences in contracted rates for one CPT by Medicare, Medicaid, commercial, cash, etc.), and product (HMO vs PPO). It’s cool, trust me.
Mapping Payer Class and Payer
Using a FastText machine learning model, we label each plan with a particular payer class as new data is ingested. After manually labeling a series of plans with their corresponding payer classes, we feed FastText the raw plan names and labels so the model can learn and build word embeddings of the plan names. This trains the model to map each plan to a payer class.
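Under the hood, FastText’s supervised mode wants one labeled example per line, with the label glued on as a `__label__` prefix. Here’s a rough sketch of what preparing that training data can look like; the plan names, labels, and helper function are made-up illustrations, not our actual pipeline:

```python
# Sketch of preparing FastText-style supervised training data.
# The plan names and payer classes here are made up for illustration.

def to_fasttext_line(plan_name: str, payer_class: str) -> str:
    """Format one labeled example the way FastText expects:
    a __label__ prefix followed by the tokenized text."""
    tokens = plan_name.lower().split()
    return f"__label__{payer_class} " + " ".join(tokens)

labeled_plans = [
    ("HUMANA MEDICARE HMO", "medicare"),
    ("AETNA PPO COMMERCIAL", "commercial"),
    ("STATE MEDICAID MCO", "medicaid"),
]

train_lines = [to_fasttext_line(name, label) for name, label in labeled_plans]
# Written to a file, these lines would feed something like
# fasttext.train_supervised(input="train.txt").
```

FastText then builds word (and subword) embeddings from those tokens and learns a classifier over them.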
In a moment of job security for humans, though, this is not a perfect method. After the model has mapped each plan to a payer class, we finish the process by using a rules-based system to manually re-label the more commonly mislabeled payers. This way, we can go forth confident that our network mapping is as accurate as possible.
To label payers, we use the same logic created to map payer class. This allows us to classify commercial payers under the entity that provides a particular plan (think folks like Aetna, United Healthcare, Blue Cross Blue Shield of Arkansas, and EmblemHealth).
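To give a flavor of that rules-based cleanup pass, here’s a minimal sketch. The patterns and corrected labels below are hypothetical stand-ins, not our real rules:

```python
# Sketch of the rules-based re-labeling pass that runs after the model.
# The patterns and corrected labels are hypothetical examples.

RELABEL_RULES = [
    # (substring to look for in the raw plan name, corrected payer class)
    ("tricare", "government"),
    ("workers comp", "workers comp"),
    ("self pay", "cash"),
]

def apply_overrides(raw_plan_name: str, predicted_class: str) -> str:
    """Return a corrected payer class when a known troublemaker matches,
    otherwise trust the model's prediction."""
    name = raw_plan_name.lower()
    for pattern, corrected in RELABEL_RULES:
        if pattern in name:
            return corrected
    return predicted_class
```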
And Mapping Products
We label plans under a particular product type whenever applicable. It’s sick. As opposed to the machine learning used for labeling payers and payer classes, we label our products using a rules-based system (e.g., if the plan name contains “PPO”, it gets the “PPO” label).
First, we label all plans that are easily identified in one of the following product types: HMO, PPO, EPO, and POS. Then we find the payer-specific plans that cleanly fit in one of those product types and should be labeled as such.
For example, we might encounter a plan name “Aetna Managed Choice.” Aetna categorizes this plan as a PPO product, thus we can safely label it as “PPO.” Lastly, we classify other payer-specific products as their own products, assuming they have enough volume to justify a label at a larger and aggregated level. This could be products such as Cigna’s “LocalPlus” and UHC’s “Nexus ACO.”
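Put together, the product pass can be sketched like this. The generic product types come straight from the process above; the payer-specific mapping and function name are illustrative:

```python
# Sketch of the two-step product labeling pass. The generic product
# types (HMO, PPO, EPO, POS) come from the process described above;
# the payer-specific mapping is an illustrative example.

GENERIC_PRODUCTS = ["HMO", "PPO", "EPO", "POS"]

# Payer-specific plan names that cleanly map to a generic product type.
PAYER_SPECIFIC = {
    "MANAGED CHOICE": "PPO",  # Aetna categorizes Managed Choice as a PPO
}

def label_product(plan_name: str):
    """Return a product label, or None when no rule applies."""
    upper = plan_name.upper()
    for product in GENERIC_PRODUCTS:
        if product in upper:
            return product
    for pattern, product in PAYER_SPECIFIC.items():
        if pattern in upper:
            return product
    return None
```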
Finally, to string cleaning
To improve the likelihood that products will be labeled correctly by the model, I spend diligent time string cleaning while headbanging at my desk like the respectable metalhead that I am. String cleaning is the art of manually hunting through raw data, looking for patterns of text that we know the model would struggle to discern. It’s really fun.
Since our model is a FastText model, it works off of “tokens.” Not to be confused with potential offerings to the Egyptian god Sebek, tokens are designated, separate bits of text, like words in a sentence. The model works off these tokens to “read” the “sentence” and predict what payer and class it should be.
This process is complicated because there is no standard schema for how hospital MRFs are created. This means that every MRF is built differently, following no singular logic, rhyme, or reason for how things are arranged. Accounting for every token the model might need to read in each MRF is a continuous, manual process. So I go in every so often and clean, aka, hunt for nuances within each MRF and feed them to the model so it can better predict payer and payer class.
For example, a word important for the model to recognize is “Medicare.” But if it’s reading a file with no space between the payer and class name, like “humanamedicare,” it doesn’t know how to pull out the appropriate information. A similar example: a plan name with no space between “ANTHEM” and “MCR.” The model doesn’t recognize “ANTHEMMCR” as a token, meaning the model doesn’t technically see it, thus resulting in prediction errors.
To fix this, I note the instances where I need to retrain the model. Then I take the training data and a preprocessing file (basically a Python script that outlines the changes the model needs to learn) and together, they retrain the model.
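As a taste of what lives in that preprocessing script, here’s a sketch of one kind of rule: splitting known abbreviations and class words out of fused strings so FastText sees them as separate tokens. The word list and function name are made up for illustration:

```python
import re

# Sketch of a preprocessing rule: split known abbreviations and payer
# class words out of fused strings so FastText sees them as their own
# tokens. The word list here is illustrative, not our full rule set.

SPLIT_WORDS = ["MCR", "MCD", "MEDICARE", "MEDICAID"]

def split_fused_tokens(raw: str) -> str:
    cleaned = raw.upper()
    for word in SPLIT_WORDS:
        # Insert a space when the word is glued to the preceding token,
        # e.g. "ANTHEMMCR" -> "ANTHEM MCR".
        cleaned = re.sub(rf"(?<=[A-Z])({word})\b", r" \1", cleaned)
    return cleaned
```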
The string thickens
Another, more complicated, example would be if “MCR” sat in the middle of a string and didn’t actually mean “Medicare.” It could be part of another word, but the model won’t know that, so it will mislabel the class as Medicare.
Or, “MCR” could sit in the middle of the string with “Medicare” repeated at the end. To fix this, I embark on tedious searching, looking for every instance of “MCR” and teaching the model the differences between an “MCR” that means Medicare and an “MCR” that doesn’t. By creating these boundaries, we prevent the model from incorrectly reading a string.
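A boundary rule is the kind of thing that does this separating. Here’s an illustrative sketch (the regex and plan names are examples, not our exact rules):

```python
import re

# Sketch of a boundary rule that separates a standalone "MCR" token
# (shorthand for Medicare) from an "MCR" buried inside another word.
# The plan names in the comments are made up.

def contains_medicare_token(plan_name: str) -> bool:
    upper = plan_name.upper()
    # \b requires "MCR" to stand alone as its own token, so
    # "MCRAE HEALTH PPO" won't match while "ANTHEM MCR HMO" will.
    return bool(re.search(r"\b(MCR|MEDICARE)\b", upper))
```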
Does cleaning lead to better accuracy? Yes. Yes, it does.
We measure precision and recall to ensure that our model is working correctly, and I’m pleased to say that for both precision and recall, the models score >0.9.
We measure precision and recall by:
- Creating training data by labeling plans by hand. In the case of the payer name model, that means hand-labeling a few thousand plan names with the actual payer name.
- Then, splitting the labeled data randomly into 90% training data and 10% test data. The 90% chunk trains the model, and the 10% is held out so the model never sees it.
- After the model is trained, we test it by running the held-out 10% through it and seeing how often the model’s prediction on that unseen data matches the human label. With this, we get a number in the very high 90s. Nbd.
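The split-and-score loop above can be sketched in a few lines. The helpers below are a generic illustration; in production, the predictions come from the trained FastText model:

```python
import random

# Sketch of the evaluation loop: split hand-labeled data 90/10, train
# on the 90%, and score predictions on the held-out 10%. These helpers
# are generic illustrations, not our production code.

def split_train_test(labeled, test_fraction=0.1, seed=42):
    """Randomly split labeled examples into train and held-out test sets."""
    shuffled = labeled[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def precision_recall(predictions, actuals, positive_label):
    """Precision and recall for one label over paired predictions/truths."""
    tp = sum(1 for p, a in zip(predictions, actuals) if p == a == positive_label)
    fp = sum(1 for p, a in zip(predictions, actuals) if p == positive_label != a)
    fn = sum(1 for p, a in zip(predictions, actuals) if a == positive_label != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```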
We do this for you
The tedium laid out above makes the data easier to use. Even with the amount of data cleaning we already do, we are always finding new instances to feed to the model. Due to a lack of standardization, we are always thinking up new ways to tackle the nuances within the data. The fact that we need these models at all should tell you how messy the raw data is to begin with. But! Never fear! CMS has published a proposed rule for a standard hospital MRF schema! And once they make it final, I can finally watch the entire Ghibli movie catalog during work hours.
Any ideas of how we could be doing this better? Tell us here!