The High Cost of 'Free' Price Transparency Data

Price Transparency Files Are Not Truly Free

The Transparency in Coverage (TiC) Rule made every payer's negotiated rates legally public in July 2022. In theory, any hospital, practice, or researcher can download them. In practice, parsing them yourself runs about $40,000 a year in cloud compute, storage, and egress, plus an infrastructure engineer and a data engineer: call it $250,000 all-in, and six to twelve months before you see a usable rate. The result is that the people the rule was supposed to help mostly can't use it.

Small Practices and Researchers Get Shut Out

Price transparency was sold as a democratizing force: sunlight on negotiated rates would let small practices negotiate like big systems, let employers shop plans intelligently, and let researchers study rate dispersion in public. Four years in, that promise has landed unevenly. The data exists. Access to usable data does not. If you're running a 12-physician practice getting underpaid by a UnitedHealthcare PPO plan, the fact that the evidence is technically public doesn't help you unless you can afford to extract it and see for yourself.

The MRFs Are Hundreds of Terabytes Per Month

The Machine-Readable Files (MRFs) published under the TiC Rule are the largest public dataset in healthcare, by a wide margin. A single month of in-network-rates files from the BUCA payers — Blue Cross Blue Shield, UnitedHealthcare, Cigna, and Aetna — runs into the hundreds of terabytes when uncompressed. UnitedHealthcare alone publishes tens of thousands of individual JSON files every month, some of which exceed 100 GB compressed.

For context: the entire English Wikipedia, uncompressed, is about 100 GB. One payer's monthly MRF drop can be a thousand times that.

The files are published as gzipped JSON, not a queryable database. There is no search interface, no API, no schema enforcement, and no guarantee that this month's file looks like last month's. You download the raw bytes and figure the rest out yourself.
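To make "download the raw bytes and figure the rest out yourself" concrete, here is a minimal sketch of the usual first step: peeking at the start of a gzipped JSON file without decompressing all of it. The helper names and the sample file are hypothetical, and the field names in the sample are assumptions used only for illustration, not a statement of what any payer publishes.

```python
import gzip
import json
import os
import tempfile


def peek_gzipped_json(path: str, n_chars: int = 300) -> str:
    """Return the first n_chars of decompressed text without reading the whole file."""
    # gzip.open decompresses lazily, so only the start of the file is read.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return fh.read(n_chars)


def write_sample(path: str) -> None:
    """Write a tiny stand-in file; the field names are illustrative assumptions."""
    doc = {
        "reporting_entity_name": "Example Payer",
        "last_updated_on": "2024-01-01",
        "in_network": [{"billing_code": "99213", "negotiated_rates": []}],
    }
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(doc, fh)


if __name__ == "__main__":
    sample = os.path.join(tempfile.mkdtemp(), "sample_mrf.json.gz")
    write_sample(sample)
    # Inspect the header before committing to a full parse.
    print(peek_gzipped_json(sample, 80))
```

A peek like this is cheap, and it is often the only way to learn how a given file is laid out before paying for a full parse. It does not scale to reading the whole file, which is where distributed processing comes in.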

The Cost Lives in Five Places, Not One

The ingestion cost is not one line item. It is five separate expenses that compound.

1. Egress and storage. Pulling hundreds of terabytes out of payer CDNs every month, holding onto it long enough to process, and keeping historical snapshots for longitudinal analysis. Cloud storage is cheap per gigabyte and expensive per petabyte.

2. Compute. Parsing nested JSON at this scale requires distributed processing — Spark, BigQuery, Athena, or equivalent. A single monthly refresh of the BUCA payers can consume thousands of dollars of compute before you have written a single line of analytical code.

3. Schema normalization. The TiC rule specifies a schema. Payers interpret it differently. Field names drift, nesting changes, nulls appear where numbers used to be, and new "reporting entities" show up without warning. Every month, someone has to reconcile this.

4. Entity resolution. A negotiated rate is useless without knowing whose rate it is. MRFs identify providers by NPI, but the mapping from NPI to practice, system, specialty, and location is its own data problem. Deduping across billing NPIs and rendering NPIs is harder still.

5. Quality control. MRFs contain errors — obviously wrong rates, duplicate entries, impossible modifiers, rates attached to the wrong CPT codes. Without a QC layer, your analysis quietly produces garbage.
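Items 3 and 5 above can be sketched together in a few lines. This is a minimal illustration, not a production QC layer: the alias list in RATE_KEYS, the MAX_PLAUSIBLE_RATE ceiling, and the row shape (npi, billing_code, and a rate field) are all assumptions made up for the example. A real pipeline would catalogue field drift per payer and tune thresholds per billing code.

```python
import math
from typing import Any, Iterable, Optional

# Hypothetical aliases for one drifting field name.
RATE_KEYS = ("negotiated_rate", "negotiatedRate", "rate")
# Illustrative ceiling only; not a regulatory or industry threshold.
MAX_PLAUSIBLE_RATE = 1_000_000.0


def normalize_rate(row: dict) -> Optional[float]:
    """Return the rate as a float, or None if it is missing or unparseable."""
    for key in RATE_KEYS:
        if row.get(key) is not None:
            try:
                return float(row[key])
            except (TypeError, ValueError):
                return None
    return None


def qc_rows(rows: Iterable[dict]):
    """Split rows into (clean, rejected), where rejected holds (row, reason) pairs."""
    seen = set()
    clean: list = []
    rejected: list = []
    for row in rows:
        rate = normalize_rate(row)
        key = (row.get("npi"), row.get("billing_code"), rate)
        if rate is None or not math.isfinite(rate):
            rejected.append((row, "missing or non-numeric rate"))
        elif rate <= 0 or rate > MAX_PLAUSIBLE_RATE:
            rejected.append((row, "rate out of plausible range"))
        elif key in seen:
            rejected.append((row, "duplicate"))
        else:
            seen.add(key)
            clean.append(
                {"npi": row.get("npi"), "billing_code": row.get("billing_code"), "rate": rate}
            )
    return clean, rejected
```

Keeping the rejected rows with a reason, rather than silently dropping them, is a deliberate choice here: the reject counts per payer per month are themselves a signal that a schema has drifted.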

Add it up and parsing the MRFs yourself runs roughly $40,000 a year in cloud compute, storage, and egress — and that is the easy part of the bill. The hard part is people. You need at least an infrastructure engineer to own the pipeline and a data engineer to own schema normalization, entity resolution, and QC. Two engineers at market rates, plus their share of tooling and overhead, land the fully loaded cost at roughly $250,000 per year. And that assumes they exist on day one; in reality, hiring them, onboarding them, and shipping a working pipeline is a six-to-twelve-month project before the first usable rate comes out the other side.
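The arithmetic behind those figures is short enough to write down. The two annual numbers and the build window come from the text above. The even month-by-month accrual in spend_before_first_rate is a simplifying assumption of this sketch, since real hiring and ramp-up costs are lumpier.

```python
CLOUD_PER_YEAR = 40_000    # cloud compute, storage, and egress (from the text)
ALL_IN_PER_YEAR = 250_000  # fully loaded annual cost (from the text)
BUILD_MONTHS = (6, 12)     # build period before the first usable rate

# Whatever is not cloud spend is people, tooling, and overhead.
people_per_year = ALL_IN_PER_YEAR - CLOUD_PER_YEAR


def spend_before_first_rate(months: int) -> float:
    """Spend during the build period, assuming cost accrues evenly from month one."""
    return ALL_IN_PER_YEAR * months / 12


if __name__ == "__main__":
    print(f"People, tooling, overhead: ${people_per_year:,} per year")
    for months in BUILD_MONTHS:
        print(f"{months} months of build: ${spend_before_first_rate(months):,.0f}")
```

Under these assumptions, roughly $210,000 of the annual figure is people rather than cloud, and between $125,000 and $250,000 is spent before the first usable rate appears.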

The Market Splits Into Haves and Have-Nots

The affordability gap sorts the healthcare market into two tiers:

Organization Type	Can Use Raw MRFs?	Typical Path
Large hospital system	Yes	In-house data team or enterprise vendor
Enterprise payer / consultant	Yes	In-house pipeline (~$250K/yr) or enterprise vendor
Regional health plan	Sometimes	Enterprise vendor, under budget pressure
Mid-sized specialty group	Rarely	Occasional one-off consulting engagement
Small independent practice	No	Nothing
Academic researcher	Rarely	Grant-funded pilot, then abandoned
Journalist / policy analyst	No	Press releases and secondary summaries
Healthcare startup	Depends on funding	Raw MRFs if technical, vendor if not

The groups with the strongest informational case for price transparency — small practices negotiating against national payers, researchers studying market structure, patient advocates — are the groups with the weakest financial case for paying a vendor. The people the rule was meant to help the most can afford it the least.

"Publicly Available" and "Usable" Are Not the Same Thing

When CMS and HHS talk about the TiC rule, the framing is almost always that the data is publicly available. Technically true. Practically misleading. Publicly available data that requires a six-figure pipeline to read is available in the same sense that the bottom of the ocean is publicly available: anyone is allowed to go, almost no one can.

This matters for how the rule is evaluated. If you measure success by "are the files posted," compliance is high and the rule works. If you measure success by "can a benefits manager at a 50-person company actually see what their plan is paying," the rule has barely moved the needle. Both measurements are defensible. Only one of them is about transparency.

Enterprise Pricing Happened Because Enterprise Buyers Showed Up First

The first wave of TiC vendors — Turquoise Health and Datavant chief among them — built excellent pipelines and priced them for enterprise buyers. That was a rational business decision. Enterprise buyers have budgets, procurement processes, and a willingness to sign annual contracts measured in tens or hundreds of thousands of dollars. Small buyers have none of those things, and the unit economics of selling $500/month access to a dataset that costs hundreds of thousands a year to maintain are difficult.

The result is a market that looks efficient from inside the enterprise tier and broken from everywhere else. There is excellent data at the top, nothing at the bottom, and very little in between.

What to Do If You're on the Wrong Side of the Gap

If you are in one of the locked-out tiers, a few practical notes:

Start with a single payer and a single specialty. You do not need the full dataset to answer most real questions. A benchmarking exercise for one specialty against one payer is tractable even on a small budget.
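Use allowed-amounts files, not in-network-rates files, when they fit the question. They are much smaller, but they report historical allowed amounts and billed charges for out-of-network providers rather than negotiated in-network rates, so they help only when out-of-network payment is the number you actually care about.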
Ask for pre-processed exports. You do not need raw MRFs; you need a CSV with NPIs, CPTs, payers, and rates. Several vendors (Keeper Health among them) will sell you that slice for a fraction of full-dataset pricing.
Do not try to build an in-house pipeline for a one-time question. The sunk cost is almost always wrong for anything less than an ongoing analytics function.
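If you do ask for a pre-processed export, it helps to be able to describe the shape you want. The sketch below flattens one nested in-network item into the flat rows described above (payer, NPI, billing code, rate). The field names follow our reading of the published in-network schema and should be treated as assumptions. The function names are hypothetical, and the sketch handles only provider groups listed inline, not files that point to provider lists stored elsewhere.

```python
import csv
import io
from typing import Any, Iterable, Iterator


def flatten_item(payer: str, item: dict) -> Iterator[dict]:
    """Yield one flat row per (NPI, billing code, negotiated price)."""
    for rate_block in item.get("negotiated_rates", []):
        # Collect every NPI listed inline under this rate block.
        npis = [
            npi
            for group in rate_block.get("provider_groups", [])
            for npi in group.get("npi", [])
        ]
        for price in rate_block.get("negotiated_prices", []):
            for npi in npis:
                yield {
                    "payer": payer,
                    "npi": npi,
                    "billing_code": item.get("billing_code"),
                    "rate": price.get("negotiated_rate"),
                }


def to_csv(rows: Iterable[dict]) -> str:
    """Serialize flat rows to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["payer", "npi", "billing_code", "rate"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    item = {
        "billing_code": "99213",
        "negotiated_rates": [
            {
                "provider_groups": [{"npi": [1111111111, 2222222222]}],
                "negotiated_prices": [{"negotiated_rate": 85.5}],
            }
        ],
    }
    print(to_csv(flatten_item("Example Payer", item)))
```

Note how one nested item fans out into one row per NPI per price. That multiplication is a large part of why the flattened data is so much bigger than the question most buyers are actually asking, and why a filtered slice is usually enough.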

Frequently Asked Questions

Are MRFs really free? The files themselves cost nothing to download. The infrastructure, engineering, and ongoing maintenance required to convert them into usable data is what creates the cost. "Free" refers only to the raw bytes.

Why can't I just open an MRF in Excel? A single large MRF can exceed Excel's limit of 1,048,576 rows per worksheet by three or four orders of magnitude. MRFs are gzipped JSON files that reference other JSON files, designed for programmatic ingestion, not spreadsheet review.

What does it actually cost to process MRFs in-house? Cloud compute, storage, and egress for a serious cross-payer pipeline run roughly $40,000 per year. The dominant cost is headcount: an infrastructure engineer and a data engineer, which brings the fully loaded annual cost to around $250,000. Add a six-to-twelve-month build period before the pipeline produces anything useful.

Is this going to get cheaper? Storage and compute costs trend downward, but the hard cost in MRF processing is engineering labor, which does not. What will change the affordability picture is not cheaper infrastructure — it is vendors willing to sell smaller, pre-processed slices at prices a non-enterprise buyer can approve without a board meeting.

Does the government plan to publish a normalized version? Not in any serious form. CMS's role under the TiC rule is to require publication, not to clean the data. A centralized, queryable version would be a large government IT project, and nothing in the current regulatory posture suggests one is coming.

Price transparency has a distribution problem, not a data problem. The negotiated rates are public, the mechanics of access are not, and the market that grew up around the rule is priced for organizations that already had information leverage. In our work processing these files (Keeper Health tracks over 237 billion negotiated rates across the BUCA payers), we hear from many organizations that were told this data would cost hundreds of thousands of dollars or was legally off-limits behind NDAs, only to discover that it is available to them but hard to get. Closing that gap on accessibility and affordability is the part of price transparency that the rule itself could not deliver, and it is where the next few years of work in this space will actually happen.
