Reimbursement Data Schema

Each row represents a single negotiated rate for a provider–code–plan combination. Rows are deduplicated so that each (provider, plan, billing_code) combination appears once per network name: if the same rate was reported under multiple network names, or if multiple NPIs map to the same EIN, each variant appears as its own row in the final table.

Column           Type    Description
npi              int     10-digit National Provider Identifier.
provider_name    string  Provider's individual name as registered with the NPI.
ein              int     Employer Identification Number of the billing entity.
business_name    string  Provider's registered legal business or practice name.
network_name     string  Payer network name. May be empty if not specified in the source file.
payer_name       string  Payer name matching the payer you passed in fee_schedules[] on Create Search.
plan_name        string  Payer plan identifier (slug format, e.g. national-ppo).
billing_code     string  CPT / HCPCS billing code.
code_category    string  Category for the billing code (e.g. evaluation_and_management, imaging).
billing_class    string  Service classification: professional or institutional.
modifier         string  CPT modifier codes, if applicable. May be empty.
rate             float   Negotiated rate amount in USD.
medicare_rate    float   Medicare reference rate for the same code, when available.
pct_of_medicare  float   rate / medicare_rate, expressed as a percentage.
negotiated_type  string  Rate type, e.g. fee schedule, negotiated, derived, percentage.
fee_schedule     string  Fee schedule name used for this rate.
service_group    string  Grouping category for the service.
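To make the column semantics concrete, here is a minimal sketch that builds one row with purely hypothetical values (nothing below is real data) and derives pct_of_medicare from rate and medicare_rate as defined above.

```python
# Illustrative row matching the schema above; every value is hypothetical.
row = {
    "npi": 1234567890,
    "provider_name": "Jane Doe, MD",
    "ein": 123456789,
    "business_name": "Example Medical Group",
    "network_name": "",            # empty string when absent in the source file
    "payer_name": "example-payer",
    "plan_name": "national-ppo",
    "billing_code": "99213",
    "code_category": "evaluation_and_management",
    "billing_class": "professional",
    "modifier": "",                # may be empty
    "rate": 112.50,
    "medicare_rate": 90.00,
    "negotiated_type": "fee schedule",
    "fee_schedule": "standard",
    "service_group": "office_visits",
}

# pct_of_medicare is rate / medicare_rate expressed as a percentage.
row["pct_of_medicare"] = round(row["rate"] / row["medicare_rate"] * 100, 2)
print(row["pct_of_medicare"])  # 125.0
```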

Delivery format

Results are delivered as one or more Snappy-compressed Parquet files (.parquet) downloaded from the pre-signed URLs in the download_urls array returned by Get Search Status once a job reaches status: "completed".

Every shard has the same schema; together they are the complete result set. BigQuery shards large result sets at export time (the per-file cap is 1 GB), so small results typically produce a single file while larger ones produce many. Clients should always iterate download_urls rather than assume a length — DuckDB, pandas (pd.read_parquet([...])), and pyarrow all accept a list of Parquet files as a single dataset.
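The delivery flow above can be sketched as a small download helper. This is a minimal sketch, assuming download_urls is the list returned by Get Search Status; the local filenames, destination directory, and chunk size are arbitrary choices, not part of the API.

```python
import os
import urllib.request

def download_shards(download_urls, dest_dir="shards"):
    """Fetch every pre-signed Parquet URL; never assume a fixed shard count."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for i, url in enumerate(download_urls):
        path = os.path.join(dest_dir, f"part-{i:05d}.parquet")
        with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
            # Stream in chunks so large shards never sit fully in memory.
            while chunk := resp.read(1 << 20):
                out.write(chunk)
        paths.append(path)
    return paths
```

The returned paths can go straight to pd.read_parquet(paths) or duckdb.read_parquet(paths), which treat the shard list as a single dataset.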

Loading tips

  • Files are Snappy-compressed Parquet — most modern readers (pandas, DuckDB, pyarrow, Polars, Spark) handle Snappy transparently with no extra flags.
  • Pass the full list of shard paths to your reader to load the result as a single dataset: pd.read_parquet(paths), duckdb.read_parquet(paths), pyarrow.dataset.dataset(paths).
  • Every row has an ein, so you can join results across providers sharing a TIN without additional lookups.
  • Empty strings are used for missing string fields, not NULL.
  • The Code Examples page shows, for each supported language, how to iterate download_urls and write the shards to disk end-to-end.

See also

  • Get Search Status — returns the download_urls these files are fetched from
  • Code Examples — streaming download snippets in four languages
  • Create Search — the request fields (npis, billing_codes, fee_schedules) that shape what ends up in the Parquet
Keeper Health API v1 · Questions? company@keeperhealth.com