Parquet Output Schema

Results are delivered as one or more Snappy-compressed Parquet files (.parquet) downloaded from the pre-signed URLs in the download_urls array returned by Get Search Status once a job reaches status: "completed". Each row represents a single negotiated rate for a provider–code–plan combination.

Every shard has the same schema; together they are the complete result set. BigQuery shards large result sets at export time (the per-file cap is 1 GB), so small results typically produce a single file while larger ones produce many. Clients should always iterate download_urls rather than assume a length — DuckDB, pandas (pd.read_parquet([...])), and pyarrow all accept a list of Parquet files as a single dataset.

Rows are deduplicated by EIN, so each (provider, plan, billing_code) tuple appears once. However, if the same rate was reported under multiple network names, or if multiple NPIs map to the same EIN, those combinations appear as separate rows in the final table.
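A small illustration of that multiplicity with made-up rows (column names from the schema below): two NPIs under one EIN produce two rows, which can be collapsed to one row per entity if entity-level rates are all you need.

```python
import pandas as pd

# Illustrative data: two NPIs billing under the same EIN, same code,
# same plan, same negotiated rate -- delivered as two separate rows.
df = pd.DataFrame({
    "npi": [1234567890, 1987654321],
    "ein": [123456789, 123456789],
    "billing_code": ["99213", "99213"],
    "plan_name": ["national-ppo", "national-ppo"],
    "rate": [85.0, 85.0],
})

# Collapse to one row per (ein, billing_code, plan_name) tuple.
entity_rates = df.drop_duplicates(subset=["ein", "billing_code", "plan_name"])
```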

Column | Type | Description
------ | ---- | -----------
npi | int | 10-digit National Provider Identifier.
ein | int | Employer Identification Number of the billing entity.
network_name | string | Payer network name. May be empty if not specified in the source file.
business_name | string | Provider's registered legal business or practice name.
plan_name | string | Payer plan identifier (slug format, e.g. national-ppo).
billing_code | string | CPT / HCPCS billing code.
billing_class | string | Service classification: professional or institutional.
modifier | string | CPT modifier codes, if applicable. May be empty.
rate | float | Negotiated rate amount in USD.
negotiated_type | string | Rate type, e.g. fee schedule, negotiated, derived, percentage.
fee_schedule | string | Fee schedule name used for this rate.
service_group | string | Grouping category for the service.
num_rate_groups | int | Number of distinct rate values observed for the tuple (ein, billing_code, plan_name, modifier, service_group, billing_class). Useful for flagging codes where the provider has multiple contracted rates under the same plan.
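num_rate_groups makes multi-rate codes easy to flag with a single filter. A sketch with illustrative data:

```python
import pandas as pd

# Illustrative rows in the schema above (values are made up).
df = pd.DataFrame({
    "ein": [123456789, 123456789, 987654321],
    "billing_code": ["99213", "99214", "99213"],
    "plan_name": ["national-ppo", "national-ppo", "national-ppo"],
    "rate": [85.0, 120.0, 90.0],
    "num_rate_groups": [2, 1, 1],
})

# Rows where the provider has more than one contracted rate for the
# same (ein, billing_code, plan_name, ...) tuple under this plan.
multi_rate = df[df["num_rate_groups"] > 1]
```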

Loading tips

  • Files are Snappy-compressed Parquet — most modern readers (pandas, DuckDB, pyarrow, Polars, Spark) handle Snappy transparently with no extra flags.
  • Pass the full list of shard paths to your reader to load the result as a single dataset: pd.read_parquet(paths), duckdb.read_parquet(paths), pyarrow.dataset.dataset(paths).
  • Every row has an ein, so you can join results across providers sharing a TIN without additional lookups.
  • Empty strings are used for missing string fields, not NULL.
  • Code Examples show how each language iterates download_urls and writes the shards end-to-end.
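Because missing string fields arrive as empty strings rather than NULL, a common first step is normalizing them to proper missing values so isna()/dropna() behave as expected. A sketch with illustrative data:

```python
import pandas as pd

# Illustrative rows: the API emits "" (never NULL) for missing strings.
df = pd.DataFrame({
    "network_name": ["", "Blue PPO"],
    "modifier": ["", "26"],
    "rate": [100.0, 250.0],
})

# Convert empty strings in every string column to pandas' missing marker.
string_cols = df.select_dtypes(include="object").columns
df[string_cols] = df[string_cols].replace("", pd.NA)
```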

See also

  • Get Search Status — returns the download_urls these files are fetched from
  • Code Examples — streaming download snippets in four languages
  • Create Search — the request fields (npis, billing_codes, fee_schedules) that shape what ends up in the Parquet
Keeper Health API v1 · Questions? company@keeperhealth.com