Parquet Output Schema
Results are delivered as one or more Snappy-compressed Parquet files (.parquet) downloaded from the pre-signed URLs in the download_urls array returned by Get Search Status once a job reaches status: "completed". Each row represents a single negotiated rate for a provider–code–plan combination.
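The sketch below waits for a job to finish before any download starts. It assumes you wrap the Get Search Status call in your own get_status(job_id) function that returns the parsed JSON body as a dict. The helper name wait_for_download_urls, the polling interval, and the timeout are hypothetical. Only the status and download_urls fields come from this page. This section documents only the "completed" status, so the sketch treats every other value as "keep waiting" and relies on the timeout to stop.

```python
import time
from typing import Callable, Dict, List


def wait_for_download_urls(
    get_status: Callable[[str], Dict],
    job_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> List[str]:
    """Poll a (hypothetical) status callable until the job reports "completed".

    Returns the download_urls list from the completed status response.
    Raises TimeoutError if the job has not completed within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        if status.get("status") == "completed":
            # Copy so later changes to the response dict don't leak to the caller.
            return list(status["download_urls"])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not completed after {timeout_s}s")
        time.sleep(interval_s)
```

Injecting get_status as a callable keeps the sketch independent of any particular HTTP client or authentication scheme.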
Every shard has the same schema; together they are the complete result set. BigQuery shards large result sets at export time (the per-file cap is 1 GB), so small results typically produce a single file while larger ones produce many. Clients should always iterate download_urls rather than assume a fixed number of files: DuckDB, pandas (pd.read_parquet([...])), and pyarrow all accept a list of Parquet files as a single dataset.
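The following is a minimal, standard-library-only sketch of iterating download_urls and streaming every shard to disk. The helper name download_shards and the part-NNNNN.parquet naming scheme are illustrative assumptions. Files are named by position rather than by URL because pre-signed URLs typically carry a signature query string.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Iterable, List


def download_shards(download_urls: Iterable[str], dest_dir: str) -> List[Path]:
    """Stream every shard in download_urls into dest_dir and return the local paths.

    Iterates the full list instead of assuming a fixed number of shards.
    """
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for i, url in enumerate(download_urls):
        # Name by position: part-00000.parquet, part-00001.parquet, ...
        target = out_dir / f"part-{i:05d}.parquet"
        # Copy in 1 MiB chunks so a 1 GB shard never has to fit in memory.
        with urllib.request.urlopen(url) as resp, open(target, "wb") as fh:
            shutil.copyfileobj(resp, fh, length=1024 * 1024)
        paths.append(target)
    return paths
```

The returned list can be passed unchanged to any of the multi-file readers mentioned above.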
Rows are deduplicated by EIN: each (provider, plan, billing_code) tuple appears once. However, if the same rate was reported under multiple network names, or if multiple NPIs map to the same EIN, each combination appears as a separate row in the final table.
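Because network_name and npi can split what is otherwise the same (ein, plan_name, billing_code) tuple across several rows, a consumer that wants one record per tuple has to collapse those rows itself. Here is a minimal sketch, assuming rows arrive as dicts keyed by the column names in the schema table. The helper name collapse_by_ein, and the choice to keep sorted lists of distinct NPIs, network names, and rates, are illustrative and not part of the API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[int, str, str]  # (ein, plan_name, billing_code)


def collapse_by_ein(rows: Iterable[Dict]) -> Dict[Key, Dict[str, List]]:
    """Group result rows by (ein, plan_name, billing_code).

    For each key, collect the distinct NPIs, network names, and rates that
    were reported on separate rows, sorted for stable output.
    """
    groups: Dict[Key, Dict[str, Set]] = defaultdict(
        lambda: {"npis": set(), "network_names": set(), "rates": set()}
    )
    for row in rows:
        key = (row["ein"], row["plan_name"], row["billing_code"])
        group = groups[key]
        group["npis"].add(row["npi"])
        group["network_names"].add(row["network_name"])  # may be "" when unspecified
        group["rates"].add(row["rate"])
    return {
        key: {name: sorted(values) for name, values in group.items()}
        for key, group in groups.items()
    }
```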
| Column | Type | Description |
|---|---|---|
npi | int | 10-digit National Provider Identifier. |
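ein | int | Employer Identification Number of the billing entity. Stored as an integer, so an EIN that begins with a zero loses that digit; zero-pad to nine digits when formatting for display. |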
network_name | string | Payer network name. May be empty if not specified in the source file. |
business_name | string | Provider's registered legal business or practice name. |
plan_name | string | Payer plan identifier (slug format, e.g. national-ppo). |
billing_code | string | CPT / HCPCS billing code. |
billing_class | string | Service classification — professional or institutional. |
modifier | string | CPT modifier codes, if applicable. May be empty. |
rate | float | Negotiated rate amount in USD. |
negotiated_type | string | Rate type — e.g. fee schedule, negotiated, derived, percentage. |
fee_schedule | string | Fee schedule name used for this rate. |
service_group | string | Grouping category for the service. |
num_rate_groups | int | Number of distinct rate values observed for the tuple (ein, billing_code, plan_name, modifier, service_group, billing_class). Useful for flagging codes where the provider has multiple contracted rates under the same plan. |
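To make the num_rate_groups definition concrete, the sketch below counts distinct rate values per (ein, billing_code, plan_name, modifier, service_group, billing_class) key over a list of row dicts. It illustrates the definition as written and is not the service's implementation. The column arrives precomputed, so a recount over a filtered or partial result set can come out lower. The helper names count_rate_groups and flag_multi_rate are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Column order matches the num_rate_groups description in the schema table.
RATE_GROUP_KEY = ("ein", "billing_code", "plan_name", "modifier", "service_group", "billing_class")


def count_rate_groups(rows: Iterable[Dict]) -> Dict[Tuple, int]:
    """Map each rate-group key to the number of distinct rate values seen for it."""
    distinct: Dict[Tuple, Set[float]] = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in RATE_GROUP_KEY)
        distinct[key].add(row["rate"])
    return {key: len(rates) for key, rates in distinct.items()}


def flag_multi_rate(rows: List[Dict]) -> List[Dict]:
    """Return rows whose precomputed num_rate_groups indicates more than one contracted rate."""
    return [row for row in rows if row["num_rate_groups"] > 1]
```

In practice flag_multi_rate is usually enough, because it relies on the column the service already provides.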
Loading tips
- Files are Snappy-compressed Parquet — most modern readers (pandas, DuckDB, pyarrow, Polars, Spark) handle Snappy transparently with no extra flags.
- Pass the full list of shard paths to your reader to load the result as a single dataset: pd.read_parquet(paths), duckdb.read_parquet(paths), or pyarrow.dataset.dataset(paths).
- Every row has an ein, so you can join results across providers sharing a TIN without additional lookups.
- Empty strings are used for missing string fields, not NULL.
- Code Examples shows how each language iterates download_urls and writes the shards end to end.
See also
- Get Search Status: returns the download_urls these files are fetched from.
- Code Examples: streaming download snippets in four languages.
- Create Search: the request fields (npis, billing_codes, fee_schedules) that shape what ends up in the Parquet output.