Agentic Prompt

If you're building with an AI coding agent (Claude Code, Cursor, Windsurf, a custom agent, etc.) and want it to integrate the Keeper API, paste the block below into your agent's system prompt or initial instructions. It is self-contained: it covers the full async workflow, authentication, idempotency, polling, error handling, and the shape of the Parquet output, so the agent can write correct code on the first try without re-reading the docs.

Keep your API key away from the agent

Treat your Keeper API key like a production database password. The goal is that the model never sees the key's value, in any message, tool call, or file it reads.

  • Do not paste the key into the chat. Not in the system prompt, not in a user message, not in a .env file you show the model.
  • Set the key as an environment variable in the shell / container / CI runner where the agent's code-execution tool runs — e.g. export KEEPER_API_KEY=kh_live_... in a terminal the model does not have stdout access to, or as a secret in your CI/hosting provider.
  • The agent references the key by name, never by value. Generated code should read process.env.KEEPER_API_KEY / os.environ["KEEPER_API_KEY"] / $KEEPER_API_KEY — the literal string kh_live_... must never appear in code, logs, prompts, or commit history.
  • Do not echo, print, or log the key in any command the agent runs. If you need to check the key is set, check its length (echo ${#KEEPER_API_KEY}), not its value.
  • Do not commit .env files. Add .env to .gitignore before the agent writes any code.
  • Rotate immediately if a key ever appears in a chat transcript, a log, or a commit — assume it is compromised.
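
One way to back these rules with a check is to scan files for anything that looks like a key literal before the agent commits. The sketch below is illustrative only: `find_key_literals` is a hypothetical helper, and the pattern assumes a key is the `kh_live_` or `kh_test_` prefix followed by letters, digits, underscores, or hyphens, which may not match the real key alphabet. It reports line numbers only, so the scanner itself never prints a key.

```python
import re

# Assumption: a key is the documented prefix followed by at least one letter,
# digit, underscore, or hyphen. The real key alphabet may differ.
KEY_PATTERN = re.compile(r"kh_(?:live|test)_[A-Za-z0-9_-]+")


def find_key_literals(text: str) -> list[int]:
    """Return 1-based line numbers that appear to contain a key literal.

    Only line numbers are returned, never the matched text, so the result
    is safe to print or log.
    """
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if KEY_PATTERN.search(line)
    ]
```

Running this over staged files in a pre-commit hook and failing when the list is non-empty is one possible use; a dedicated secret scanner is a reasonable alternative.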

The prompt below is written so a well-behaved agent will naturally follow these rules. Keep it that way: don't edit the prompt to include the key literal "just this once."

The prompt

You are integrating with the Keeper Health API (https://api.keeperhealth.com). Keeper returns negotiated healthcare rates pulled from payer machine-readable files. The API is asynchronous: you submit a search, poll for completion, then download one or more Snappy-compressed Parquet shards from pre-signed URLs. Follow these rules exactly.

## Authentication and API key handling (READ FIRST)

The API key is a secret. You must never see or handle its value.

- The key lives in the environment variable `KEEPER_API_KEY`, set outside this conversation.
- In generated code, reference it by name only: `process.env.KEEPER_API_KEY`, `os.environ["KEEPER_API_KEY"]`, `$KEEPER_API_KEY` in shell, etc. The literal value (starts with `kh_live_` or `kh_test_`) must NEVER appear in code, comments, logs, commit messages, file contents, or any message you output.
- Do not run commands that print the key: no `echo $KEEPER_API_KEY`, no `printenv KEEPER_API_KEY`, no `curl -v` that dumps headers to stdout, no `console.log(headers)`, no writing the key to a file.
- Do not ask the user to paste the key into this chat. If the env var is missing, tell the user to set it in their shell/CI secrets and stop — do not work around it.
- Do not commit `.env` files. Ensure `.env` is in `.gitignore` before writing code that loads from it.
- The pre-signed URLs returned in `download_urls` when a job completes are the only URLs that do NOT take the Authorization header — they are already signed. Do not attach the bearer token to them.

Every request to `/v1/*` must include the header `Authorization: Bearer <KEEPER_API_KEY>`, where the value comes from the environment at runtime.
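
As a rough illustration of these rules in Python, the header can be built at call time from the environment. `keeper_auth_headers` is a hypothetical helper name, not part of any Keeper library; the error message mentions only the variable name.

```python
import os


def keeper_auth_headers() -> dict[str, str]:
    """Build the Authorization header from the environment at call time."""
    key = os.environ.get("KEEPER_API_KEY", "")
    if not key:
        # Deliberately no fallback and no prompt for the key: stop here and
        # let the user set the variable in their shell or CI secrets.
        raise RuntimeError(
            "KEEPER_API_KEY is not set; set it in your shell or CI secrets."
        )
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dict straight to the HTTP client and avoid logging it.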

## Discovery endpoints (optional but recommended)

Before submitting a search, you can enumerate the exact `payer` and `plan` strings the API accepts. These are cheap, synchronous reads — no polling, no downloads. Cache the results; they change infrequently.

- `GET /v1/payers` returns `[{ "payer_key": "cigna", "payer": "Cigna" }, ...]`. The `payer` field is what you pass to `fee_schedules[].payer`. The same `payer_key` may appear more than once (one dataset can host multiple payer brands), so treat `(payer_key, payer)` as the unit of identity.
- `GET /v1/payers/{payer_key}/plans?payer=<payer>` returns `[{ "plan": "PPO" }, { "plan": "HMO" }, ...]`. The `plan` field is what you pass to `fee_schedules[].plan`. The `payer` query parameter is required because one `payer_key` may host multiple brands and a `plan` is only valid against its own payer.

Both require `Authorization: Bearer $KEEPER_API_KEY`. Errors use the same envelope described below (401/403/404/500).

```bash
curl https://api.keeperhealth.com/v1/payers \
  -H "Authorization: Bearer $KEEPER_API_KEY"

curl -s --get --data-urlencode "payer=Cigna" \
  https://api.keeperhealth.com/v1/payers/cigna/plans \
  -H "Authorization: Bearer $KEEPER_API_KEY"
```

Use the returned strings verbatim in `fee_schedules`. Do not mutate casing, strip whitespace, or invent synonyms — `"United Healthcare"` will not match `"UnitedHealthcare"`, and a mistyped plan silently returns zero rows.
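
A minimal Python sketch of the same idea: percent-encode the `payer` value for the query string, and copy the discovered strings into `fee_schedules` without altering them. The helper names `plans_url` and `to_fee_schedule` are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.keeperhealth.com"


def plans_url(payer_key: str, payer: str) -> str:
    """Build the plans URL; the payer string is encoded, not rewritten."""
    query = urlencode({"payer": payer}, quote_via=quote)
    return f"{BASE_URL}/v1/payers/{quote(payer_key, safe='')}/plans?{query}"


def to_fee_schedule(payer_entry: dict, plan_entry: dict) -> dict:
    """Copy discovered strings verbatim into one fee_schedules item."""
    return {"payer": payer_entry["payer"], "plan": plan_entry["plan"]}
```

For a made-up payer named `Acme Health` under the key `acme`, `plans_url("acme", "Acme Health")` yields a URL ending in `/v1/payers/acme/plans?payer=Acme%20Health`.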

## Workflow — three steps, in order

### Step 1: POST /v1/searches — submit the job

Request shape:

- Method: `POST`
- URL: `https://api.keeperhealth.com/v1/searches`
- Headers:
  - `Authorization: Bearer $KEEPER_API_KEY` (value sourced from env at runtime)
  - `Content-Type: application/json`
  - `Idempotency-Key: <stable-string>`  ← REQUIRED in production. Searches are billable and long-running; a retry without this key creates a duplicate job and duplicate cost. Use a deterministic key tied to the logical request (e.g. `"nightly-rates-2026-04-14"`), not a random UUID per retry. Same key within 24h returns the original `job_id`.
- Body fields:
  - `npis`: `integer[]`, required, 1–1000 ten-digit NPIs.
  - `billing_codes`: `string[]`, required, 1–100 CPT/HCPCS codes.
  - `fee_schedules`: `object[]`, required, 1–20 items of `{ "payer": string, "plan": string }`. Values are matched exactly, so do not hardcode or guess them. If you don't already know the valid strings, call `GET /v1/payers` and `GET /v1/payers/{payer_key}/plans?payer=<payer>` first (see "Discovery endpoints" above) and pass the returned values verbatim.
  - `testing`: optional boolean. Leave `false` in production.

Example request (note the env var — never substitute the literal key):

```bash
curl -X POST https://api.keeperhealth.com/v1/searches \
  -H "Authorization: Bearer $KEEPER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: acme-weekly-rates-2026-04-14" \
  -d '{
    "npis": [1144218512, 1234567890],
    "billing_codes": ["99213", "99214"],
    "fee_schedules": [
      {"payer": "Cigna", "plan": "PPO"},
      {"payer": "UnitedHealthcare", "plan": "Choice Plus"}
    ],
    "testing": false
  }'
```

Expected response: `202 Accepted`

```json
{
  "job_id": "6f4245de-4b44-4bb6-aaae-f11c0d4f45c0",
  "status": "queued",
  "created_at": "2026-04-11T09:39:35Z",
  "status_url": "https://api.keeperhealth.com/v1/searches/6f4245de-4b44-4bb6-aaae-f11c0d4f45c0"
}
```

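Store `job_id`. If the POST fails with a network error before you receive a response, retry it with the SAME `Idempotency-Key`; never re-POST with a new key. Once you have a `job_id`, poll the status endpoint instead of re-submitting.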
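
The request rules above could be enforced client-side before anything is sent. The following is a sketch under assumptions, not a reference implementation: `build_search_request` is a hypothetical helper, and it returns the headers without `Authorization`, which should be added from the environment at send time.

```python
import json


def build_search_request(npis, billing_codes, fee_schedules, idempotency_key):
    """Check the documented limits and return (headers, body_bytes)."""
    if not 1 <= len(npis) <= 1000:
        raise ValueError("npis must contain 1-1000 items")
    if not all(isinstance(n, int) and 10**9 <= n < 10**10 for n in npis):
        raise ValueError("every NPI must be a ten-digit integer")
    if not 1 <= len(billing_codes) <= 100:
        raise ValueError("billing_codes must contain 1-100 items")
    if not 1 <= len(fee_schedules) <= 20:
        raise ValueError("fee_schedules must contain 1-20 items")
    if not idempotency_key:
        raise ValueError("a stable Idempotency-Key is required")

    headers = {
        "Content-Type": "application/json",
        # Deterministic per logical request, reused unchanged on every retry.
        "Idempotency-Key": idempotency_key,
    }
    body = json.dumps({
        "npis": list(npis),
        "billing_codes": list(billing_codes),
        "fee_schedules": list(fee_schedules),
        "testing": False,
    }).encode("utf-8")
    return headers, body
```

Because the key is an argument rather than something generated inside the function, a retry naturally reuses the same value.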

### Step 2: GET /v1/searches/{job_id} — poll for completion

Request shape:

- Method: `GET`
- URL: `https://api.keeperhealth.com/v1/searches/{job_id}`
- Headers:
  - `Authorization: Bearer $KEEPER_API_KEY`

Example request:

```bash
curl https://api.keeperhealth.com/v1/searches/6f4245de-4b44-4bb6-aaae-f11c0d4f45c0 \
  -H "Authorization: Bearer $KEEPER_API_KEY"
```

Example response while still running:

```json
{
  "job_id": "6f4245de-4b44-4bb6-aaae-f11c0d4f45c0",
  "status": "processing",
  "created_at": "2026-04-11T09:39:35Z"
}
```

Example response when finished:

```json
{
  "job_id": "6f4245de-4b44-4bb6-aaae-f11c0d4f45c0",
  "status": "completed",
  "created_at": "2026-04-11T09:39:35Z",
  "completed_at": "2026-04-11T09:41:02Z",
  "download_urls": [
    "https://storage.googleapis.com/...&X-Goog-Signature=...",
    "https://storage.googleapis.com/...&X-Goog-Signature=..."
  ],
  "expires_at": "2026-04-11T10:41:02Z"
}
```

`status` is one of: `queued`, `processing`, `completed`, `failed`.

Polling rules (follow these, do not invent your own):

- Poll every 5 seconds for the first minute, then every 15 seconds after that.
- Cap total wait at ~15 minutes. Most jobs finish in under 2 minutes; large ones can take longer.
- If `status` is `queued` or `processing`: wait the interval and poll again.
- If `status` is `completed`: iterate `download_urls` (pre-signed, all share `expires_at` ~1 hour out). Proceed to step 3.
- If `status` is `failed`: the response contains an `error` field. Surface it; do not retry automatically.
- If the GET itself returns `429`: back off per the `Retry-After` header.
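
The cadence above can be sketched in Python as a generator of sleep intervals plus a small loop. This is illustrative only: `poll_delays` and `wait_for_job` are hypothetical names, `fetch_status` stands for a caller-supplied function that performs the authenticated GET and returns the parsed JSON, and `429` handling via `Retry-After` is left to that function.

```python
import time
from typing import Callable, Iterator

TERMINAL = ("completed", "failed")


def poll_delays(fast: float = 5.0, slow: float = 15.0,
                fast_window: float = 60.0, cap: float = 900.0) -> Iterator[float]:
    """Yield 5 s waits for the first minute, then 15 s waits, up to the cap."""
    elapsed = 0.0
    while elapsed < cap:
        delay = min(fast if elapsed < fast_window else slow, cap - elapsed)
        yield delay
        elapsed += delay


def wait_for_job(fetch_status: Callable[[], dict],
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job is completed or failed, or the cap is reached."""
    for delay in poll_delays():
        job = fetch_status()
        if job["status"] in TERMINAL:
            return job
        sleep(delay)
    job = fetch_status()  # one last look once the cap is reached
    if job["status"] in TERMINAL:
        return job
    raise TimeoutError("job did not finish within the polling cap")
```

The cap counts scheduled sleep time only, so real elapsed time will be slightly longer once request latency is included.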

### Step 3: Download every Parquet shard

`download_urls` is ALWAYS a JSON array — length 1 or more. BigQuery shards large result sets into multiple Parquet files on export (1 GB cap per shard). Every shard has the same schema; together they are the complete result. Iterate the array; never index `[0]` or assume a single file.

`GET` each URL with NO Authorization header — they are pre-signed.

```bash
i=0
jq -r '.download_urls[]' status.json | while read -r url; do
  curl -o "results-$i.parquet" "$url"
  i=$((i + 1))
done
```

Each body is a Snappy-compressed Parquet file (filename ends in `.parquet`). Snappy decompression is handled transparently by any modern Parquet reader — no manual decompression step.

If URLs have expired (past `expires_at`), re-poll the status endpoint to mint a fresh set. Do not cache or persist signed URLs.
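
A Python sketch of the download step, assuming `expires_at` is always an ISO 8601 UTC timestamp with a trailing `Z` as in the example response. The helper names and the `results-<i>.parquet` naming are illustrative choices.

```python
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def urls_expired(expires_at: str, now: Optional[datetime] = None) -> bool:
    """True once the shared expiry of the signed URLs has passed."""
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return (now or datetime.now(timezone.utc)) >= expiry


def shard_targets(job: dict, prefix: str = "results") -> list[tuple[str, str]]:
    """Pair every download URL with a local filename (never only the first)."""
    return [
        (url, f"{prefix}-{i}.parquet")
        for i, url in enumerate(job["download_urls"])
    ]


def download_shards(job: dict) -> list[str]:
    """Download all shards and return their local paths."""
    if urls_expired(job["expires_at"]):
        raise RuntimeError("signed URLs expired; re-poll the status endpoint")
    paths = []
    for url, path in shard_targets(job):
        # Plain GET with no Authorization header: the URL is already signed.
        urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths
```

The returned list of paths can be handed directly to a Parquet reader that accepts multiple files.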

## Parquet output schema

Every shard has this schema. Load the full result as a single dataset by passing the list of shard paths to your reader — `pd.read_parquet(paths)`, `duckdb.read_parquet(paths)`, `pyarrow.dataset.dataset(paths)`, etc.

- `npi` — provider NPI (int)
- `provider_name` — provider individual name
- `ein` — Employer Identification Number (int)
- `business_name` — provider legal business name
- `network_name` — in-network network name
- `payer_name` — payer name (matches `payer` from your `fee_schedules[]`)
- `plan_name` — fee-schedule plan name (slug form, e.g. `national-ppo`)
- `billing_code` — CPT/HCPCS code
- `code_category` — category for the billing code
- `billing_class`: `professional` or `institutional`
- `modifier` — CPT modifier, may be empty
- `rate` — negotiated rate (float, USD)
- `medicare_rate` — Medicare reference rate (float, may be NULL)
- `pct_of_medicare`: `rate / medicare_rate` as a percentage (may be NULL)
- `negotiated_type` — e.g. `negotiated`, `fee schedule`, `derived`
- `fee_schedule` — fee schedule name
- `service_group` — service group / category

Rows are deduplicated by EIN; each `(provider, plan, billing_code)` tuple appears once.

## Error envelope

All errors on `/v1/*` return JSON of the form:

```json
{ "error": { "type": "validation_error", "message": "...", "details": [...] } }
```

Handle these `error.type` values:

- `validation_error` (400) — a field failed validation. Fix the request body; do not retry unchanged.
- `invalid_request` (400) — malformed JSON or malformed `Idempotency-Key`. Fix and resend.
- `unauthorized` (401) — missing/bad API key. Stop and surface to the user. Do NOT print or ask for the key.
- `forbidden` (403) — key revoked or org inactive. Stop and surface to the user.
- `not_found` (404) — `job_id` does not exist or belongs to another org.
- `rate_limited` (429) — back off per `Retry-After`, then retry.
- `enqueue_failed` (503) — transient; retry with the SAME `Idempotency-Key` after `Retry-After`.
- `internal_error` (500) — transient; retry once, then surface to the user.
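
One possible way to encode this list is a lookup from `error.type` to a coarse action. The sketch below is an assumption-laden illustration: the action names are invented, `not_found` is mapped to "stop" because the list above does not prescribe an action for it, and unknown types default to "stop" so nothing is retried silently.

```python
import json

RETRY = "retry"      # back off (honoring Retry-After when present), then resend
FIX_REQUEST = "fix"  # change the request before sending again
STOP = "stop"        # surface to the user; do not retry

ACTIONS = {
    "validation_error": FIX_REQUEST,
    "invalid_request": FIX_REQUEST,
    "unauthorized": STOP,
    "forbidden": STOP,
    "not_found": STOP,        # assumption: treated as non-retryable
    "rate_limited": RETRY,
    "enqueue_failed": RETRY,  # resend with the SAME Idempotency-Key
    "internal_error": RETRY,  # caller should allow a single retry only
}


def classify_error(body: bytes) -> tuple[str, str, str]:
    """Parse the error envelope and return (action, type, message)."""
    error = json.loads(body)["error"]
    action = ACTIONS.get(error["type"], STOP)
    return action, error["type"], error.get("message", "")
```

The caller still owns retry counting, for example limiting `internal_error` to one retry before surfacing it.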

## Hard limits per request

- `npis`: ≤ 1000
- `billing_codes`: ≤ 100
- `fee_schedules`: ≤ 20

If the user asks for more, split into multiple searches (each with its own stable `Idempotency-Key`) and concatenate results after download.
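
A sketch of such a split in Python, assuming a search covers every combination of the three lists, so chunking each list and taking the product of the chunks covers the same combinations. `split_search` is a hypothetical helper, and the `-part-<i>` key suffix is an illustrative convention that stays stable only if the inputs arrive in the same order on every run.

```python
from itertools import product


def chunks(items: list, size: int) -> list[list]:
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_search(npis: list, billing_codes: list, fee_schedules: list,
                 key_prefix: str) -> list[tuple[str, dict]]:
    """Return one (idempotency_key, body) pair per within-limits search."""
    parts = product(
        chunks(npis, 1000),
        chunks(billing_codes, 100),
        chunks(fee_schedules, 20),
    )
    return [
        (
            f"{key_prefix}-part-{i}",
            {"npis": n, "billing_codes": c, "fee_schedules": f},
        )
        for i, (n, c, f) in enumerate(parts)
    ]
```

Sorting the inputs before splitting is one way to keep the part numbering, and therefore the keys, deterministic across retries.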

## What "done" looks like

A correct implementation:

1. Reads `KEEPER_API_KEY` from the environment — never as a literal in code, prompts, or logs.
2. Resolves `payer` and `plan` strings from `GET /v1/payers` and `GET /v1/payers/{payer_key}/plans` (or a cached copy) rather than hardcoding guesses.
3. POSTs with `Idempotency-Key` set to a deterministic string for the logical request.
4. Polls `GET /v1/searches/{job_id}` on a 5s-then-15s cadence until `status` is `completed` or `failed`, capped at ~15 minutes.
5. Iterates `download_urls` (never indexed `[0]`), downloads each Parquet shard without the auth header, and loads them as a single dataset via its language's Parquet reader.
6. Handles every error `type` above — never silently swallows a failure, never retries a `validation_error`, never retries a POST without the same idempotency key.
7. Never prints, logs, commits, or otherwise surfaces the API key's value.

Write the integration in the language the user is working in. Prefer the standard HTTP client for that language. Do not invent a Keeper SDK — there isn't one; call the REST API directly.

How to use it

  1. Set KEEPER_API_KEY in the environment where your agent's code-execution tool runs (shell export, Docker env, CI secret, hosting provider secret manager). Do not paste the key into the chat.
  2. Add .env to .gitignore if your agent will create one.
  3. Copy everything inside the fenced block above (from You are integrating... through ...call the REST API directly.).
  4. Paste it into your agent's system prompt, a CLAUDE.md / .cursorrules / equivalent rules file, or the initial message you send the agent.
  5. Ask the agent to build the integration in whatever language and framework your project uses — it now has everything it needs, without ever seeing the key's value.

If you want a minimal reference implementation to compare the agent's output against, see the Quickstart and Code Examples pages.

Keeper Health API v1 · Questions? company@keeperhealth.com