Databricks Connection

How we connect — as a machine, read-only, in place

The platform connects to Databricks the way Databricks recommends for unattended applications: as a service principal using OAuth machine-to-machine (M2M) authentication, not a personal login. It reads data where it lives and brings back only query results.

The connection, step by step

Authenticate as a service principal

The app exchanges its client ID and secret for a short-lived access token (OAuth 2.0 client-credentials grant). It logs in as stemline-datascience-sp, its own identity rather than a person’s. The token is cached and refreshed automatically before it expires.
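
A minimal sketch of this step, assuming the workspace token endpoint /oidc/v1/token and the all-apis scope that Databricks documents for OAuth M2M. The host name, the function names, the TokenCache class and the 60-second refresh margin are illustrative assumptions, not the platform's actual code.

```python
import base64
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional


def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials request (nothing is sent yet)."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        f"https://{host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_token(host: str, client_id: str, client_secret: str) -> dict:
    """Send the request and return the parsed JSON token response."""
    req = build_token_request(host, client_id, client_secret)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


class TokenCache:
    """Holds one access token and fetches a new one shortly before expiry."""

    def __init__(self, fetch: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic,
                 skew_seconds: float = 60.0):
        self._fetch = fetch          # zero-argument callable returning the token JSON
        self._clock = clock          # injectable so the cache can be tested offline
        self._skew = skew_seconds    # assumed safety margin before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            payload = self._fetch()
            self._token = payload["access_token"]
            self._expires_at = now + float(payload.get("expires_in", 0))
        return self._token
```

In use, the cache would wrap the real exchange, for example TokenCache(lambda: fetch_token(host, client_id, client_secret)), with the secret read from a secret store rather than from source code.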

Browse the catalog (metadata only)

To discover what data exists, it calls the Unity Catalog REST API (catalogs → schemas → tables). This needs no compute and reads no patient rows.
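
The catalog walk can be sketched as below, assuming the Unity Catalog list endpoints under /api/2.1/unity-catalog and their next_page_token pagination. The get_json callable stands in for an authenticated HTTPS GET and is a hypothetical helper; the catalog and table names in the usage note are made up.

```python
from typing import Callable, Dict, Iterator, Optional

UC_BASE = "/api/2.1/unity-catalog"

# Hypothetical transport: takes (path, query params) and returns parsed JSON.
GetJson = Callable[[str, Dict[str, str]], dict]


def list_all(get_json: GetJson, path: str, key: str,
             params: Optional[Dict[str, str]] = None) -> Iterator[dict]:
    """Yield every item from a paginated list endpoint."""
    query = dict(params or {})
    while True:
        page = get_json(path, query)
        yield from page.get(key, [])
        token = page.get("next_page_token")
        if not token:
            return
        query = {**query, "page_token": token}


def walk_catalog(get_json: GetJson, catalog: str) -> Iterator[str]:
    """Yield full table names (catalog.schema.table) for one catalog.

    Only metadata endpoints are called; no SQL is executed.
    """
    schemas = list_all(get_json, f"{UC_BASE}/schemas", "schemas",
                       {"catalog_name": catalog})
    for schema in schemas:
        tables = list_all(get_json, f"{UC_BASE}/tables", "tables",
                          {"catalog_name": catalog, "schema_name": schema["name"]})
        for table in tables:
            yield table["full_name"]
```

Calling list(walk_catalog(get_json, "main")) would return names such as main.clinical.visits for whatever the service principal has been granted; objects outside its grants simply do not appear in the listing.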

Run a governed, read-only query

For actual numbers, it submits a single SELECT to the SQL Statement Execution REST API on a SQL warehouse. Compute runs inside Databricks; only the aggregated or row-limited result set returns.
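
A sketch of the request this step sends, assuming the POST /api/2.0/sql/statements endpoint with inline JSON results. The function names, the 30-second wait and the default row limit of 1000 are illustrative choices, not the platform's settings.

```python
import json
import urllib.request
from typing import Dict, List


def build_statement_request(host: str, token: str, warehouse_id: str,
                            sql: str, row_limit: int = 1000) -> urllib.request.Request:
    """Build (but do not send) a Statement Execution request."""
    payload = {
        "warehouse_id": warehouse_id,
        "statement": sql,
        "wait_timeout": "30s",          # wait synchronously for up to 30 seconds
        "on_wait_timeout": "CANCEL",    # do not leave long statements running
        "disposition": "INLINE",        # results come back in the response body
        "format": "JSON_ARRAY",
        "row_limit": row_limit,         # server-side cap on returned rows
    }
    return urllib.request.Request(
        f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rows_from_response(response: dict) -> List[Dict[str, object]]:
    """Turn a finished statement response into a list of column->value dicts."""
    state = response["status"]["state"]
    if state != "SUCCEEDED":
        raise RuntimeError(f"statement ended in state {state}")
    columns = [c["name"] for c in response["manifest"]["schema"]["columns"]]
    data = response.get("result", {}).get("data_array") or []
    return [dict(zip(columns, row)) for row in data]
```

With the JSON_ARRAY format, values arrive as strings (or null), so numeric columns need converting on the client side.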

Pin the data version

For reproducibility, it records the exact data version each query read (commit version for tables, data-as-of timestamp for views).
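
One way to record the pin, sketched under assumptions: tables are Delta tables, so DESCRIBE HISTORY exposes a version column and VERSION AS OF can re-read that commit. The DataPin record, the helper names, and the choice of the query time as the data-as-of timestamp for views are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

_THREE_PART = re.compile(r"[A-Za-z_]\w*\.[A-Za-z_]\w*\.[A-Za-z_]\w*")


def _checked(name: str) -> str:
    """Accept only plain catalog.schema.object names before building SQL."""
    if not _THREE_PART.fullmatch(name):
        raise ValueError(f"not a three-part object name: {name!r}")
    return name


@dataclass(frozen=True)
class DataPin:
    object_name: str
    commit_version: Optional[int] = None   # set for tables
    data_as_of: Optional[str] = None       # ISO-8601 UTC timestamp, set for views


def latest_version_sql(table: str) -> str:
    """SQL whose single result row carries the table's latest commit version."""
    return f"DESCRIBE HISTORY {_checked(table)} LIMIT 1"


def pin_table(table: str, history_row: dict) -> DataPin:
    """Build a pin from the newest DESCRIBE HISTORY row (values may be strings)."""
    return DataPin(_checked(table), commit_version=int(history_row["version"]))


def pin_view(view: str, as_of: datetime) -> DataPin:
    """Build a pin for a view; as_of must be timezone-aware."""
    return DataPin(_checked(view), data_as_of=as_of.astimezone(timezone.utc).isoformat())


def versioned_ref(pin: DataPin) -> str:
    """Table reference that re-reads exactly the pinned commit, where one exists."""
    if pin.commit_version is None:
        return pin.object_name
    return f"{pin.object_name} VERSION AS OF {pin.commit_version}"
```

Storing the DataPin next to each result lets a later run issue the same SELECT against versioned_ref(pin) and read the same table commit, for as long as that version is retained.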

Why this approach

🪪

Machine identity

Access runs under the app’s own auditable identity. It is not tied to any employee, and its credentials can be revoked or rotated independently.

🌐

Pure HTTPS REST

No database driver or native libraries — a smaller, simpler surface that’s easy to review and secure.

🏛️

Data stays in place

Queries execute in Databricks behind Unity Catalog. We retrieve results, not datasets — no bulk export, no PHI egress.

What the connection can — and cannot — do

✓ Can

  • Read the specific catalogs it is granted
  • List tables and columns (metadata)
  • Run single, read-only SELECT queries
  • Return aggregated / row-limited results

✗ Cannot

  • Insert, update, delete, or alter anything
  • Reach data outside its granted catalogs
  • Export bulk datasets or raw tables wholesale
  • Run as, or on behalf of, an individual user
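
The limits above are enforced by the service principal's grants inside Databricks. As an optional second layer, a client-side check could reject anything that is not a single SELECT before it is submitted. The sketch below is a deliberately conservative keyword heuristic, not a SQL parser and not a description of the platform's code; the function name and keyword list are assumptions.

```python
import re

# Keywords that indicate a write or administrative statement (assumed list).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|alter|drop|create|truncate|"
    r"grant|revoke|copy|optimize|vacuum)\b",
    re.IGNORECASE,
)


def is_single_select(sql: str) -> bool:
    """Return True only for one statement that starts with SELECT or WITH.

    Conservative by design: a forbidden keyword inside a string literal or
    comment also causes rejection, so some harmless queries are refused.
    """
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        return False                      # empty input or more than one statement
    first_word = stripped.split(None, 1)[0].lower()
    if first_word not in ("select", "with"):
        return False
    return _FORBIDDEN.search(stripped) is None
```

A check like this guards against mistakes early; the grants remain the actual security boundary, because they apply no matter what SQL reaches the warehouse.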

Net effect

The platform is a least-privilege, read-only guest in Databricks. Because its identity has no write access, even a bug or a misbehaving prompt cannot change clinical data.