The platform connects to Databricks the way the vendor recommends for applications: a machine-to-machine (M2M) service principal, not a personal login. It reads data where it lives and brings back only results.
The app exchanges its client ID and secret for a short-lived access token (OAuth client-credentials grant). It logs in as stemline-datascience-sp, its own identity rather than a person’s. The token is cached and refreshed automatically before it expires.
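The token exchange and cache described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the names `fetch_token` and `TokenCache` are hypothetical, and the sketch assumes the workspace token endpoint is `https://<workspace-host>/oidc/v1/token` with scope `all-apis` and that the response carries `access_token` and `expires_in`.

```python
import base64
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional


def fetch_token(host: str, client_id: str, client_secret: str) -> dict:
    """POST a client-credentials grant to the (assumed) workspace token endpoint."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    req = urllib.request.Request(
        f"https://{host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


class TokenCache:
    """Hypothetical cache: reuse the token until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or it is within the safety margin of expiry.
        if self._token is None or now >= self._expires_at - self._skew:
            payload = self._fetch()
            self._token = payload["access_token"]
            self._expires_at = now + float(payload["expires_in"])
        return self._token
```

In use, the cache would be built once, for example `TokenCache(lambda: fetch_token(host, client_id, client_secret))`, and every API call would ask it for a bearer token. Injecting the fetch function and clock keeps the refresh logic testable without a network call.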
To discover what data exists, it calls the Unity Catalog REST API (catalogs → schemas → tables). This needs no compute and reads no patient rows.
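The catalogs → schemas → tables walk might look something like the sketch below. It assumes the Unity Catalog list endpoints under `/api/2.1/unity-catalog/` and a hypothetical `get_json(path, params)` helper that performs the authenticated GET and returns the decoded JSON. For brevity it ignores pagination, which a real client would need to handle.

```python
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical helper signature: performs an authenticated GET, returns parsed JSON.
GetJson = Callable[[str, Dict[str, str]], dict]


def list_tables(get_json: GetJson) -> Iterator[Tuple[str, str, str]]:
    """Yield (catalog, schema, table) names; metadata only, no row data is read."""
    catalogs = get_json("/api/2.1/unity-catalog/catalogs", {})
    for catalog in catalogs.get("catalogs", []):
        c = catalog["name"]
        schemas = get_json("/api/2.1/unity-catalog/schemas", {"catalog_name": c})
        for schema in schemas.get("schemas", []):
            s = schema["name"]
            tables = get_json(
                "/api/2.1/unity-catalog/tables",
                {"catalog_name": c, "schema_name": s},
            )
            for table in tables.get("tables", []):
                yield c, s, table["name"]
```

Because these are metadata calls, the sketch never references a SQL warehouse, which matches the point that discovery needs no compute.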
For actual numbers, it submits a single SELECT to the SQL Statement Execution REST API on a SQL warehouse. Compute runs inside Databricks; only the aggregated result returns.
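As an illustration of the query step, the sketch below builds the request body for the SQL Statement Execution API (assumed here to be `POST /api/2.0/sql/statements`) and turns an inline JSON response into rows. The function names are hypothetical, and the choices of a 30-second wait, cancel-on-timeout, and inline JSON results are assumptions for the example rather than the platform's documented settings.

```python
import json
from typing import Any, Dict, List


def build_statement_request(warehouse_id: str, sql: str) -> bytes:
    """Serialize the body for POST /api/2.0/sql/statements."""
    payload = {
        "warehouse_id": warehouse_id,
        "statement": sql,
        "wait_timeout": "30s",        # wait synchronously for up to 30 seconds
        "on_wait_timeout": "CANCEL",  # do not leave long-running work behind
        "disposition": "INLINE",      # small aggregated results come back in the response
        "format": "JSON_ARRAY",
    }
    return json.dumps(payload).encode()


def rows_from_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map an inline JSON_ARRAY result onto column names."""
    state = response["status"]["state"]
    if state != "SUCCEEDED":
        raise RuntimeError(f"statement ended in state {state}")
    columns = [col["name"] for col in response["manifest"]["schema"]["columns"]]
    data = response.get("result", {}).get("data_array", [])
    return [dict(zip(columns, row)) for row in data]
```

Values in the JSON array format arrive as strings, so a caller would convert counts and averages to numeric types using the column types in the manifest.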
For reproducibility, it records the exact data version each query read (commit version for tables, data-as-of timestamp for views).
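One way to picture the version record is the small data structure below. It is only a sketch: `DataVersion` and `version_for` are hypothetical names, the source does not say how the commit version is obtained (a query such as `DESCRIBE HISTORY ... LIMIT 1` on a Delta table is one possibility), and treating the view's data-as-of timestamp as the UTC read time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DataVersion:
    """What a query read: a commit version (table) or an as-of time (view)."""
    object_name: str
    object_type: str  # "TABLE" or "VIEW"
    commit_version: Optional[int] = None
    data_as_of: Optional[str] = None  # ISO 8601, UTC


def version_for(
    object_name: str,
    object_type: str,
    latest_commit: Optional[int] = None,
    now: Optional[datetime] = None,
) -> DataVersion:
    if object_type == "TABLE":
        if latest_commit is None:
            raise ValueError("a table needs its commit version")
        return DataVersion(object_name, "TABLE", commit_version=latest_commit)
    if object_type == "VIEW":
        # Assumption: a view has no commit version, so record when it was read.
        stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        return DataVersion(object_name, "VIEW", data_as_of=stamp.isoformat())
    raise ValueError(f"unsupported object type: {object_type}")
```

Storing one such record next to each query result lets a later reviewer see which snapshot of the data produced a given number.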
Access is the app’s own, auditable identity — not tied to any employee, and revocable/rotatable independently.
No database driver or native libraries — a smaller, simpler surface that’s easy to review and secure.
Queries execute in Databricks behind Unity Catalog. We retrieve results, not datasets — no bulk export, no PHI egress.
The platform issues SELECT queries only: it is a least-privilege, read-only guest in Databricks. Even a bug or a misbehaving prompt cannot change clinical data.
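The read-only guarantee would rest primarily on the privileges granted to the service principal inside Databricks. As a second, client-side layer, an application could also refuse anything that is not a single SELECT before submitting it. The sketch below is a hypothetical, deliberately conservative check of that kind. The function name and keyword list are illustrative assumptions, and it is not a SQL parser, so it may reject a valid query that contains a semicolon or a blocked word inside a string literal.

```python
import re

# Illustrative deny-list; whole-word matches only, case-insensitive.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)


def assert_single_select(sql: str) -> str:
    """Return the statement if it looks like one read-only SELECT, else raise."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        raise ValueError("only a single statement is allowed")
    if not re.match(r"(?is)^(select|with)\b", stripped):
        raise ValueError("statement must start with SELECT or WITH")
    if _FORBIDDEN.search(stripped):
        raise ValueError("statement contains a write or DDL keyword")
    return stripped
```

Column names such as `created_at` pass the check because the pattern matches whole words only.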