● Architecture

How it’s built — and why every layer is governed

A mainstream, auditable technology stack. Data is read in place from Databricks; the AI sits at the top as an interface, never between you and the raw data. Read the diagram bottom-up: data stays put, and only governed, read-only requests flow down to it.

User Interface · Layer 6 · what people see

Workspace tabs

Overview · Schema · Agent · Cohorts · Feasibility · Provenance

Plain-English chat

Ask, get tables & charts back

Review controls

Validate / needs-changes / reject

▲ results, charts, status

AI Analyst Layer · Layer 5 · enterprise-licensed

Reasoning & formatting

Turns questions → governed tool calls; narrates results with caveats

Enterprise LLM API

Licensed, contractual; no training on our data

No agent framework · We own the loop · Read-only tools only

▼ governed tool calls only

Application & Governance · Layer 4 · our code

Tool registry

Every action wrapped in policy + audit

Read-only SQL guard

SELECT-only · DML/DDL blocked · row-capped

Cohort · Feasibility · Review

Versioned, validated, de-duplicated

App metadata (Postgres)

Chats, cohorts, provenance, audit

▼ M2M token · read-only REST

Secure Connection · Layer 3 · machine-to-machine

Service principal (OAuth M2M)

App authenticates as itself; rotatable secret

SQL Statement Execution API

Read-only queries over HTTPS — no driver

Unity Catalog REST

Metadata browse, no warehouse needed

▼ least-privilege grants

Databricks Platform · Layer 2 · compute & governance

SQL Warehouse

Queries execute here; data is never exported for processing

Unity Catalog

Central access control, lineage, audit

▼ governed catalogs

Governed Data Assets · Layer 1 · the data

🧬 Flatiron

Real-world clinical (de-identified)

🩸 Guardant

Genomic / ctDNA

📊 Norstella

Market access
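
To make the Layer 4 read-only SQL guard (SELECT-only, DML/DDL blocked, row-capped) more concrete, here is a minimal sketch of how such a check might work. It is an illustration under assumptions, not the platform's actual code: the names `guardReadOnly`, `BLOCKED` and `LEXICAL`, the keyword list, and the default cap of 1000 rows are all hypothetical. A production guard would more likely rely on a real SQL parser than on regular expressions.

```typescript
// Hypothetical read-only SQL guard: an illustrative sketch only.

type GuardResult =
  | { ok: true; sql: string }
  | { ok: false; reason: string };

// Keywords that indicate DML, DDL, or privilege changes (illustrative list).
const BLOCKED = [
  "INSERT", "UPDATE", "DELETE", "MERGE", "DROP", "CREATE", "ALTER",
  "TRUNCATE", "GRANT", "REVOKE", "COPY", "OPTIMIZE", "VACUUM",
];

// Matches comments, quoted strings and quoted identifiers in one
// left-to-right pass, so a quote inside a comment (or a comment marker
// inside a string) cannot hide text from the checks below.
const LEXICAL =
  /--[^\n]*|\/\*[\s\S]*?\*\/|'(?:[^'\\]|\\.|'')*'|"(?:[^"\\]|\\.)*"|`[^`]*`/g;

function guardReadOnly(sql: string, maxRows = 1000): GuardResult {
  // Allow a single trailing semicolon, then inspect a copy of the query
  // with comments and literals blanked out.
  const body = sql.trim().replace(/;\s*$/, "");
  const bare = body.replace(LEXICAL, " ");

  if (/['"`]|\/\*/.test(bare)) {
    return { ok: false, reason: "unterminated string or comment" };
  }
  if (bare.includes(";")) {
    return { ok: false, reason: "multiple statements" };
  }
  if (!/^\s*(SELECT|WITH)\b/i.test(bare)) {
    return { ok: false, reason: "only SELECT queries are allowed" };
  }
  const hit = BLOCKED.find((kw) => new RegExp(`\\b${kw}\\b`, "i").test(bare));
  if (hit) {
    return { ok: false, reason: `blocked keyword: ${hit}` };
  }

  // Row cap: wrap the query instead of editing it. The newlines keep a
  // trailing line comment from swallowing the closing parenthesis.
  return {
    ok: true,
    sql: `SELECT * FROM (\n${body}\n) AS guarded LIMIT ${maxRows}`,
  };
}
```

The sketch errs on the side of refusal: anything it cannot classify confidently is rejected, which means a legitimate query using a column literally named `create` would need quoting. Least-privilege grants in Unity Catalog remain the real enforcement boundary; a guard like this is an additional application-side check.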

The technology stack

All mainstream, supported, auditable components — nothing exotic, nothing autonomous.

Application

Next.js (React, TypeScript) · Tailwind CSS · Prisma + PostgreSQL · NextAuth (roles)

A standard web application. PostgreSQL stores only platform metadata (chats, cohorts, logs) — not patient data.

Data connection

Databricks OAuth (M2M) · SQL Statement Execution REST · Unity Catalog REST · Delta / materialized views

Pure HTTPS REST APIs — no database driver, no native dependencies, easy to audit and lock down.
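
As a rough illustration of what "pure HTTPS REST" means in practice, the sketch below builds the two requests involved: an OAuth client-credentials token request for the service principal, and a SQL Statement Execution request. The endpoint paths and body fields follow the public Databricks REST documentation as we understand it; the helper names (`buildTokenRequest`, `buildStatementRequest`), the `HttpRequest` shape, and the chosen timeout values are assumptions made for this example.

```typescript
// Sketch only: helper names and parameter choices are assumptions.

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// OAuth M2M: the service principal exchanges its client ID and secret
// for a short-lived access token (client-credentials grant).
function buildTokenRequest(
  host: string,
  clientId: string,
  clientSecret: string,
): HttpRequest {
  return {
    url: `${host}/oidc/v1/token`,
    method: "POST",
    headers: {
      Authorization: `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({
      grant_type: "client_credentials",
      scope: "all-apis",
    }).toString(),
  };
}

// SQL Statement Execution API: one statement per request, run on a
// named SQL warehouse, with a server-side row limit.
function buildStatementRequest(
  host: string,
  accessToken: string,
  warehouseId: string,
  statement: string,
  rowLimit: number,
): HttpRequest {
  return {
    url: `${host}/api/2.0/sql/statements`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      warehouse_id: warehouseId,
      statement,
      wait_timeout: "30s", // wait up to 30 seconds for a result
      on_wait_timeout: "CANCEL", // cancel instead of running unattended
      row_limit: rowLimit,
    }),
  };
}
```

Either request can be sent with the standard `fetch(req.url, req)` call. Because the requests are plain data, they are easy to log, review and test without a database driver in the loop.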

AI

Enterprise-licensed LLM API · Vercel AI SDK (thin client)

No LangChain, no AutoGPT, no autonomous-agent framework. The “agent loop” is our own code — every tool call is validated and logged by us.
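
To show what owning the agent loop can look like, here is a minimal sketch in which every tool call the model proposes is checked against a registry and written to an audit trail before anything runs. All names (`runAgent`, `Tool`, `AuditEntry`, the step limit of 5) are hypothetical, and the model is a plain function so the example stays self-contained; the real application would call the enterprise LLM API at that point.

```typescript
// Hypothetical sketch of an owned agent loop; the model is a stub function.

type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelTurn =
  | { kind: "tool"; call: ToolCall }
  | { kind: "answer"; text: string };
type Model = (history: string[]) => ModelTurn;

interface Tool {
  readOnly: boolean;
  run: (args: Record<string, unknown>) => string;
}

interface AuditEntry {
  tool: string;
  allowed: boolean;
  reason: string;
}

interface AgentResult {
  answer: string | null;
  audit: AuditEntry[];
}

function runAgent(
  model: Model,
  registry: Map<string, Tool>,
  question: string,
  maxSteps = 5,
): AgentResult {
  const history: string[] = [`user: ${question}`];
  const audit: AuditEntry[] = [];

  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history);
    if (turn.kind === "answer") {
      return { answer: turn.text, audit };
    }

    // Policy check: only registered, read-only tools may run.
    const name = turn.call.tool;
    const tool = registry.get(name);
    if (tool === undefined) {
      audit.push({ tool: name, allowed: false, reason: "unregistered tool" });
      history.push(`tool ${name}: refused`);
    } else if (!tool.readOnly) {
      audit.push({ tool: name, allowed: false, reason: "not read-only" });
      history.push(`tool ${name}: refused`);
    } else {
      audit.push({ tool: name, allowed: true, reason: "ok" });
      history.push(`tool ${name}: ${tool.run(turn.call.args)}`);
    }
  }

  // Step budget exhausted without a final answer.
  return { answer: null, audit };
}
```

The design choice this illustrates is that the model only ever proposes actions. The application code decides whether each one runs, records the decision either way, and stops after a bounded number of steps.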

What we deliberately avoided

No bulk data copy out of Databricks
No write access to clinical data
No opaque AI framework making its own calls
No AI training/fine-tuning on patient data
