● Architecture

How it’s built — and why every layer is governed

A mainstream, auditable technology stack. Data is read in place from Databricks; the AI sits at the top as an interface, never between you and the raw data. Read the diagram bottom-up: data stays put, and only governed, read-only requests flow down to it.

User Interface Layer 6 · what people see
Workspace tabs
Overview · Schema · Agent · Cohorts · Feasibility · Provenance
Plain-English chat
Ask, get tables & charts back
Review controls
Validate / needs-changes / reject
▲ results, charts, status
AI Analyst Layer 5 · enterprise-licensed
Reasoning & formatting
Turns questions → governed tool calls; narrates results with caveats
Enterprise LLM API
Licensed, contractual; no training on our data
No agent framework · We own the loop · Read-only tools only
▼ governed tool calls only
Application & Governance Layer 4 · our code
Tool registry
Every action wrapped in policy + audit
Read-only SQL guard
SELECT-only · DML/DDL blocked · row-capped
Cohort · Feasibility · Review
Versioned, validated, de-duplicated
App metadata (Postgres)
Chats, cohorts, provenance, audit
▼ M2M token · read-only REST
Secure Connection Layer 3 · machine-to-machine
Service principal (OAuth M2M)
App authenticates as itself; rotatable secret
SQL Statement Execution API
Read-only queries over HTTPS — no driver
Unity Catalog REST
Metadata browse, no warehouse needed
▼ least-privilege grants
Databricks Platform Layer 2 · compute & governance
SQL Warehouse
Queries execute here; data is processed in place and never copied out to run
Unity Catalog
Central access control, lineage, audit
▼ governed catalogs
Governed Data Assets Layer 1 · the data
🧬 Flatiron
Real-world clinical (de-identified)
🩸 Guardant
Genomic / ctDNA
📊 Norstella
Market access
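
As an illustration only, the read-only SQL guard in Layer 4 (SELECT-only, DML/DDL blocked, row-capped) might look something like the sketch below. The function name `guardSql`, the keyword list, and the 1,000-row cap are assumptions made for this example, not the production implementation; a real guard would more likely rely on a SQL parser than on regular expressions.

```typescript
// Hypothetical sketch of a read-only SQL guard. All names and limits are assumptions.
const MAX_ROWS = 1000; // assumed row cap, not taken from the source

// Keywords that indicate DML, DDL, or privilege changes.
const BLOCKED = /\b(insert|update|delete|merge|create|alter|drop|truncate|grant|revoke)\b/i;

// Matches string literals and comments in one left-to-right pass.
const LITERALS_AND_COMMENTS =
  /'(?:[^'\\]|\\[\s\S])*'|"(?:[^"\\]|\\[\s\S])*"|--[^\n]*|\/\*[\s\S]*?\*\//g;

type GuardResult =
  | { ok: true; sql: string }
  | { ok: false; reason: string };

// Blank out literals and comments so keywords inside them are ignored.
function stripLiteralsAndComments(sql: string): string {
  return sql.replace(LITERALS_AND_COMMENTS, (m) =>
    m.startsWith("-") || m.startsWith("/") ? " " : "''",
  );
}

function guardSql(input: string, maxRows: number = MAX_ROWS): GuardResult {
  // Allow a single trailing semicolon, nothing more.
  const trimmed = input.trim().replace(/;\s*$/, "");
  const bare = stripLiteralsAndComments(trimmed);

  if (bare.includes(";")) {
    return { ok: false, reason: "multiple statements are not allowed" };
  }
  if (!/^\s*(select|with)\b/i.test(bare)) {
    return { ok: false, reason: "only SELECT statements are allowed" };
  }
  const hit = bare.match(BLOCKED);
  if (hit) {
    return { ok: false, reason: `blocked keyword: ${hit[0].toLowerCase()}` };
  }
  // Wrap the query so the cap applies even if the inner query has its own LIMIT.
  // Newlines keep a trailing line comment from swallowing the closing parenthesis.
  return { ok: true, sql: `SELECT * FROM (\n${trimmed}\n) AS guarded LIMIT ${maxRows}` };
}
```

A keyword check like this is deliberately conservative: it may reject a harmless query that happens to use a blocked word as an identifier, which is the safer failure mode. It complements, and does not replace, the least-privilege grants in Unity Catalog.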

The technology stack

All mainstream, supported, auditable components — nothing exotic, nothing autonomous.

Application

Next.js (React, TypeScript) · Tailwind CSS · Prisma + PostgreSQL · NextAuth (roles)

A standard web application. PostgreSQL stores only platform metadata (chats, cohorts, logs) — not patient data.

Data connection

Databricks OAuth (M2M) · SQL Statement Execution REST · Unity Catalog REST · Delta / materialized views

Pure HTTPS REST APIs — no database driver, no native dependencies, easy to audit and lock down.
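
To make the "no driver" point concrete, here is a minimal sketch of how a query could be packaged as a plain HTTPS request to the SQL Statement Execution API. The helper name `buildStatementRequest`, the placeholder host, the warehouse ID, and the default row limit are assumptions for illustration; the access token is assumed to come from the service principal's OAuth M2M (client-credentials) flow, which is not shown here.

```typescript
// Hypothetical helper: builds the request only, so it can be inspected and audited
// before anything is sent. Names and default values are illustrative assumptions.
interface StatementRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildStatementRequest(
  host: string,        // workspace hostname (placeholder value in the example below)
  accessToken: string, // short-lived token from the OAuth M2M flow (obtained elsewhere)
  warehouseId: string, // ID of the SQL Warehouse that runs the query
  sql: string,         // statement that has already passed the read-only guard
  rowLimit: number = 1000, // assumed cap
): StatementRequest {
  return {
    url: `https://${host}/api/2.0/sql/statements`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      warehouse_id: warehouseId,
      statement: sql,
      wait_timeout: "30s", // wait synchronously for up to 30 seconds
      row_limit: rowLimit, // server-side cap on returned rows
    }),
  };
}

// Example with placeholder values only.
const exampleRequest = buildStatementRequest(
  "example.cloud.databricks.com",
  "<access-token>",
  "<warehouse-id>",
  "SELECT 1",
);
```

The resulting object can be handed to any HTTP client, for example `fetch(exampleRequest.url, { method: exampleRequest.method, headers: exampleRequest.headers, body: exampleRequest.body })`. Because the whole exchange is one JSON POST, it is straightforward to log and to restrict at the network level.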

AI

Enterprise-licensed LLM API · Vercel AI SDK (thin client)

No LangChain, no AutoGPT, no autonomous-agent framework. The “agent loop” is our own code — every tool call is validated and logged by us.
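
What "our own loop" means can be sketched in a few lines. The version below is a simplified illustration under stated assumptions: the type names, the `runAgent` function, the step limit, and the shape of the audit entries are all hypothetical, and the `model` parameter stands in for the call to the enterprise LLM API.

```typescript
// Hypothetical sketch of an owned agent loop. All names are illustrative assumptions.
type Args = Record<string, unknown>;
type ToolCall = { tool: string; args: Args };
type ModelStep =
  | { kind: "tool"; call: ToolCall }
  | { kind: "final"; text: string };

type Tool = {
  validate: (args: Args) => string | null; // error message, or null if the call is allowed
  run: (args: Args) => Promise<unknown>;   // read-only action
};

type AuditEntry = { tool: string; args: Args; outcome: "ok" | "rejected"; detail?: string };

async function runAgent(
  question: string,
  model: (history: unknown[]) => Promise<ModelStep>, // stand-in for the LLM API call
  registry: Map<string, Tool>,                       // the only tools the model can reach
  audit: AuditEntry[],                               // every attempted call is recorded here
  maxSteps: number = 5,                              // assumed hard stop
): Promise<string> {
  const history: unknown[] = [{ role: "user", content: question }];

  for (let step = 0; step < maxSteps; step++) {
    const next = await model(history);
    if (next.kind === "final") return next.text;

    const { tool, args } = next.call;
    const impl = registry.get(tool);
    const error = impl === undefined ? `unknown tool: ${tool}` : impl.validate(args);

    if (impl === undefined || error !== null) {
      // Rejected calls are logged and reported back to the model, never executed.
      audit.push({ tool, args, outcome: "rejected", detail: error ?? undefined });
      history.push({ role: "tool", tool, error });
      continue;
    }

    const result = await impl.run(args);
    audit.push({ tool, args, outcome: "ok" });
    history.push({ role: "tool", tool, result });
  }
  return "Stopped: step limit reached.";
}
```

The model only ever proposes a call; whether it runs is decided by code we control, and the loop cannot reach anything that is not in the registry. The step limit keeps a confused model from looping indefinitely.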

What we deliberately avoided

  • No bulk data copy out of Databricks
  • No write access to clinical data
  • No opaque AI framework making its own calls
  • No AI training/fine-tuning on patient data