Reference Architecture · Whitepaper

AI data governance breaks when visibility stops at storage.

This whitepaper explains how PumaMesh approaches AI data governance: understand sensitive data, track how it moves, keep policy attached, and produce evidence after it reaches models, tools, and people.

It goes deeper technically than the product overview, but it rests on the same promise: one platform keeps movement, lineage, and audit connected.

Problem

Static inventories cannot explain what AI pipelines actually used.

Traditional data security posture management (DSPM) tools are useful for finding sensitive data at rest. AI pipelines create a harder problem: data moves into indexes, fine-tunes, prompts, tools, and outputs faster than periodic scans can explain.

Scan Cadence

Weekly scans cannot track second-scale data flow

Scheduled classification jobs were built for data at rest. AI pipelines create new artifacts — embeddings, fine-tunes, responses — faster than scanners can find them.

Platform Boundaries

Per-platform guardrails do not federate

Bedrock Guardrails, Foundry Content Safety, Vertex Safety Filters, and Databricks AI Gateway each govern their own platform. None of them sees what happens upstream or downstream of that platform.

Lineage Gap

No system tracks records from source row to agent action

Warehouses end at the export. AI platforms start at the prompt. The gap — transfers, feature stores, fine-tune sets — is where lineage disappears.

Regulatory Pressure

EU AI Act and NIST AI RMF demand evidence DSPM does not produce

EU AI Act Article 12 activity logs, the NIST AI RMF Measure and Manage functions, and model-risk reviews all require artifacts that per-platform guardrails cannot export on their own.

Control Surfaces

Six places where lineage can disappear.

A credible DSPM-for-AI architecture covers six surfaces: source posture, movement, training, retrieval, tool-calls, and evidence. Miss one and the chain breaks.

1. Source Data Posture

Classification and sensitivity of records before they enter AI pipelines

Traditional DSPM territory — extended so classifications are machine-readable downstream, not just visible in a dashboard.

2. Transfer Lineage

What moved, from where, to where, under which policy

This is the gap between the warehouse and the AI platform. Transfers must carry classification as first-class metadata rather than moving as opaque payloads.
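One way to picture classification travelling as first-class metadata is an envelope that sits beside the payload. The sketch below is illustrative only: `TransferEnvelope`, its field names, and the sample values are assumptions, not PumaMesh's actual wire format.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class TransferEnvelope:
    """Hypothetical envelope: policy-relevant attributes travel beside the payload."""
    source: str
    destination: str
    policy_id: str
    classification: str   # e.g. "PHI" or "PUBLIC"
    jurisdiction: str     # e.g. "US-HEALTHCARE"
    payload_ref: str      # pointer to the bytes, not the bytes themselves


def lineage_event(envelope: TransferEnvelope) -> str:
    """Serialize what moved, from where, to where, and under which policy."""
    return json.dumps(asdict(envelope), sort_keys=True)


envelope = TransferEnvelope(
    source="warehouse/claims",
    destination="feature-store/claims-v2",
    policy_id="policy-001",
    classification="PHI",
    jurisdiction="US-HEALTHCARE",
    payload_ref="object://example-bucket/claims.parquet",
)
event = json.loads(lineage_event(envelope))
```

Because the attributes are plain fields, a downstream system can read them without opening the payload.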

3. Training and Fine-Tune Provenance

Which sensitive records entered which model artifact

Fine-tune sets, embedding indexes, and LoRA adapters inherit the sensitivity of their source data. Provenance has to follow.
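Inheritance of sensitivity can be sketched as a "highest wins" rule over the source records. The sensitivity ranking, type names, and sample records below are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ordering; a real deployment would define its own.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


@dataclass(frozen=True)
class SourceRecord:
    record_id: str
    sensitivity: str


@dataclass(frozen=True)
class ModelArtifact:
    name: str
    kind: str                  # "fine-tune", "embedding-index", or "lora-adapter"
    source_record_ids: tuple   # provenance: which records went in
    sensitivity: str           # inherited, never lower than any source record


def build_artifact(name, kind, records):
    """Derive the artifact's sensitivity as the maximum over its source records."""
    if not records:
        raise ValueError("an artifact needs at least one source record")
    highest = max(records, key=lambda r: SENSITIVITY_RANK[r.sensitivity])
    return ModelArtifact(
        name=name,
        kind=kind,
        source_record_ids=tuple(r.record_id for r in records),
        sensitivity=highest.sensitivity,
    )


records = [
    SourceRecord("row-1", "internal"),
    SourceRecord("row-2", "restricted"),
    SourceRecord("row-3", "public"),
]
adapter = build_artifact("support-bot-lora", "lora-adapter", records)
```

A single restricted row is enough to make the whole adapter restricted, which is the conservative reading of "inherit".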

4. Retrieval and Prompt Lineage

Which records were retrieved, embedded in context, or returned in a response

RAG systems can pull large numbers of rows into a single prompt. Lineage has to tie each retrieval back to source-row sensitivity, not just to a vector ID.
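Tying a retrieval back to its source row can be as simple as resolving each vector ID through index metadata captured at embedding time. This is a minimal sketch; the `Chunk` type, the in-memory `index`, and the event fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    vector_id: str
    source_row: str    # key of the row the chunk was embedded from
    sensitivity: str   # copied from the source row at indexing time


def retrieval_lineage(prompt_id, retrieved_vector_ids, index):
    """Resolve each retrieved vector ID to its source row and sensitivity."""
    events = []
    for vector_id in retrieved_vector_ids:
        chunk = index[vector_id]
        events.append({
            "prompt_id": prompt_id,
            "vector_id": vector_id,
            "source_row": chunk.source_row,
            "sensitivity": chunk.sensitivity,
        })
    return events


index = {
    "vec-001": Chunk("vec-001", "claims/row-42", "PHI"),
    "vec-002": Chunk("vec-002", "faq/row-7", "public"),
}
events = retrieval_lineage("prompt-abc", ["vec-002", "vec-001"], index)
```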

5. Agent Tool-Call Policy

Which tools agents are allowed to invoke with which data

Agent frameworks hand out tool access broadly. DSPM for AI has to enforce attribute-based access control (ABAC) on tool-calls the same way it enforces it on files.

6. Evidence and Audit

Exportable artifacts aligned to regulatory frameworks

EU AI Act Article 12, NIST AI RMF Measure/Manage, ISO/IEC 42001, and internal model-risk review all require artifacts the platforms don't produce on their own.

Reference Architecture

A neutral federated analytics plane across AI platforms

The architecture has three planes. A data plane holds records, transfers, and model artifacts on the platforms where they already live. A control plane enforces policy at transfer and tool-call boundaries. A federated analytics plane sits across all platforms and produces one lineage view.

Data Plane

Platform-native storage and compute

Source warehouses (Snowflake, BigQuery, Redshift), feature stores, object storage, AI platforms (Bedrock, Foundry, Vertex, Databricks, Snowflake Cortex), and agent runtimes. This plane stays put — the architecture never centralizes data.

Control Plane

Policy enforcement at transfer, retrieval, and tool-call boundaries

An ABAC policy engine runs next to transfer execution and AI gateway proxies. Policies target data attributes, not platform primitives — so one rule covers a file transfer, a RAG retrieval, and an agent tool-call.

Federated Analytics Plane

One lineage view across platforms

A neutral layer that takes in posture, transfer, training, retrieval, and tool-call events — and produces one lineage, posture, and evidence view. This is the DSPM for AI product surface.
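A single lineage view can be modelled as a graph whose edges are the events each platform reports. The sketch below assumes a simplified event shape (`src`, `dst`, `kind`) and made-up node identifiers; it walks the graph from a source row to everything downstream of it.

```python
from collections import defaultdict, deque


def build_graph(events):
    """Each event is an edge: something at `src` flowed into `dst`."""
    graph = defaultdict(list)
    for event in events:
        graph[event["src"]].append((event["dst"], event["kind"]))
    return graph


def downstream(graph, start):
    """Breadth-first walk returning every node reachable from `start`, in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt, _kind in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Events from different platforms, merged into one graph.
events = [
    {"src": "snowflake:claims/row-42", "dst": "s3:exports/claims.parquet", "kind": "transfer"},
    {"src": "s3:exports/claims.parquet", "dst": "vertex:index/claims", "kind": "training"},
    {"src": "vertex:index/claims", "dst": "prompt:abc", "kind": "retrieval"},
    {"src": "prompt:abc", "dst": "tool:send_email", "kind": "tool_call"},
    {"src": "snowflake:hr/row-7", "dst": "bedrock:kb/hr", "kind": "transfer"},
]
reach = downstream(build_graph(events), "snowflake:claims/row-42")
```

Here `reach` ends at the agent's tool-call, and the unrelated HR row never appears in it.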

Non-Goals

What the architecture explicitly does not do

No data centralization. No replacement for platform-native guardrails. No forced standardization on one AI platform. Federation, not consolidation.

Control Plane Detail

ABAC on attributes, not primitives

The control plane writes policy against data attributes — classification, marking, owner, sensitivity, jurisdiction. Not bucket names, Bedrock guardrail IDs, or Databricks Unity Catalog grants. One ABAC rule enforces at every boundary.
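To make "one rule, every boundary" concrete, here is a minimal sketch of a rule written only against data attributes. The `Rule` shape, the `is_allowed` function, and the jurisdiction labels are assumptions for illustration, not the product's policy language.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Applies when every attribute in `when` matches the data being handled."""
    name: str
    when: dict                       # data attributes the rule targets
    allowed_jurisdictions: frozenset


def is_allowed(rule, data_attrs, destination_jurisdiction):
    """Return True when the rule does not apply or the destination is permitted."""
    applies = all(data_attrs.get(k) == v for k, v in rule.when.items())
    return (not applies) or destination_jurisdiction in rule.allowed_jurisdictions


phi_rule = Rule(
    name="phi-stays-in-us-healthcare",
    when={"classification": "PHI"},
    allowed_jurisdictions=frozenset({"US-HEALTHCARE"}),
)

record = {"classification": "PHI", "owner": "claims-team"}

# The same rule and the same call serve three different boundaries.
decisions = {
    "transfer": is_allowed(phi_rule, record, "EU-GENERAL"),
    "retrieval": is_allowed(phi_rule, record, "US-HEALTHCARE"),
    "tool_call": is_allowed(phi_rule, record, "EU-GENERAL"),
}
```

Nothing in the rule names a bucket, a guardrail ID, or a catalog grant, so it does not need rewriting per platform.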

Transfer Boundary

Records tagged PHI cannot leave jurisdiction US-HEALTHCARE

The transfer executor reads the attribute before starting movement. No DLP inspection needed — the classification is already on the file.
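A rough sketch of that pre-flight check follows. The function name, the exception, and the attribute dictionary are hypothetical; the point is only that the decision happens before `move` is ever called.

```python
class TransferDenied(Exception):
    """Raised before any bytes move."""


def execute_transfer(file_attrs, destination_jurisdiction, move):
    """Check attributes already attached to the file, then call `move`."""
    if (file_attrs.get("classification") == "PHI"
            and destination_jurisdiction != "US-HEALTHCARE"):
        raise TransferDenied(
            f"PHI may not leave US-HEALTHCARE (destination: {destination_jurisdiction})"
        )
    return move()


moved = []
attrs = {"classification": "PHI", "jurisdiction": "US-HEALTHCARE"}

# In-zone transfer proceeds.
execute_transfer(attrs, "US-HEALTHCARE", lambda: moved.append("in-zone"))

# Out-of-zone transfer is refused before the move callback runs.
denial = None
try:
    execute_transfer(attrs, "EU-GENERAL", lambda: moved.append("out-of-zone"))
except TransferDenied as err:
    denial = str(err)
```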

Retrieval Boundary

Records tagged PHI cannot enter a non-US-hosted model context

The RAG retrieval proxy filters vector results by attribute before the context window is assembled.
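The filtering step might look like the sketch below, which assumes each vector result carries a `tags` set copied from its source record and that the proxy knows the model's hosting region. Both are illustrative assumptions.

```python
def filter_for_context(results, model_region):
    """Drop PHI-tagged results unless the model is US-hosted; keep ranking order."""
    kept, dropped = [], []
    for result in results:
        is_phi = "PHI" in result["tags"]
        if is_phi and model_region != "US":
            dropped.append(result["vector_id"])
        else:
            kept.append(result)
    return kept, dropped


results = [
    {"vector_id": "v1", "tags": {"PHI"}, "text": "patient note"},
    {"vector_id": "v2", "tags": set(), "text": "public FAQ"},
    {"vector_id": "v3", "tags": {"PHI", "PII"}, "text": "claim detail"},
]

kept, dropped = filter_for_context(results, model_region="EU")

# Only the surviving results are assembled into the context window.
context_window = "\n".join(r["text"] for r in kept)
```

Returning the dropped IDs alongside the kept results gives the lineage layer a record of what was withheld and why.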

Tool-Call Boundary

Agents operating under role X cannot invoke tools that write to jurisdiction Y

The agent gateway proxy checks each tool-call against the agent role and the target resource's attributes.
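As a sketch of that check, assume each role is granted a set of jurisdictions it may write to and each tool declares whether it writes. The role names, tool descriptors, and grant table are all hypothetical.

```python
# Hypothetical role grants: which jurisdictions each agent role may write to.
ROLE_WRITE_JURISDICTIONS = {
    "support-agent": {"US"},
    "eu-ops-agent": {"EU"},
}


def authorize_tool_call(agent_role, tool, target_attrs):
    """Allow reads; allow writes only into jurisdictions granted to the role."""
    if not tool["writes"]:
        return True
    allowed = ROLE_WRITE_JURISDICTIONS.get(agent_role, set())
    return target_attrs["jurisdiction"] in allowed


update_ticket = {"name": "update_ticket", "writes": True}
search_docs = {"name": "search_docs", "writes": False}
eu_resource = {"jurisdiction": "EU", "classification": "internal"}

us_agent_write = authorize_tool_call("support-agent", update_ticket, eu_resource)
eu_agent_write = authorize_tool_call("eu-ops-agent", update_ticket, eu_resource)
us_agent_read = authorize_tool_call("support-agent", search_docs, eu_resource)
```

An unknown role falls through to an empty grant set, so writes are denied by default.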

Training Boundary

Records tagged restricted cannot enter a fine-tune set outside a sovereignty zone

The training pipeline's ingestion proxy reads record attributes before adding them to the fine-tune manifest.
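The ingestion check can be sketched as a manifest builder that admits or rejects each record by attribute. The interpretation here is an assumption: a record tagged restricted is admitted only when the training job itself runs inside one of the named sovereignty zones. Zone names and record fields are made up.

```python
def build_manifest(records, training_zone, sovereignty_zones):
    """Admit a record unless it is 'restricted' and training runs outside every sovereignty zone."""
    in_zone = training_zone in sovereignty_zones
    manifest, rejected = [], []
    for record in records:
        if record["sensitivity"] == "restricted" and not in_zone:
            rejected.append(record["id"])
        else:
            manifest.append(record["id"])
    return manifest, rejected


records = [
    {"id": "r1", "sensitivity": "restricted"},
    {"id": "r2", "sensitivity": "internal"},
]
zones = {"eu-sovereign", "us-gov"}

outside = build_manifest(records, "public-cloud-west", zones)
inside = build_manifest(records, "eu-sovereign", zones)
```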

Evidence and Audit

Artifacts regulators now ask for by name

The federated analytics plane exports evidence packs for the frameworks auditors, cyber insurers, and model-risk teams now require. Each pack is built from events the control plane already emits — not generated after the fact.
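Building packs from an existing event stream amounts to routing each event to the frameworks that want it. In the sketch below the `FRAMEWORK_MAP` table is a placeholder invented for illustration; it is not a compliance mapping, and the event shape is assumed.

```python
from collections import defaultdict

# Placeholder routing table (illustrative only, not a real compliance mapping).
FRAMEWORK_MAP = {
    "transfer": ["EU AI Act Art. 12", "ISO/IEC 42001"],
    "retrieval": ["EU AI Act Art. 12", "NIST AI RMF"],
    "tool_call": ["EU AI Act Art. 12", "NIST AI RMF"],
}


def build_evidence_packs(events):
    """Group already-emitted events by framework; unmapped event types are skipped."""
    packs = defaultdict(list)
    for event in events:
        for framework in FRAMEWORK_MAP.get(event["type"], []):
            packs[framework].append(event["id"])
    return dict(packs)


events = [
    {"type": "transfer", "id": "e1"},
    {"type": "retrieval", "id": "e2"},
    {"type": "posture", "id": "e3"},
]
packs = build_evidence_packs(events)
```

The function only reads events that already exist, which mirrors the idea that evidence is assembled from the stream and not reconstructed later.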

EU AI Act · Article 12

Automatic activity logs over the lifetime of each high-risk AI system — inputs, events, and outputs, all traceable.

NIST AI RMF · Measure & Manage

Artifacts for Measure (MS-1 to MS-4) and Manage (MG-1 to MG-4) functions, with mapped control evidence per model.

ISO/IEC 42001

AI management system evidence — risk assessment inputs, control logs, and continuous-monitoring output.

Internal Model Risk Review

Model-card inputs covering training data provenance, sensitive-data exposure, and retrieval-path inventory.

Cyber Insurance

AI data surface inventory, sensitive-data flow map, and incident-response artifacts underwriters now want at renewal.

CMMC v1, v2, v3 & FedRAMP-Aligned Controls

The product meets all 110 CMMC controls for data sharing and is FedRAMP-aligned (80+ NIST SP 800-53 Rev 5 controls). Federal and defense deployments can pull AI workloads inside the accreditation boundary with inherited evidence.

The Capability Model

How the reference architecture maps to Protect, Understand, Move, & Accelerate

DSPM for AI is not a new product category — it is the Understand pillar extended into AI pipelines, with Protect, Move, and Accelerate running alongside.

P

Protect AI traffic

Training sets, weights, and inference traffic are encrypted at rest and in flight, using post-quantum cryptography throughout. The relay never decrypts.

U

Understand the data

Content inspection, compliance framework matching, customer ontology matching, ABAC access gating — across transfers, RAG, fine-tunes, tool-calls.

M

Move AI workloads

Line-rate model delivery (a 70B-parameter model in under 60 seconds), federated learning across sovereignty zones, and native Windows and Linux support.

A

Accelerate evidence

EU AI Act Article 12, NIST AI RMF, ISO 42001, CMMC v1/v2/v3 — evidence packs built from the audit stream continuously.

Implementation with PumaMesh

How Pulse and the fabric implement the reference architecture

PumaMesh is the reference architecture in product form. The fabric is the control plane — transfer, classification, and policy all sit in-path. Pulse is the federated analytics plane — posture, lineage, and evidence across every node and every AI platform. Pulse is the Understand pillar, delivered.

Control Plane

Fabric + Shield + Transit

ABAC checked at every transfer. Classification attached to records inline. Quantum-safe crypto posture held across sovereignty zones.

Federated Analytics

Pulse

Eleven views cover posture, discovery, user and entity behavior analytics (UEBA), legal hold, audit, and AI Insights. Federated queries reach every node, with no central collector.
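A federated query without a central collector can be pictured as a fan-out: the question goes to every node, and only the answers come back. The `Node` class below is a hypothetical in-process stand-in for a fabric node, not the Pulse API.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for a node that answers queries over its local events."""

    def __init__(self, name, events):
        self.name = name
        self._events = events   # stays on the node; never copied out

    def count(self, classification):
        return sum(1 for e in self._events if e["classification"] == classification)


def federated_count(nodes, classification):
    """Send the query to every node; only per-node counts come back."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        counts = list(pool.map(lambda n: n.count(classification), nodes))
    return {node.name: count for node, count in zip(nodes, counts)}


nodes = [
    Node("us-east", [{"classification": "PHI"}, {"classification": "public"}]),
    Node("eu-west", [{"classification": "PHI"}, {"classification": "PHI"}]),
]
result = federated_count(nodes, "PHI")
total = sum(result.values())
```

The raw events never leave their node; the caller sees only aggregates.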

AI Platform Coverage

Bedrock, Foundry, Vertex, Databricks, Snowflake Cortex

Gateway proxies for each platform capture retrieval, prompt, and tool-call events and feed them into the Pulse lineage graph.

Evidence Packs

Audit stream → framework-aligned artifacts

EU AI Act, NIST AI RMF, ISO/IEC 42001, CMMC v1/v2/v3 (all 110 controls for data sharing), and FedRAMP-aligned control packs — all built from the audit stream the fabric emits continuously.