This whitepaper explains the specialist AI governance story inside PumaMesh: understand sensitive data, track how it moves, keep policy attached, and produce evidence after it reaches models, tools, and people.
It is a deeper technical resource, built on the same promise: one platform keeps movement, lineage, and audit connected.
Traditional posture tools are useful for finding sensitive data at rest. AI pipelines create a harder problem: data moves into indexes, fine-tunes, prompts, tools, and outputs faster than periodic scans can explain.
Scan Cadence
Weekly scans cannot track second-scale data flow
Scheduled classification jobs were built for data at rest. AI pipelines create new artifacts (embeddings, fine-tunes, responses) faster than scanners can find them.
Platform Boundaries
Per-platform guardrails do not federate
Bedrock Guardrails, Foundry Content Safety, Vertex Safety Filters, and Databricks AI Gateway each govern their own platform. None sees upstream or downstream.
Lineage Gap
No system tracks records from source row to agent action
Warehouses end at the export. AI platforms start at the prompt. The gap between them (transfers, feature stores, fine-tune sets) is where lineage disappears.
Regulatory Pressure
EU AI Act and NIST AI RMF demand evidence DSPM does not produce
Article 12 activity logs, NIST Measure and Manage functions, and model-risk reviews require artifacts that per-platform guardrails cannot export on their own.
A credible DSPM for AI architecture covers source posture, movement, training, retrieval, tool-calls, and evidence. Miss one and the chain breaks.
1. Source Data Posture
Classification and sensitivity of records before they enter AI pipelines
Traditional DSPM territory, extended so classifications are machine-readable downstream, not just visible in a dashboard.
2. Transfer Lineage
What moved, from where, to where, under which policy
The gap between warehouse and AI platform. Transfers must carry classification as first-class metadata rather than moving as opaque payloads.
3. Training and Fine-Tune Provenance
Which sensitive records entered which model artifact
Fine-tune sets, embedding indexes, and LoRA adapters inherit the sensitivity of their source data. Provenance has to follow.
4. Retrieval and Prompt Lineage
Which records were retrieved, embedded in context, or returned in a response
RAG systems pull thousands of rows per prompt. Lineage has to tie each retrieval back to source-row sensitivity, not just a vector ID.
5. Agent Tool-Call Policy
Which tools agents are allowed to invoke with which data
Agent frameworks hand out tool access broadly. DSPM for AI has to enforce ABAC on tool-calls the same way it enforces it on files.
6. Evidence and Audit
Exportable artifacts aligned to regulatory frameworks
EU AI Act Article 12, NIST AI RMF Measure/Manage, ISO/IEC 42001, and internal model-risk review all require artifacts the platforms don't produce on their own.
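Layers 3 and 4 above can be illustrated with a minimal sketch. It assumes a hypothetical sensitivity ordering and made-up record and chunk types (`SourceRecord`, `Chunk`). None of these names come from PumaMesh. It shows only the idea that an artifact inherits the highest sensitivity of its inputs and that a retrieved vector resolves back to a source row.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ordering; labels and ranks are illustrative only.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "restricted": 2, "phi": 3}


@dataclass(frozen=True)
class SourceRecord:
    row_id: str
    classification: str


@dataclass(frozen=True)
class Chunk:
    vector_id: str
    source_row_id: str  # lineage pointer back to the source row


def artifact_sensitivity(records):
    """An artifact inherits the highest sensitivity among its source records."""
    return max(
        (r.classification for r in records),
        key=SENSITIVITY_RANK.__getitem__,
        default="public",
    )


def retrieval_lineage(chunks, records_by_id):
    """Resolve each retrieved chunk to its source row and that row's classification."""
    return [
        (c.vector_id, c.source_row_id, records_by_id[c.source_row_id].classification)
        for c in chunks
    ]


records = [SourceRecord("row-1", "internal"), SourceRecord("row-2", "phi")]
by_id = {r.row_id: r for r in records}
chunks = [Chunk("vec-9", "row-2")]

print(artifact_sensitivity(records))
print(retrieval_lineage(chunks, by_id))
```

The design point is that the vector ID alone is never the lineage answer: each chunk keeps a pointer to the row it came from, so sensitivity can be looked up at retrieval time.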
The architecture has three planes. A data plane holds records, transfers, and model artifacts on the platforms where they already live. A control plane enforces policy at transfer and tool-call boundaries. A federated analytics plane sits across all platforms and produces one lineage view.
Data Plane
Platform-native storage and compute
Source warehouses (Snowflake, BigQuery, Redshift), feature stores, object storage, AI platforms (Bedrock, Foundry, Vertex, Databricks, Snowflake Cortex), and agent runtimes. This plane stays put: the architecture never centralizes data.
Control Plane
Policy enforcement at transfer, retrieval, and tool-call boundaries
An ABAC policy engine runs next to transfer execution and AI gateway proxies. Policies target data attributes, not platform primitives, so one rule covers a file transfer, a RAG retrieval, and an agent tool-call.
Federated Analytics Plane
One lineage view across platforms
A neutral layer that takes in posture, transfer, training, retrieval, and tool-call events and produces one lineage, posture, and evidence view. This is the DSPM for AI product surface.
Non-Goals
What the architecture explicitly does not do
No data centralization. No replacement for platform-native guardrails. No forced standardization on one AI platform. Federation, not consolidation.
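The federation idea can be sketched as a fan-out query: each node answers for its own events, and only the matching event metadata is merged into one timeline. This is an illustrative sketch under assumed names (`make_node`, `federated_lineage`, the event fields) and in-memory stand-ins for real nodes. It is not the Pulse query interface.

```python
def make_node(events):
    """Stand-in for a node that holds its own events and answers queries locally."""
    def query(record_id):
        return [e for e in events if e["record_id"] == record_id]
    return query


def federated_lineage(nodes, record_id):
    """Fan the query out to every node and merge the answers into one timeline."""
    merged = []
    for node_name, query in nodes.items():
        for event in query(record_id):
            merged.append({**event, "node": node_name})
    return sorted(merged, key=lambda e: e["ts"])


# Hypothetical events; "ts" is a simple logical timestamp.
nodes = {
    "warehouse": make_node([{"record_id": "row-7", "ts": 1, "kind": "export"}]),
    "ai-gateway": make_node([
        {"record_id": "row-7", "ts": 3, "kind": "retrieval"},
        {"record_id": "row-8", "ts": 2, "kind": "retrieval"},
    ]),
    "transfer": make_node([{"record_id": "row-7", "ts": 2, "kind": "transfer"}]),
}

timeline = federated_lineage(nodes, "row-7")
print([(e["node"], e["kind"]) for e in timeline])
```

No node hands over its full event store. The caller sees only the events that match the record in question, which is the sense in which the view is federated rather than centralized.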
The control plane writes policy against data attributes (classification, marking, owner, sensitivity, jurisdiction), not against bucket names, Bedrock guardrail IDs, or Databricks Unity Catalog grants. One ABAC rule enforces at every boundary.
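As a rough illustration of a rule written against data attributes rather than platform primitives, the sketch below evaluates one rule at two boundaries. The `Rule` type, the `decide` function, and the context keys are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Rule:
    name: str
    # Matches on data attributes only, never on platform-specific identifiers.
    applies: Callable[[Mapping[str, str]], bool]
    # Decides using the data attributes plus the context of the boundary.
    allowed: Callable[[Mapping[str, str], Mapping[str, str]], bool]


def decide(rules, attrs, context):
    """Return ("deny", rule_name) for the first violated rule, else ("allow", None)."""
    for rule in rules:
        if rule.applies(attrs) and not rule.allowed(attrs, context):
            return ("deny", rule.name)
    return ("allow", None)


# Hypothetical rule: PHI stays inside the US-HEALTHCARE jurisdiction.
PHI_STAYS_IN_ZONE = Rule(
    name="phi-stays-in-us-healthcare",
    applies=lambda a: a.get("classification") == "PHI",
    allowed=lambda a, ctx: ctx.get("destination_jurisdiction") == "US-HEALTHCARE",
)

record = {"classification": "PHI", "owner": "claims-team"}

# The same rule is evaluated at two different boundaries.
print(decide([PHI_STAYS_IN_ZONE], record,
             {"boundary": "transfer", "destination_jurisdiction": "EU"}))
print(decide([PHI_STAYS_IN_ZONE], record,
             {"boundary": "tool-call", "destination_jurisdiction": "US-HEALTHCARE"}))
```

Because the rule never mentions a bucket, a guardrail ID, or a catalog grant, the same object can be handed to a transfer executor, a retrieval proxy, or an agent gateway without being rewritten.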
Transfer Boundary
Records tagged PHI cannot leave jurisdiction US-HEALTHCARE
The transfer executor reads the attribute before starting movement. No DLP inspection is needed because the classification is already on the file.
Retrieval Boundary
Records tagged PHI cannot enter a non-US-hosted model context
The RAG retrieval proxy filters vector results by attribute before the context window is assembled.
Tool-Call Boundary
Agents operating under role X cannot invoke tools that write to jurisdiction Y
The agent gateway proxy checks each tool-call against the agent role and the target resource's attributes.
Training Boundary
Records tagged restricted cannot enter a fine-tune set outside a sovereignty zone
The training pipeline's ingestion proxy reads record attributes before adding them to the fine-tune manifest.
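The retrieval boundary can be sketched as a filter that runs between the vector search and context assembly. The function names, the hit fields, and the simple "US" region check are assumptions made for illustration. A real proxy would read these attributes from wherever classifications are stored.

```python
def filter_hits(hits, model_region, blocked_tags=frozenset({"PHI"}), allowed_region="US"):
    """Split retrieval hits into those admitted to the context and those dropped."""
    admitted, dropped = [], []
    for hit in hits:
        blocked = hit["classification"] in blocked_tags and model_region != allowed_region
        (dropped if blocked else admitted).append(hit)
    return admitted, dropped


def assemble_context(admitted):
    """Build the prompt context only from hits that passed the filter."""
    return "\n".join(hit["text"] for hit in admitted)


# Hypothetical vector-search results with classification attached to each hit.
hits = [
    {"vector_id": "v1", "classification": "PHI", "text": "patient note"},
    {"vector_id": "v2", "classification": "internal", "text": "billing policy"},
]

admitted, dropped = filter_hits(hits, model_region="EU")
print(assemble_context(admitted))
print([h["vector_id"] for h in dropped])
```

Returning the dropped hits alongside the admitted ones gives the proxy something to log, so a denied retrieval can still appear in the audit trail.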
The federated analytics plane exports evidence packs for the frameworks auditors, cyber insurers, and model-risk teams now require. Each pack is built from events the control plane already emits — not generated after the fact.
EU AI Act Article 12: automatic activity logs over the lifetime of each high-risk AI system, covering inputs, events, and outputs, all traceable.
NIST AI RMF: artifacts for the Measure (MS-1 to MS-4) and Manage (MG-1 to MG-4) functions, with mapped control evidence per model.
ISO/IEC 42001: AI management system evidence, including risk assessment inputs, control logs, and continuous-monitoring output.
Model-risk review: model-card inputs covering training data provenance, sensitive-data exposure, and retrieval-path inventory.
Cyber insurance: AI data surface inventory, sensitive-data flow map, and incident-response artifacts underwriters now want at renewal.
CMMC and FedRAMP: all 110 CMMC controls for data sharing are met by the product, which is FedRAMP-aligned (80+ NIST SP 800-53 Rev 5). Federal and defense deployments pull AI workloads inside the accreditation boundary with inherited evidence.
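To make the idea of packs "built from events the control plane already emits" concrete, here is a minimal sketch that turns a list of audit events into a pack with a content digest. The event fields, the pack layout, and `build_evidence_pack` are hypothetical and do not represent a PumaMesh export format.

```python
import hashlib
import json


def build_evidence_pack(events, system_id, framework):
    """Assemble a pack from already-emitted audit events for one AI system."""
    entries = sorted(
        (e for e in events if e["system_id"] == system_id),
        key=lambda e: e["ts"],
    )
    # Canonical JSON so the digest depends only on the selected events.
    body = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return {
        "framework": framework,
        "system_id": system_id,
        "event_count": len(entries),
        "decisions": {
            d: sum(1 for e in entries if e["decision"] == d) for d in ("allow", "deny")
        },
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "entries": entries,
    }


# Hypothetical audit events as a control plane might emit them.
events = [
    {"system_id": "rag-claims", "ts": 2, "boundary": "retrieval", "decision": "deny"},
    {"system_id": "rag-claims", "ts": 1, "boundary": "transfer", "decision": "allow"},
    {"system_id": "other-app", "ts": 3, "boundary": "tool-call", "decision": "allow"},
]

pack = build_evidence_pack(events, "rag-claims", "EU AI Act Article 12")
print(pack["event_count"], pack["decisions"])
```

The digest is computed over the ordered entries, so a reviewer who holds the same events can recompute it and confirm the pack was not edited after export.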
DSPM for AI is not a new product category — it is the Understand pillar extended into AI pipelines, with Protect, Move, and Accelerate running alongside.
Training sets, weights, and inference traffic encrypted at rest and in flight. 100% post-quantum. Relay never decrypts.
Content inspection, compliance framework matching, customer ontology matching, ABAC access gating — across transfers, RAG, fine-tunes, tool-calls.
Line-rate model delivery (70B in <60s), federated learning across sovereignty zones, Windows + Linux native.
EU AI Act Article 12, NIST AI RMF, ISO 42001, CMMC v1/v2/v3 — evidence packs built from the audit stream continuously.
PumaMesh is the reference architecture in product form. The fabric is the control plane — transfer, classification, and policy all sit in-path. Pulse is the federated analytics plane — posture, lineage, and evidence across every node and every AI platform. Pulse is the Understand pillar, delivered.
Control Plane
Fabric + Shield + Transit
ABAC checked at every transfer. Classification attached to records inline. Quantum-safe crypto posture held across sovereignty zones.
Federated Analytics
Pulse
Eleven views cover posture, discovery, UEBA, legal hold, audit, and AI Insights. Federated queries reach every node, with no central collector.
AI Platform Coverage
Bedrock, Foundry, Vertex, Databricks, Snowflake Cortex
Gateway proxies for each platform capture retrieval, prompt, and tool-call events and feed them into the Pulse lineage graph.
Evidence Packs
Audit stream → framework-aligned artifacts
EU AI Act, NIST AI RMF, ISO/IEC 42001, CMMC v1/v2/v3 (all 110 controls for data sharing), and FedRAMP-aligned control packs, all built from the audit stream the fabric emits continuously.