ARCHITECTURE · 2026-04-10 · 4 min read

Zero-copy vs ETL: why we don't move your data

Every data integration tool you've used works the same way: extract data from a source, transform it, load it into a warehouse or vector store. ETL. It's been the dominant paradigm for 30 years.

For AI agent infrastructure, we think it's the wrong one. Here's why.

ETL introduces three problems that are uniquely painful for AI agents. First, latency. Batch ETL pipelines run on schedules — hourly, daily, sometimes weekly. An AI agent making a real-time decision on data that's 4 hours stale is making a decision on yesterday's truth. In fast-moving environments (sales, support, operations), "4 hours ago" might as well be "before the world changed."

Second, duplication. Every copy of your data is a copy of your risk surface. Sensitive customer records now live in the source system AND in the warehouse AND in the vector store AND in whatever cache the agent framework maintains. Each copy needs its own access controls, its own encryption, its own retention policies, its own audit trail. Most organizations struggle to govern one copy. Governing four is a compliance exercise that never ends.

Third, staleness detection. When the agent retrieves a fact from the vector store, how does it know whether that fact is still true in the source system? The answer, in most ETL architectures, is: it doesn't. The fact was true when it was extracted. Whether it's still true now is unknowable without going back to the source — which defeats the purpose of having extracted it in the first place.

mantle takes a different approach. We call it zero-copy: connectors read the source of truth in place, at query time. There is no extraction step, no transformation step, no loading step. The data stays where it lives. The query path walks through the source systems on demand, and the results stream directly from source to agent.
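A rough sketch of what that query path looks like, assuming a connector is just something that reads its source in place and yields matching records (the names here are illustrative, not mantle's API):

```python
from typing import Callable, Iterable, Iterator

# Hypothetical connector: reads the source of truth in place at query
# time and yields matching records. No extract, transform, or load step.
Connector = Callable[[str], Iterable[dict]]

def zero_copy_query(query: str, connectors: list[Connector]) -> Iterator[dict]:
    # Walk each source system on demand; results stream directly from
    # source to agent, never landing in a warehouse or vector store.
    for read_source in connectors:
        yield from read_source(query)
```

Because it's a generator, the agent starts seeing results from fast sources before the slowest one has finished, which is what "stream directly from source to agent" means in practice.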

This means: no staleness (you're always reading the current state), no duplication (there's only one copy — the source), and no latency beyond the query itself (which we're designing to minimize by caching metadata, not data).


The trade-off is real: zero-copy queries are bounded by the performance of the source systems. If your Salesforce instance is slow, your mantle query through Salesforce is slow. We mitigate this with metadata caching (we cache the knowledge graph structure and entity resolution mappings, not the underlying data) and with quality scoring (if a source is unreachable, we return what we have with a lower confidence score rather than failing the entire query).
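The degradation behavior can be sketched in a few lines. This is a simplified model, not mantle's implementation: an unreachable source is skipped, and the confidence score drops in proportion to how much of the source set actually answered:

```python
from typing import Callable, Iterable

Connector = Callable[[str], Iterable[dict]]

def query_with_quality(query: str, connectors: list[Connector]) -> tuple[list[dict], float]:
    results: list[dict] = []
    reachable = 0
    for read_source in connectors:
        try:
            results.extend(read_source(query))
            reachable += 1
        except ConnectionError:
            continue  # degrade rather than fail the entire query
    # Confidence reflects how much of the source set we actually reached.
    confidence = reachable / len(connectors) if connectors else 0.0
    return results, confidence
```

The agent gets a partial answer with an honest score attached, instead of an error, and can decide for itself whether the confidence is high enough to act on.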

We think this trade-off is worth it for AI agent use cases specifically, because the cost of acting on stale data is almost always higher than the cost of a slightly slower query. An agent that takes a fraction of a second longer to return a correct answer is better than one that's fast but stale.

We're building this at mantleai.dev. Zero-copy connectors, entity resolution, quality-scored context via MCP. Pre-release, shipping one connector at a time.