The Open Source Lakehouse for the AI Era
One Control Plane. Every Catalog.
Every Client. Human or Agent.
Ship Iceberg REST on any cloud and any existing Hive. Federate Hudi, Delta, Lance, and filesets. One authorization path for every client, human or agent. Built on Apache Gravitino.


The Problem
The Lakehouse Without a Catalog
Your data is scattered across open table formats, multiple clouds, and now vectors and models, each with its own catalog, governance, and credentials. Every AI agent you ship is one more identity these systems weren’t designed to govern.
Catalog sprawl across clouds and formats
Hive Metastore in one region, Glue in another, Polaris in a third. Lance datasets and ML models tracked in spreadsheets, if at all. No shared namespace, no shared identity, no consistent audit.
Governance fragmented per engine
Trino enforces one row-filter syntax, Spark another, BigQuery a third. Tabular data has rules. Files, vectors, and model artifacts often have none. Policies drift, audits diverge.
Credentials baked into pipelines and handed to agents
Long-lived keys hardcoded into jobs. Service accounts shared across teams. AI agents handed broad credentials because nobody’s wired up principal-scoped tokens yet. No clean revocation path when a job, a team, or an agent is retired.

The Solution
The Catalog of Catalogs for the Open Iceberg Lakehouse Era
Datastrato is built on Apache Gravitino, an Apache Top-Level Project and the federated metadata catalog for modern data and AI workloads. One namespace for everything your engines query and everything your AI agents need to find, understand, and use safely.
Iceberg-native, every engine
Standards-compliant Iceberg REST Catalog service, in-tree, not a proprietary fork. Spark, Flink, Trino, Dremio, and BigQuery all hit one catalog with one fully qualified table name (FQTN) namespace. Iceberg is the open standard your stack already speaks.
Federate, don't migrate, across every format
Register existing Hive Metastore, Glue, Polaris, or remote Gravitino catalogs without copying metadata. The same federation model covers Lance datasets, vector indexes, feature stores, and ML model registries. Decisions stay with the data owner.
Governed access for humans and AI agents
Row filtering and column masking through the Iceberg spec. Tag-based policies travel with the data. Short-lived credentials get vended per principal, whether that's an analyst, a batch job, or an AI agent calling through ADP.

How It Works
One Catalog.
Every Engine.
Every Format.
Iceberg REST Catalog
Standards-compliant Iceberg REST Catalog (IRC) service. Hierarchical namespaces and three-level FQTN (catalog.namespace.table). Pluggable JDBC backend on PostgreSQL or MySQL for low-latency, HA-ready persistence.
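To make the three-level naming concrete, here is a minimal Python sketch of splitting a catalog.namespace.table name into its parts. The `parse_fqtn` helper and `TableName` type are hypothetical and are not part of Gravitino or Iceberg. The sketch also assumes that any extra middle segments are nested namespace levels, which is one plausible reading of "hierarchical namespaces".

```python
from typing import NamedTuple, Tuple


class TableName(NamedTuple):
    catalog: str
    namespace: Tuple[str, ...]  # one or more levels (assumption: nesting is dot-separated)
    table: str


def parse_fqtn(fqtn: str) -> TableName:
    """Split 'catalog.namespace.table' into its parts (hypothetical helper)."""
    parts = fqtn.split(".")
    # Require at least catalog, one namespace level, and a table; reject empty segments.
    if len(parts) < 3 or any(not p for p in parts):
        raise ValueError(f"expected catalog.namespace.table, got {fqtn!r}")
    return TableName(parts[0], tuple(parts[1:-1]), parts[-1])
```

For example, `parse_fqtn("prod.sales.orders")` yields catalog `prod`, namespace `("sales",)`, and table `orders`. A two-part name such as `sales.orders` is rejected because the catalog is missing.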
Multimodal metadata: tables, files, vectors, models
The same catalog covers Iceberg tables, files, Lance datasets, vector indexes, feature stores, and ML models. Tables for analytics. Lance for multimodal training. Vectors for retrieval. Models for serving.
Multi-cloud, multi-format storage
Native S3, GCS, and ADLS support with per-table backend dispatch via MultiSchemeFileIO. Same catalog across cloud and on-prem object stores. No engine changes when storage moves.
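As an illustration of per-table dispatch, the following Python sketch picks a storage backend from the URI scheme of a table location. It is not the MultiSchemeFileIO implementation. The `BACKENDS` table, the backend labels, and the `backend_for` function are assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to a storage backend label.
BACKENDS = {
    "s3": "aws-s3",
    "s3a": "aws-s3",
    "gs": "gcs",
    "abfs": "adls",
    "abfss": "adls",
}


def backend_for(location: str) -> str:
    """Return the backend label for a table location, based only on its scheme."""
    scheme = urlparse(location).scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        # Fail loudly instead of guessing a default backend.
        raise ValueError(f"no backend registered for scheme {scheme!r}") from None
```

Because the choice is made per location, two tables in the same catalog can live on different object stores, and moving a table only changes its location string.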
Catalog federation, no metadata copy
IRC-to-IRC registration across clouds and on-prem. No replication lag, no second source of truth. Each remote catalog keeps its own IAM, RBAC, vending, and audit log.
Fine-grained access control
Row filtering and column masking through the Iceberg spec (PR #13879). Tag-based policies attach to the data itself, not to an individual table definition, so classification travels with the asset. Identical evaluation on every compliant engine.
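The idea of tag-driven masking can be sketched in a few lines of Python. This is a toy model and not the mechanism defined in the Iceberg spec or in Gravitino. The tag names, the roles, the `COLUMN_TAGS` and `CLEAR_ROLES` tables, and the `mask_row` function are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical classification tags attached to columns.
COLUMN_TAGS: Dict[str, Set[str]] = {
    "email": {"pii"},
    "diagnosis": {"phi"},
    "order_total": set(),
}

# Hypothetical policy: tag -> roles allowed to see the raw value.
CLEAR_ROLES: Dict[str, Set[str]] = {
    "pii": {"privacy_officer"},
    "phi": {"clinician"},
}


def mask_row(row: dict, roles: Set[str]) -> dict:
    """Return a copy of the row with tagged columns masked for the given roles."""
    masked = {}
    for column, value in row.items():
        tags = COLUMN_TAGS.get(column, set())
        # Readable only if the caller holds an allowed role for every tag on the column.
        # Untagged columns pass; tags with no policy entry are masked by default.
        allowed = all(roles & CLEAR_ROLES.get(tag, set()) for tag in tags)
        masked[column] = value if allowed else "***"
    return masked
```

The policy is keyed on tags rather than on column or table names, so tagging a new column `pii` is enough to bring it under the same rule.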
Credential vending for humans and agents
Short-lived credentials minted per principal and per asset, native on AWS, GCP, and Azure. Analysts, batch jobs, and AI agents all get capability-scoped tokens through the same path. Same audit trail, same revocation, no long-lived keys baked into pipelines.
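A minimal Python sketch of what "per principal and per asset" with a short lifetime might look like. It does not reflect the actual vending API or any cloud provider's token service. `VendedCredential`, `vend`, `is_valid`, and the 15-minute default lifetime are assumptions for illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class VendedCredential:
    principal: str     # analyst, batch job, or agent identity
    asset: str         # the single asset the credential is scoped to
    token: str         # opaque random secret
    expires_at: float  # expiry as seconds since the epoch


def vend(principal: str, asset: str, now: float, ttl_seconds: float = 900.0) -> VendedCredential:
    """Mint a credential bound to one principal and one asset (hypothetical)."""
    return VendedCredential(principal, asset, secrets.token_urlsafe(32), now + ttl_seconds)


def is_valid(cred: VendedCredential, principal: str, asset: str, now: float) -> bool:
    """Accept the credential only for its own principal and asset, and only before expiry."""
    return cred.principal == principal and cred.asset == asset and now < cred.expires_at
```

The current time is passed in explicitly so that expiry can be checked deterministically. In this model a credential needs no separate revocation step once its lifetime has passed, because it simply stops validating.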

Trustworthiness
The Lakehouse Your Auditors Will Sign Off On.
Whether you run a regulated multi-cloud platform, multimodal AI on the lakehouse, or AI agents acting on production data, Datastrato gives you one catalog and one governance model across all of it.
Regulated, multi-cloud lakehouse
Govern PII and PHI across S3, GCS, and ADLS, with audit events streamed to Splunk, Datadog, or Sentinel. Identical enforcement on Spark, Trino, Flink, Dremio, and BigQuery.
Multimodal lakehouse for AI and ML
Tables, files, Lance datasets, vector indexes, feature stores, and ML models in one namespace. Training jobs find the right version, and lineage runs from raw file to deployed model.
AI agent data access at scale
Vend capability-scoped, just-in-time credentials to agents through ADP. They discover what's available, ask what they're allowed to use, and work inside the same policy and audit framework as everyone else.
Hybrid lakehouse migration and consolidation
Federate Hive Metastore, Glue, and Polaris under one namespace, before or instead of migrating. Cross-catalog CTAS and zero-copy register-table for promotion. Same Gravitino on OpenShift, EKS, GKE, and AKS, with air-gap support.
Why Datastrato
Open Foundation.
Enterprise Distribution.
Apache Gravitino is the open metadata lake. Datastrato Enterprise is what you deploy in production: hardened, certified, supported, and compliance-ready, with the open core unchanged.
Apache Top-Level Project
Apache 2.0 license, vendor-neutral governance, community-driven roadmap. No license-change risk, no proprietary fork.
Hardened, certified distribution
Signed container images and a full-stack Helm chart pinned per release. Deploys as one unit on EKS, GKE, AKS, OpenShift, or any other Kubernetes 1.28+ cluster.
Enterprise identity and audit
LDAP, Active Directory, and SCIM for people; federated identity with delegation chains for AI agents. SIEM streaming plus KMS and HashiCorp Vault integration.
Compliance posture
SOC 2 Type II, ISO 27001 readiness, HIPAA BAA, and GDPR alignment. NDA-gated access to audit reports and control documentation.
Standard and Premium support
Dedicated Slack Connect channel, 1-hour P0 response (24/7 on Premium), a named Technical Account Manager, and extended LTS windows beyond the standard 18 months.

