Skip to main content
Version: 0.5.0

Hadoop catalog

Introduction

Hadoop catalog is a fileset catalog that using Hadoop Compatible File System (HCFS) to manage the storage location of the fileset. Currently, it supports local filesystem and HDFS. For object storage like S3, GCS, and Azure Blob Storage, you can put the hadoop object store jar like hadoop-aws into the $GRAVITINO_HOME/catalogs/hadoop/libs directory to enable the support. Gravitino itself hasn't yet tested the object storage support, so if you have any issue, please create an issue.

Note that Gravitino uses Hadoop 3 dependencies to build Hadoop catalog. Theoretically, it should be compatible with both Hadoop 2.x and 3.x, since Gravitino doesn't leverage any new features in Hadoop 3. If there's any compatibility issue, please create an issue.

Catalog

Catalog properties

Property NameDescriptionDefault ValueRequiredSince Version
locationThe storage location managed by Hadoop catalog.(none)No0.5.0

Catalog operations

Refer to Catalog operations for more details.

Schema

Schema capabilities

The Hadoop catalog supports creating, updating, deleting, and listing schema.

Schema properties

Property nameDescriptionDefault valueRequiredSince Version
locationThe storage location managed by Hadoop schema.(none)No0.5.0

Schema operations

Refer to Schema operation for more details.

Fileset

Fileset capabilities

  • The Hadoop catalog supports creating, updating, deleting, and listing filesets.

Fileset properties

None.

Fileset operations

Refer to Fileset operations for more details.