Version: 0.3.1

Iceberg catalog

Introduction

Gravitino provides the ability to manage Apache Iceberg metadata.

Requirements and limitations

info

Built with Apache Iceberg 1.3.1. The Apache Iceberg table format version is 1 by default. Built with Hadoop 2.10.x, so there may be compatibility issues when accessing Hadoop 3.x clusters.

Catalog

Catalog capabilities

  • Works as a catalog proxy, supporting HiveCatalog and JdbcCatalog.
  • Supports DDL operations for Iceberg schemas and tables.
  • Doesn't support snapshot or table management operations.

Catalog properties

| Property name | Description | Default value | Required | Since Version |
|---|---|---|---|---|
| catalog-backend | Catalog backend of Gravitino Iceberg catalog. Supports hive or jdbc. | (none) | Yes | 0.2.0 |
| uri | The URI configuration of the Iceberg catalog. thrift://127.0.0.1:9083 or jdbc:postgresql://127.0.0.1:5432/db_name or jdbc:mysql://127.0.0.1:3306/metastore_db. | (none) | Yes | 0.2.0 |
| warehouse | Warehouse directory of catalog. file:///user/hive/warehouse-hive/ for local fs or hdfs://namespace/hdfs/path for HDFS. | (none) | Yes | 0.2.0 |

Any property that Gravitino doesn't define itself and that carries the gravitino.bypass prefix is passed through to the Iceberg catalog properties and the HDFS configuration. For example, if you specify gravitino.bypass.list-all-tables, then list-all-tables is passed to the Iceberg catalog properties.
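The pass-through rule above can be illustrated with a minimal Python sketch. It is not Gravitino's implementation: the helper name split_bypass_properties is hypothetical, and the sketch assumes the prefix is stripped together with its trailing dot, as the list-all-tables example suggests.

```python
# Illustrative only: models the documented prefix rule, not Gravitino's code.
BYPASS_PREFIX = "gravitino.bypass."


def split_bypass_properties(properties):
    """Split properties into (passthrough, remaining).

    Keys that start with the bypass prefix are returned with the
    prefix removed; all other keys are returned unchanged.
    """
    passthrough = {}
    remaining = {}
    for key, value in properties.items():
        if key.startswith(BYPASS_PREFIX):
            passthrough[key[len(BYPASS_PREFIX):]] = value
        else:
            remaining[key] = value
    return passthrough, remaining


catalog_properties = {
    "catalog-backend": "hive",
    "uri": "thrift://127.0.0.1:9083",
    "warehouse": "hdfs://namespace/hdfs/path",
    "gravitino.bypass.list-all-tables": "true",
}

passthrough, remaining = split_bypass_properties(catalog_properties)
print(passthrough)  # {'list-all-tables': 'true'}
```

Only the prefixed key ends up in the pass-through set; the three Gravitino-defined properties stay where they are.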

JDBC catalog

If you use a JDBC catalog, you must provide jdbc-user, jdbc-password, and jdbc-driver in the catalog properties.

| Property name | Description | Default value | Required | Since Version |
|---|---|---|---|---|
| jdbc-user | JDBC user name | (none) | Yes | 0.2.0 |
| jdbc-password | JDBC password | (none) | Yes | 0.2.0 |
| jdbc-driver | com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver for MySQL, org.postgresql.Driver for PostgreSQL | (none) | Yes | 0.3.0 |
| jdbc-initialize | Whether to initialize meta tables when creating a JDBC catalog | true | No | 0.2.0 |

caution

You must download the corresponding JDBC driver to the catalogs/lakehouse-iceberg/libs directory.
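As a quick pre-flight check before creating a catalog, the required properties listed above can be verified with a small Python sketch. The function missing_required is a hypothetical helper, not part of Gravitino, and the user name and password are placeholder values.

```python
# Illustrative only: required property names are taken from the tables above.
COMMON_REQUIRED = ("catalog-backend", "uri", "warehouse")
JDBC_REQUIRED = ("jdbc-user", "jdbc-password", "jdbc-driver")


def missing_required(properties):
    """Return the required property names that are absent from properties."""
    required = list(COMMON_REQUIRED)
    if properties.get("catalog-backend") == "jdbc":
        required.extend(JDBC_REQUIRED)
    return [name for name in required if name not in properties]


jdbc_catalog = {
    "catalog-backend": "jdbc",
    "uri": "jdbc:postgresql://127.0.0.1:5432/db_name",
    "warehouse": "hdfs://namespace/hdfs/path",
    "jdbc-user": "iceberg",        # placeholder value
    "jdbc-password": "change-me",  # placeholder value
}

print(missing_required(jdbc_catalog))  # ['jdbc-driver']
```

A hive-backed catalog only needs the three common properties, so the same check returns an empty list for it once catalog-backend, uri, and warehouse are set.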

Catalog operations

Please refer to Manage Metadata Using Gravitino for more details.

Schema

Schema capabilities

  • Doesn't support dropping a schema with cascade.

Schema properties

You can set any schema property except comment.

Schema operations

Please refer to Manage Metadata Using Gravitino for more details.

Table

Table capabilities

Table partitions

Supports transforms:

  • IdentityTransform
  • BucketTransform
  • TruncateTransform
  • YearTransform
  • MonthTransform
  • DayTransform
  • HourTransform
info

Iceberg doesn't support multiple fields in BucketTransform. Iceberg doesn't support ApplyTransform, RangeTransform, or ListTransform.
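To give a feel for what some of these transforms compute, here is a minimal Python sketch that follows the definitions in the Apache Iceberg table specification: truncate rounds an integer down to a multiple of the width (or keeps the first characters of a string), and the time transforms count whole units since 1970-01-01. The function names are hypothetical and the code is independent of Gravitino. The bucket transform is omitted because it depends on a Murmur3 hash.

```python
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)


def truncate_int(width, value):
    """Round value down to the nearest multiple of width."""
    # Python's % is non-negative for a positive width, so this also
    # rounds negative values down (for example, -1 becomes -10 for width 10).
    return value - (value % width)


def truncate_str(width, value):
    """Keep at most the first width characters."""
    return value[:width]


def year_transform(d):
    """Whole years since 1970."""
    return d.year - 1970


def month_transform(d):
    """Whole months since 1970-01."""
    return (d.year - 1970) * 12 + (d.month - 1)


def day_transform(d):
    """Whole days since 1970-01-01; expects a date, not a datetime."""
    return (d - EPOCH_DATE).days


def hour_transform(ts):
    """Whole hours since 1970-01-01 00:00:00; expects a naive datetime."""
    return (ts - EPOCH_DATETIME) // timedelta(hours=1)


print(truncate_int(10, 37))                    # 30
print(truncate_str(3, "iceberg"))              # ice
print(day_transform(date(2017, 11, 16)))       # 17486
```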

Table sort orders

Supports expressions:

  • FieldReference
  • FunctionExpression
    • bucket
    • truncate
    • year
    • month
    • day
    • hour
info

For bucket and truncate, the first argument must be an integer literal, and the second argument must be a field reference.
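The argument rule for bucket and truncate can be sketched as a small validator. The classes IntLiteral, FieldRef, and FunctionExpr below are hypothetical stand-ins written for this example; they are not Gravitino's expression classes.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class IntLiteral:
    value: int


@dataclass(frozen=True)
class FieldRef:
    name: str


@dataclass(frozen=True)
class FunctionExpr:
    name: str
    args: Tuple[Union[IntLiteral, FieldRef], ...]


def check_sort_function(expr):
    """Raise ValueError if a bucket or truncate call breaks the argument rule."""
    if expr.name in ("bucket", "truncate"):
        if len(expr.args) != 2:
            raise ValueError(f"{expr.name} expects exactly two arguments")
        first, second = expr.args
        if not isinstance(first, IntLiteral):
            raise ValueError(f"{expr.name}: first argument must be an integer literal")
        if not isinstance(second, FieldRef):
            raise ValueError(f"{expr.name}: second argument must be a field reference")
    return expr


# Accepted: integer literal first, field reference second.
check_sort_function(FunctionExpr("bucket", (IntLiteral(16), FieldRef("id"))))
```

Swapping the two arguments, for example passing the field reference first, makes the validator raise ValueError.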

Table distributions

  • Doesn't support Distribution. Use BucketPartition instead.
info

If you load Iceberg tables, the table distribution strategy is hash with num 0, which means no distribution.

Table column types

| Gravitino Type | Apache Iceberg Type |
|---|---|
| Struct | Struct |
| Map | Map |
| Array | Array |
| Boolean | Boolean |
| Integer | Integer |
| Long | Long |
| Float | Float |
| Double | Double |
| String | String |
| Date | Date |
| Time | Time |
| TimestampType withZone | TimestampType withZone |
| TimestampType withoutZone | TimestampType withoutZone |
| Decimal | Decimal |
| Fixed | Fixed |
| BinaryType | Binary |
| UUID | UUID |
info

Apache Iceberg doesn't support the Gravitino Varchar, Fixedchar, Byte, Short, and Union types.
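The mapping and the unsupported types above can be captured in a lookup, sketched here in Python. The type names are plain strings copied from the table, and to_iceberg_type is a hypothetical helper rather than a Gravitino API.

```python
# Illustrative only: type names are strings copied from the table above.
GRAVITINO_TO_ICEBERG = {
    "Struct": "Struct",
    "Map": "Map",
    "Array": "Array",
    "Boolean": "Boolean",
    "Integer": "Integer",
    "Long": "Long",
    "Float": "Float",
    "Double": "Double",
    "String": "String",
    "Date": "Date",
    "Time": "Time",
    "TimestampType withZone": "TimestampType withZone",
    "TimestampType withoutZone": "TimestampType withoutZone",
    "Decimal": "Decimal",
    "Fixed": "Fixed",
    "BinaryType": "Binary",
    "UUID": "UUID",
}

UNSUPPORTED_IN_ICEBERG = {"Varchar", "Fixedchar", "Byte", "Short", "Union"}


def to_iceberg_type(gravitino_type):
    """Return the Iceberg type name, or raise ValueError if there is none."""
    if gravitino_type in UNSUPPORTED_IN_ICEBERG:
        raise ValueError(f"Iceberg doesn't support Gravitino type {gravitino_type}")
    if gravitino_type not in GRAVITINO_TO_ICEBERG:
        raise ValueError(f"Unknown Gravitino type {gravitino_type}")
    return GRAVITINO_TO_ICEBERG[gravitino_type]


print(to_iceberg_type("BinaryType"))  # Binary
```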

Table properties

You can pass Iceberg table properties to Gravitino when creating an Iceberg table.

The Gravitino server doesn't allow passing the following reserved fields.

| Configuration item | Description |
|---|---|
| comment | The table comment. |
| creator | The table creator. |
| location | Iceberg location for table storage. |
| current-snapshot-id | The snapshot that represents the current state of the table. |
| cherry-pick-snapshot-id | Selecting a specific snapshot in a merge operation. |
| sort-order | The sort order of the table. |
| identifier-fields | The identifier fields for defining the table. |
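A client can filter out reserved keys before sending table properties to the server. The sketch below shows one way to do that in Python; find_reserved is a hypothetical helper, and write.format.default is used only as an example of an ordinary Iceberg table property.

```python
# Illustrative only: reserved names are taken from the table above.
RESERVED_TABLE_PROPERTIES = frozenset({
    "comment",
    "creator",
    "location",
    "current-snapshot-id",
    "cherry-pick-snapshot-id",
    "sort-order",
    "identifier-fields",
})


def find_reserved(properties):
    """Return the reserved property names present in properties, sorted."""
    return sorted(name for name in properties if name in RESERVED_TABLE_PROPERTIES)


table_properties = {
    "write.format.default": "parquet",
    "location": "hdfs://namespace/hdfs/path/tbl",
}

print(find_reserved(table_properties))  # ['location']
```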

Table operations

Please refer to Manage Metadata Using Gravitino for more details.

Alter table operations

Supports operations:

  • RenameTable
  • SetProperty
  • RemoveProperty
  • UpdateComment
  • AddColumn
  • DeleteColumn
  • RenameColumn
  • UpdateColumnType
  • UpdateColumnPosition
  • UpdateColumnNullability
  • UpdateColumnComment
info

The default column position is LAST when you add a column. If you add a non-nullable column, there may be compatibility issues. Iceberg only supports updating primitive types.

caution

If you change a nullable column to non-nullable, there may be compatibility issues.

HDFS configuration

You can place core-site.xml and hdfs-site.xml in the catalogs/lakehouse-iceberg/conf directory. Gravitino loads them automatically as the default HDFS configuration.

caution

When writing to HDFS, the Gravitino Iceberg REST server can only operate as the specified HDFS user and doesn't support proxying to other HDFS users. See How to access Apache Hadoop for more details.