Version: 0.3.1

Iceberg REST catalog service

Background

The Gravitino Iceberg REST Server follows the Apache Iceberg REST API specification and acts as an Iceberg REST catalog server.

Capabilities

  • Supports the Apache Iceberg REST API defined in Iceberg 1.3.1, and supports all namespace and table interfaces. Token, ReportMetrics, and Config interfaces aren't supported yet.
  • Works as a catalog proxy, supporting HiveCatalog and JDBCCatalog.
  • When writing to HDFS, the Gravitino Iceberg REST catalog service can only operate as the specified HDFS user and doesn't support proxying to other HDFS users. See How to access Apache Hadoop for more details.

Info: The service is built with Apache Iceberg 1.3.1, and the Apache Iceberg table format version is 1 by default. It is built with Hadoop 2.10.x, so there may be compatibility issues when accessing Hadoop 3.x clusters.

How to start the Gravitino Iceberg REST catalog service

Deploy the Gravitino server to the GRAVITINO_HOME directory. You can find the configuration options in $GRAVITINO_HOME/conf/gravitino.conf.

Gravitino Iceberg REST catalog service configuration

| Configuration item | Description | Default value | Required | Since Version |
|---|---|---|---|---|
| gravitino.auxService.names | The auxiliary service name of the Gravitino Iceberg REST catalog service. Use iceberg-rest. | (none) | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.classpath | The classpath of the Gravitino Iceberg REST catalog service, including the directories that contain jars and configuration. Both absolute and relative paths are supported, for example, catalogs/lakehouse-iceberg/libs, catalogs/lakehouse-iceberg/conf. | (none) | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.host | The host of the Gravitino Iceberg REST catalog service. | 0.0.0.0 | No | 0.2.0 |
| gravitino.auxService.iceberg-rest.httpPort | The port of the Gravitino Iceberg REST catalog service. | 8090 | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.minThreads | The minimum number of threads in the thread pool used by the Jetty web server. minThreads is 8 if the value is less than 8. | Math.max(Math.min(Runtime.getRuntime().availableProcessors() * 2, 100), 8) | No | 0.2.0 |
| gravitino.auxService.iceberg-rest.maxThreads | The maximum number of threads in the thread pool used by the Jetty web server. maxThreads is 8 if the value is less than 8, and maxThreads must be greater than or equal to minThreads. | Math.max(Runtime.getRuntime().availableProcessors() * 4, 400) | No | 0.2.0 |
| gravitino.auxService.iceberg-rest.threadPoolWorkQueueSize | The size of the queue in the thread pool used by the Gravitino Iceberg REST catalog service. | 100 | No | 0.2.0 |
| gravitino.auxService.iceberg-rest.stopTimeout | The amount of time in ms for the Gravitino Iceberg REST catalog service to stop gracefully. For more information, see org.eclipse.jetty.server.Server#setStopTimeout. | 30000 | No | 0.2.0 |
| gravitino.auxService.iceberg-rest.idleTimeout | The timeout in ms of idle connections. | 30000 | No | 0.2.0 |
| gravitino.auxService.iceberg-rest.requestHeaderSize | The maximum size of an HTTP request header. | 131072 | No | 0.2.0 |
| gravitino.auxService.iceberg-rest.responseHeaderSize | The maximum size of an HTTP response header. | 131072 | No | 0.2.0 |

Caution: You must set gravitino.auxService.iceberg-rest.httpPort explicitly, for example to 9001.
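As a minimal sketch, the required settings from the table above could be appended to the configuration file like this. The port 9001 and the relative classpath entries come from the examples on this page. The fallback to the current directory when GRAVITINO_HOME is unset is an assumption, made only so the snippet can run standalone as a dry run.

```shell
# Assumption: GRAVITINO_HOME points at the deployment; "." is used for a dry run.
CONF_FILE="${GRAVITINO_HOME:-.}/conf/gravitino.conf"
mkdir -p "$(dirname "$CONF_FILE")"

# Append the settings that enable the Iceberg REST auxiliary service.
cat >> "$CONF_FILE" <<'EOF'
gravitino.auxService.names = iceberg-rest
gravitino.auxService.iceberg-rest.classpath = catalogs/lakehouse-iceberg/libs, catalogs/lakehouse-iceberg/conf
gravitino.auxService.iceberg-rest.host = 0.0.0.0
gravitino.auxService.iceberg-rest.httpPort = 9001
EOF
```

The optional thread and timeout settings are left at their defaults here. Add them to the same file only if you need to tune the Jetty server.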

Iceberg catalog configuration

Info: The Gravitino Iceberg REST catalog service uses the memory catalog by default. You can specify a Hive or JDBC catalog for production environments.

Hive catalog configuration

| Configuration item | Description | Default value | Required | Since Version |
|---|---|---|---|---|
| gravitino.auxService.iceberg-rest.catalog-backend | The catalog backend of the Gravitino Iceberg REST catalog service. Use the value hive for a Hive catalog. | memory | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.uri | The Hive metastore address, such as thrift://127.0.0.1:9083. | (none) | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.warehouse | The warehouse directory of the Hive catalog, such as /user/hive/warehouse-hive/. | (none) | Yes | 0.2.0 |
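For illustration, a Hive backend could be configured by appending the three properties above to gravitino.conf. The metastore address and warehouse path are the sample values from the table, not recommendations, and the fallback to the current directory is an assumption for standalone dry runs.

```shell
# Assumption: GRAVITINO_HOME points at the deployment; "." is used for a dry run.
CONF_FILE="${GRAVITINO_HOME:-.}/conf/gravitino.conf"
mkdir -p "$(dirname "$CONF_FILE")"

# Switch the Iceberg REST service from the default memory catalog to Hive.
cat >> "$CONF_FILE" <<'EOF'
gravitino.auxService.iceberg-rest.catalog-backend = hive
gravitino.auxService.iceberg-rest.uri = thrift://127.0.0.1:9083
gravitino.auxService.iceberg-rest.warehouse = /user/hive/warehouse-hive/
EOF
```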

Iceberg JDBC backend configuration

| Configuration item | Description | Default value | Required | Since Version |
|---|---|---|---|---|
| gravitino.auxService.iceberg-rest.catalog-backend | The catalog backend of the Gravitino Iceberg REST catalog service. Use the value jdbc for a JDBC catalog. | memory | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.uri | The JDBC connection address, such as jdbc:postgresql://127.0.0.1:5432 for PostgreSQL, or jdbc:mysql://127.0.0.1:3306/ for MySQL. | (none) | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.warehouse | The warehouse directory of the JDBC catalog. Set the HDFS prefix if using HDFS, such as hdfs://127.0.0.1:9000/user/hive/warehouse-jdbc. | (none) | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.jdbc.user | The username of the JDBC connection. | (none) | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.jdbc.password | The password of the JDBC connection. | (none) | Yes | 0.2.0 |
| gravitino.auxService.iceberg-rest.jdbc-initialize | Whether to initialize the meta tables when creating the JDBC catalog. | true | No | 0.2.0 |
| gravitino.auxService.iceberg-rest.jdbc-driver | The JDBC driver class: com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver for MySQL, org.postgresql.Driver for PostgreSQL. | (none) | Yes | 0.3.0 |

Caution: You must download the corresponding JDBC driver to the catalogs/lakehouse-iceberg/libs directory.

Info: gravitino.auxService.iceberg-rest.jdbc-driver isn't required unless Gravitino manages multiple JDBC drivers.
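A JDBC backend setup might look like the following sketch, which uses the MySQL sample values from the table. The user name and password are placeholders, and JDBC_DRIVER_JAR is a hypothetical variable naming a driver jar you have already downloaded. Neither is defined by Gravitino. This is an alternative to the Hive backend, so set only one catalog-backend value.

```shell
# Assumption: GRAVITINO_HOME points at the deployment; "." is used for a dry run.
GRAVITINO_HOME="${GRAVITINO_HOME:-.}"
CONF_FILE="$GRAVITINO_HOME/conf/gravitino.conf"
LIBS_DIR="$GRAVITINO_HOME/catalogs/lakehouse-iceberg/libs"
mkdir -p "$(dirname "$CONF_FILE")" "$LIBS_DIR"

# Optional: JDBC_DRIVER_JAR (hypothetical) is the path to a driver jar you downloaded.
if [ -n "${JDBC_DRIVER_JAR:-}" ]; then
  cp "$JDBC_DRIVER_JAR" "$LIBS_DIR/"
fi

# Append the JDBC catalog settings; replace the placeholder credentials.
cat >> "$CONF_FILE" <<'EOF'
gravitino.auxService.iceberg-rest.catalog-backend = jdbc
gravitino.auxService.iceberg-rest.uri = jdbc:mysql://127.0.0.1:3306/
gravitino.auxService.iceberg-rest.warehouse = hdfs://127.0.0.1:9000/user/hive/warehouse-jdbc
# Placeholder credentials, for illustration only.
gravitino.auxService.iceberg-rest.jdbc.user = iceberg_user
gravitino.auxService.iceberg-rest.jdbc.password = change-me
gravitino.auxService.iceberg-rest.jdbc-initialize = true
gravitino.auxService.iceberg-rest.jdbc-driver = com.mysql.cj.jdbc.Driver
EOF
```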

Other Apache Iceberg catalog properties

You can add other properties defined in the Iceberg catalog properties. The clients property, for example:

| Configuration item | Description | Default value | Required |
|---|---|---|---|
| gravitino.auxService.iceberg-rest.clients | The client pool size of the catalog. | 2 | No |

Info: The catalog-impl property has no effect.

HDFS configuration

The Gravitino Iceberg REST catalog service adds the HDFS configuration files, core-site.xml and hdfs-site.xml from the directory defined by gravitino.auxService.iceberg-rest.classpath, for example, catalogs/lakehouse-iceberg/conf, to the classpath.
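One way to make the HDFS client files available is to copy them into the configuration directory on the classpath, as sketched below. HADOOP_CONF_DIR and its /etc/hadoop/conf fallback are assumptions about where your cluster's client configuration lives. Adjust them to your environment.

```shell
# Assumption: HADOOP_CONF_DIR holds the cluster's client configuration files.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
# Assumption: GRAVITINO_HOME points at the deployment; "." is used for a dry run.
ICEBERG_CONF_DIR="${GRAVITINO_HOME:-.}/catalogs/lakehouse-iceberg/conf"
mkdir -p "$ICEBERG_CONF_DIR"

# Copy each file if present; warn instead of failing when it is missing.
for f in core-site.xml hdfs-site.xml; do
  if [ -f "$HADOOP_CONF_DIR/$f" ]; then
    cp "$HADOOP_CONF_DIR/$f" "$ICEBERG_CONF_DIR/"
  else
    echo "warning: $HADOOP_CONF_DIR/$f not found, skipping" >&2
  fi
done
```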

Starting the Gravitino Iceberg REST catalog service

To start the Gravitino Iceberg REST catalog service, run:

./bin/gravitino.sh start

To check whether the Gravitino Iceberg REST catalog service has started, run:

curl http://127.0.0.1:9001/iceberg/application.wadl
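Because the service may take a moment to come up, a script can poll the same endpoint until it responds. The helper below is a hypothetical sketch, not part of Gravitino. The function name, the 30-attempt default, and the one-second interval are all assumptions.

```shell
# wait_for_iceberg_rest URL [ATTEMPTS]: hypothetical helper, not part of Gravitino.
# Polls URL once per second until curl succeeds or ATTEMPTS is exhausted.
wait_for_iceberg_rest() {
  local url="$1"
  local attempts="${2:-30}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -s silences progress output.
    if curl -fs -o /dev/null "$url"; then
      echo "service is up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "service did not respond: $url" >&2
  return 1
}
```

For example, wait_for_iceberg_rest http://127.0.0.1:9001/iceberg/application.wadl returns exit status 0 once the endpoint answers and a non-zero status if it never does.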

Exploring the Gravitino and Apache Iceberg REST catalog service with Apache Spark

Deploying Apache Spark with Apache Iceberg support

Follow the Spark Iceberg start guide to set up the Apache Spark and Apache Iceberg environment. Keep the Apache Spark version consistent with the iceberg-spark-runtime version.

Starting the Apache Spark client with the Apache Iceberg REST catalog

| Configuration item | Description |
|---|---|
| spark.sql.catalog.${catalog-name}.type | The Spark catalog type. It must be set to rest. |
| spark.sql.catalog.${catalog-name}.uri | The Spark Iceberg REST catalog URI, such as http://127.0.0.1:9001/iceberg/. |

For example:

./bin/spark-shell -v \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.rest.type=rest \
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/

Exploring Apache Iceberg with Apache Spark SQL

-- First switch to the `rest` catalog
USE rest;
CREATE DATABASE IF NOT EXISTS dml;
CREATE TABLE dml.test (id bigint COMMENT 'unique id') USING iceberg;
DESCRIBE TABLE EXTENDED dml.test;
INSERT INTO dml.test VALUES (1), (2);
SELECT * FROM dml.test;
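The objects created above can also be inspected without Spark by calling the REST endpoints directly. The sketch below assumes the service exposes the standard Iceberg REST paths (v1/namespaces and v1/namespaces/{namespace}/tables) under the /iceberg/ base URI used on this page. The function names and the ICEBERG_REST_URI variable are hypothetical.

```shell
# Assumption: base URI of the service, matching the Spark configuration above.
ICEBERG_REST_URI="${ICEBERG_REST_URI:-http://127.0.0.1:9001/iceberg}"

# List all namespaces (hypothetical helper; standard Iceberg REST path assumed).
list_namespaces() {
  curl -fs "${ICEBERG_REST_URI%/}/v1/namespaces"
}

# List the tables in one namespace, for example: list_tables dml
list_tables() {
  curl -fs "${ICEBERG_REST_URI%/}/v1/namespaces/$1/tables"
}
```

After running the SQL statements, list_namespaces should return a JSON document that includes the dml namespace, and list_tables dml should include the test table.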