Iceberg REST catalog service
Background
The Gravitino Iceberg REST Server follows the Apache Iceberg REST API specification and acts as an Iceberg REST catalog server.
Capabilities
- Supports the Apache Iceberg REST API defined in Iceberg 1.3.1, and supports all namespace and table interfaces. The Token and Config interfaces aren't supported yet.
- Works as a catalog proxy, supporting Hive and JDBC as the catalog backend.
- Provides a pluggable metrics store interface to store and delete Iceberg metrics.
- When writing to HDFS, the Gravitino Iceberg REST catalog service can only operate as the specified HDFS user and doesn't support proxying to other HDFS users. See How to access Apache Hadoop for more details.
Builds with Apache Iceberg 1.3.1. The Apache Iceberg table format version is 1 by default.
Builds with Hadoop 2.10.x. There may be compatibility issues when accessing Hadoop 3.x clusters.
Gravitino Iceberg REST catalog service configuration
Assuming the Gravitino server is deployed in the GRAVITINO_HOME directory, you can locate the configuration options in $GRAVITINO_HOME/conf/gravitino.conf. The configuration properties for the Iceberg REST catalog service fall into four groups:
- REST Catalog Server Configuration: you can specify the HTTP server properties like host and port.
- Gravitino Iceberg metrics store Configuration: you can implement a custom Iceberg metrics store and set the corresponding configuration.
- Gravitino Iceberg Catalog backend Configuration: you can set catalog-backend to either jdbc or hive.
- Other Iceberg Catalog Properties Defined by Apache Iceberg: allows you to configure additional properties defined by Apache Iceberg.
Please refer to the following sections for details.
REST catalog server configuration
Configuration item | Description | Default value | Required | Since Version |
---|---|---|---|---|
gravitino.auxService.names | The auxiliary service name of the Gravitino Iceberg REST catalog service. Use iceberg-rest. | (none) | Yes | 0.2.0 |
gravitino.auxService.iceberg-rest.classpath | The classpath of the Gravitino Iceberg REST catalog service; includes the directory containing jars and configuration. It supports both absolute and relative paths, for example, catalogs/lakehouse-iceberg/libs, catalogs/lakehouse-iceberg/conf | (none) | Yes | 0.2.0 |
gravitino.auxService.iceberg-rest.host | The host of the Gravitino Iceberg REST catalog service. | 0.0.0.0 | No | 0.2.0 |
gravitino.auxService.iceberg-rest.httpPort | The port of the Gravitino Iceberg REST catalog service. | 9001 | No | 0.2.0 |
gravitino.auxService.iceberg-rest.minThreads | The minimum number of threads in the thread pool used by the Jetty web server. minThreads is 8 if the value is less than 8. | Math.max(Math.min(Runtime.getRuntime().availableProcessors() * 2, 100), 8) | No | 0.2.0 |
gravitino.auxService.iceberg-rest.maxThreads | The maximum number of threads in the thread pool used by the Jetty web server. maxThreads is 8 if the value is less than 8, and maxThreads must be greater than or equal to minThreads. | Math.max(Runtime.getRuntime().availableProcessors() * 4, 400) | No | 0.2.0 |
gravitino.auxService.iceberg-rest.threadPoolWorkQueueSize | The size of the queue in the thread pool used by Gravitino Iceberg REST catalog service. | 100 | No | 0.2.0 |
gravitino.auxService.iceberg-rest.stopTimeout | The amount of time in ms for the Gravitino Iceberg REST catalog service to stop gracefully. For more information, see org.eclipse.jetty.server.Server#setStopTimeout. | 30000 | No | 0.2.0 |
gravitino.auxService.iceberg-rest.idleTimeout | The timeout in ms of idle connections. | 30000 | No | 0.2.0 |
gravitino.auxService.iceberg-rest.requestHeaderSize | The maximum size of an HTTP request header, in bytes. | 131072 | No | 0.2.0 |
gravitino.auxService.iceberg-rest.responseHeaderSize | The maximum size of an HTTP response header, in bytes. | 131072 | No | 0.2.0 |
gravitino.auxService.iceberg-rest.customFilters | Comma-separated list of filter class names to apply to the APIs. | (none) | No | 0.4.0 |
The filter in customFilters should be a standard javax servlet filter.
You can also specify filter parameters by setting configuration entries in the style gravitino.auxService.iceberg-rest.<class name of filter>.param.<param name>=<value>.
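As a rough illustration of how the properties above fit together in $GRAVITINO_HOME/conf/gravitino.conf, the fragment below enables the service, sets the listen address, and registers one filter with a parameter. The filter class com.example.filter.AuditFilter and its logLevel parameter are hypothetical names invented for this sketch; the classpath, host, and port values are the examples and defaults listed in the table.

```properties
# Enable the Iceberg REST catalog service as an auxiliary service.
gravitino.auxService.names = iceberg-rest
# Directories containing the service jars and configuration (example paths from the table).
gravitino.auxService.iceberg-rest.classpath = catalogs/lakehouse-iceberg/libs, catalogs/lakehouse-iceberg/conf
# Listen address and port, shown here at their default values.
gravitino.auxService.iceberg-rest.host = 0.0.0.0
gravitino.auxService.iceberg-rest.httpPort = 9001
# Hypothetical servlet filter, for illustration only; replace with your own class.
gravitino.auxService.iceberg-rest.customFilters = com.example.filter.AuditFilter
# Hypothetical filter parameter, following the <class name of filter>.param.<param name> pattern.
gravitino.auxService.iceberg-rest.com.example.filter.AuditFilter.param.logLevel = INFO
```

Only the first two properties are marked as required in the table; the rest can be omitted to keep their defaults.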
Iceberg metrics store configuration
Gravitino provides a pluggable metrics store interface to store and delete Iceberg metrics. You can develop a class that implements com.datastrato.gravitino.catalog.lakehouse.iceberg.web.metrics and add the corresponding jar file to the Iceberg REST service classpath directory.
Configuration item | Description | Default value | Required | Since Version |
---|---|---|---|---|
gravitino.auxService.iceberg-rest.metricsStore | The Iceberg metrics storage class name. | (none) | No | 0.4.0 |
gravitino.auxService.iceberg-rest.metricsStoreRetainDays | The number of days to retain Iceberg metrics in the store. A value not greater than 0 means metrics are retained forever. | -1 | No | 0.4.0 |
gravitino.auxService.iceberg-rest.metricsQueueCapacity | The size of the queue that holds metrics temporarily before they are written to persistent storage. Metrics are dropped when the queue is full. | 1000 | No | 0.4.0 |
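A minimal sketch of a metrics store setup in gravitino.conf might look like the following. The class name com.example.metrics.MyIcebergMetricsStore is a hypothetical placeholder for your own implementation, and the 7-day retention is an arbitrary example value, not a recommendation.

```properties
# Hypothetical custom metrics store class; its jar must be on the Iceberg REST service classpath.
gravitino.auxService.iceberg-rest.metricsStore = com.example.metrics.MyIcebergMetricsStore
# Keep metrics for 7 days (example value); a value of 0 or less retains them forever.
gravitino.auxService.iceberg-rest.metricsStoreRetainDays = 7
# In-memory queue size before metrics reach persistent storage; shown at its default.
gravitino.auxService.iceberg-rest.metricsQueueCapacity = 1000
```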
Gravitino Iceberg catalog backend configuration
The Gravitino Iceberg REST catalog service uses the memory catalog backend by default. You can specify a Hive or JDBC catalog backend for production environments.
Hive backend configuration
Configuration item | Description | Default value | Required | Since Version |
---|---|---|---|---|
gravitino.auxService.iceberg-rest.catalog-backend | The Catalog backend of the Gravitino Iceberg REST catalog service. Use the value hive for a Hive catalog. | memory | Yes | 0.2.0 |
gravitino.auxService.iceberg-rest.uri | The Hive metadata address, such as thrift://127.0.0.1:9083. | (none) | Yes | 0.2.0 |
gravitino.auxService.iceberg-rest.warehouse | The warehouse directory of the Hive catalog, such as /user/hive/warehouse-hive/. | (none) | Yes | 0.2.0 |
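Putting the three Hive properties together, a gravitino.conf fragment could look like the sketch below. The metastore address and warehouse path are the example values from the table and are assumed to be replaced with those of your own environment.

```properties
# Switch the catalog backend from the default (memory) to Hive.
gravitino.auxService.iceberg-rest.catalog-backend = hive
# Hive metastore address (example value from the table).
gravitino.auxService.iceberg-rest.uri = thrift://127.0.0.1:9083
# Warehouse directory of the Hive catalog (example value from the table).
gravitino.auxService.iceberg-rest.warehouse = /user/hive/warehouse-hive/
```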