Iceberg catalog
Introduction
Gravitino provides the ability to manage Apache Iceberg metadata.
Requirements and limitations
- Builds with Apache Iceberg 1.3.1. The Apache Iceberg table format version is 1 by default.
- Builds with Hadoop 2.10.x; there may be compatibility issues when accessing Hadoop 3.x clusters.
Catalog
Catalog capabilities
- Works as a catalog proxy, supporting `HiveCatalog`, `JdbcCatalog` and `RESTCatalog`.
- Supports DDL operations for Iceberg schemas and tables.
- Doesn't support snapshot or table management operations.
Catalog properties
| Property name | Description | Default value | Required | Since Version | 
|---|---|---|---|---|
| catalog-backend | Catalog backend of the Gravitino Iceberg catalog. Supports `hive`, `jdbc` or `rest`. | (none) | Yes | 0.2.0 | 
| uri | The URI configuration of the Iceberg catalog. For example, `thrift://127.0.0.1:9083`, `jdbc:postgresql://127.0.0.1:5432/db_name`, `jdbc:mysql://127.0.0.1:3306/metastore_db` or `http://127.0.0.1:9001`. | (none) | Yes | 0.2.0 | 
| warehouse | Warehouse directory of the catalog. Use `file:///user/hive/warehouse-hive/` for a local filesystem or `hdfs://namespace/hdfs/path` for HDFS. | (none) | Yes | 0.2.0 | 
Any property not defined by Gravitino that has the `gravitino.bypass.` prefix is passed through to the Iceberg catalog properties and HDFS configuration. For example, if you specify `gravitino.bypass.list-all-tables`, `list-all-tables` is passed to the Iceberg catalog properties.
When you use Gravitino with Trino, you can pass Trino Iceberg connector configuration using the `trino.bypass.` prefix. For example, use `trino.bypass.iceberg.table-statistics-enabled` to pass `iceberg.table-statistics-enabled` to the Gravitino Iceberg catalog in the Trino runtime.
When you use Gravitino with Spark, you can pass Spark Iceberg connector configuration using the `spark.bypass.` prefix. For example, use `spark.bypass.io-impl` to pass `io-impl` to the Spark Iceberg connector in the Spark runtime.
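As a hedged illustration, the catalog properties below combine the standard settings with each bypass prefix; the values, such as the `S3FileIO` class name, are examples rather than requirements:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative catalog properties; values are examples, not requirements.
Map<String, String> properties = new HashMap<>();
properties.put("catalog-backend", "hive");
properties.put("uri", "thrift://127.0.0.1:9083");
properties.put("warehouse", "hdfs://namespace/hdfs/path");
// Forwarded to the Iceberg catalog properties / HDFS configuration:
properties.put("gravitino.bypass.list-all-tables", "true");
// Forwarded to the Trino Iceberg connector in the Trino runtime:
properties.put("trino.bypass.iceberg.table-statistics-enabled", "true");
// Forwarded to the Spark Iceberg connector in the Spark runtime:
properties.put("spark.bypass.io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
```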
JDBC catalog
If you are using the JDBC catalog, you must provide `jdbc-user`, `jdbc-password` and `jdbc-driver` in the catalog properties.
| Property name | Description | Default value | Required | Since Version | 
|---|---|---|---|---|
| jdbc-user | JDBC user name | (none) | Yes | 0.2.0 | 
| jdbc-password | JDBC password | (none) | Yes | 0.2.0 | 
| jdbc-driver | `com.mysql.jdbc.Driver` or `com.mysql.cj.jdbc.Driver` for MySQL, `org.postgresql.Driver` for PostgreSQL | (none) | Yes | 0.3.0 | 
| jdbc-initialize | Whether to initialize meta tables when creating the JDBC catalog | true | No | 0.2.0 | 
You must download the corresponding JDBC driver to the `catalogs/lakehouse-iceberg/libs` directory.
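A minimal sketch of creating a JDBC-backed Iceberg catalog through the Gravitino Java client; the client calls follow Manage Relational Metadata Using Gravitino and may differ slightly between versions, and the metalake, catalog name, and credentials are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.gravitino.Catalog;
import org.apache.gravitino.client.GravitinoClient;

GravitinoClient client = GravitinoClient.builder("http://127.0.0.1:8090")
    .withMetalake("metalake")  // assumes this metalake already exists
    .build();

Map<String, String> properties = new HashMap<>();
properties.put("catalog-backend", "jdbc");
properties.put("uri", "jdbc:mysql://127.0.0.1:3306/metastore_db");
properties.put("warehouse", "hdfs://namespace/hdfs/path");
properties.put("jdbc-user", "iceberg");        // illustrative credentials
properties.put("jdbc-password", "iceberg");
properties.put("jdbc-driver", "com.mysql.cj.jdbc.Driver");

Catalog catalog = client.createCatalog(
    "iceberg_catalog",          // hypothetical catalog name
    Catalog.Type.RELATIONAL,
    "lakehouse-iceberg",        // the Iceberg catalog provider
    "Iceberg catalog backed by MySQL",
    properties);
```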
Catalog operations
Please refer to Manage Relational Metadata Using Gravitino for more details.
Schema
Schema capabilities
- Doesn't support cascading drop of a schema.
Schema properties
You can set any schema property except `comment`.
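A hedged sketch of creating a schema with a custom property via the Java client; the catalog is assumed to exist, and the schema name and property value are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

Map<String, String> schemaProperties = new HashMap<>();
schemaProperties.put("location", "hdfs://namespace/hdfs/path/mydb");  // illustrative

// "client" is a GravitinoClient as in the catalog example above.
client.loadCatalog("iceberg_catalog")
    .asSchemas()
    .createSchema("mydb", "my Iceberg schema", schemaProperties);
```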
Schema operations
Please refer to Manage Relational Metadata Using Gravitino for more details.
Table
Table capabilities
- Doesn't support column default value.
Table partitions
Supports transforms:
- `IdentityTransform`
- `BucketTransform`
- `TruncateTransform`
- `YearTransform`
- `MonthTransform`
- `DayTransform`
- `HourTransform`
Iceberg doesn't support multiple fields in `BucketTransform`.
Iceberg doesn't support `ApplyTransform`, `RangeTransform`, or `ListTransform`.
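For example, partitioning can be expressed with Gravitino's `Transforms` helpers; this is a sketch, and the factory method signatures may differ slightly between versions:

```java
import org.apache.gravitino.rel.expressions.transforms.Transform;
import org.apache.gravitino.rel.expressions.transforms.Transforms;

Transform[] partitioning = new Transform[] {
    Transforms.identity("category"),             // IdentityTransform
    Transforms.bucket(16, new String[] {"id"}),  // BucketTransform; a single field only
    Transforms.truncate(10, "name"),             // TruncateTransform
    Transforms.day("event_time")                 // DayTransform; year/month/hour are analogous
};
```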
Table sort orders
Supports expressions:
- `FieldReference`
- `FunctionExpression`
  - `bucket`
  - `truncate`
  - `year`
  - `month`
  - `day`
  - `hour`

For `bucket` and `truncate`, the first argument must be an integer literal, and the second argument must be a field reference.
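As a hedged sketch, a sort order combining a plain field reference with a `bucket` function expression could be built like this; the helper names follow the Gravitino Java API and may vary by version:

```java
import org.apache.gravitino.rel.expressions.FunctionExpression;
import org.apache.gravitino.rel.expressions.NamedReference;
import org.apache.gravitino.rel.expressions.literals.Literals;
import org.apache.gravitino.rel.expressions.sorts.SortOrder;
import org.apache.gravitino.rel.expressions.sorts.SortOrders;

SortOrder[] sortOrders = new SortOrder[] {
    // A plain FieldReference.
    SortOrders.ascending(NamedReference.field("event_time")),
    // bucket(4, id): integer literal first, field reference second.
    SortOrders.ascending(
        FunctionExpression.of("bucket",
            Literals.integerLiteral(4), NamedReference.field("id")))
};
```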
Table distributions
- Gravitino uses `NoneDistribution` by default.

JSON:

```json
{
  "strategy": "none",
  "number": 0,
  "expressions": []
}
```

Java:

```java
Distributions.NONE;
```
- Supports `HashDistribution`, which hash-distributes data by the partition key.

JSON:

```json
{
  "strategy": "hash",
  "number": 0,
  "expressions": []
}
```

Java:

```java
Distributions.HASH;
```
- Supports `RangeDistribution`; you can pass `range` as the value through the API. Data is range-distributed by the partition key, or by the sort key if the table has a `SortOrder`.

JSON:

```json
{
  "strategy": "range",
  "number": 0,
  "expressions": []
}
```

Java:

```java
Distributions.RANGE;
```
Iceberg automatically distributes the data according to the partition or table sort order, so it is forbidden to specify distribution expressions.
Apache Iceberg doesn't support the Gravitino `EvenDistribution` type.
Table column types
| Gravitino Type | Apache Iceberg Type | 
|---|---|
| Struct | Struct | 
| Map | Map | 
| Array | Array | 
| Boolean | Boolean | 
| Integer | Integer | 
| Long | Long | 
| Float | Float | 
| Double | Double | 
| String | String | 
| Date | Date | 
| Time | Time | 
| TimestampType withZone | TimestampType withZone | 
| TimestampType withoutZone | TimestampType withoutZone | 
| Decimal | Decimal | 
| Fixed | Fixed | 
| BinaryType | Binary | 
| UUID | UUID | 
Apache Iceberg doesn't support the Gravitino `Varchar`, `Fixedchar`, `Byte`, `Short` or `Union` types.
Data types other than those listed above are mapped to the Gravitino External Type, which represents an unresolvable data type, since 0.6.0.
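To show the mapping in practice, here is a hedged sketch of column definitions using Gravitino's `Types` factories; the column names are illustrative, and the factory names may vary by version:

```java
import org.apache.gravitino.rel.Column;
import org.apache.gravitino.rel.types.Types;

Column[] columns = new Column[] {
    Column.of("id", Types.LongType.get(), "row id"),            // maps to Iceberg Long
    Column.of("name", Types.StringType.get(), "display name"),  // maps to Iceberg String
    Column.of("created_at", Types.TimestampType.withTimeZone(),
        "creation time"),                                       // Iceberg timestamp with zone
    Column.of("payload", Types.BinaryType.get(), "raw bytes")   // maps to Iceberg Binary
};
```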
Table properties
You can pass Iceberg table properties to Gravitino when creating an Iceberg table.
The Gravitino server doesn't allow passing the following reserved fields.
| Configuration item | Description | 
|---|---|
| comment | The table comment. | 
| creator | The table creator. | 
| location | Iceberg location for table storage. | 
| current-snapshot-id | The snapshot representing the current state of the table. | 
| cherry-pick-snapshot-id | Selecting a specific snapshot in a merge operation. | 
| sort-order | The sort order of the table data. | 
| identifier-fields | The identifier fields for defining the table. | 
| write.distribution-mode | Defines the distribution of data during writes. | 
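For instance, the following properties would pass through to the Iceberg table, while a reserved key such as `location` would be rejected by the server; the specific keys are standard Iceberg table properties, shown here as an illustration:

```java
import java.util.HashMap;
import java.util.Map;

Map<String, String> tableProperties = new HashMap<>();
tableProperties.put("format-version", "2");            // upgrade from the default format version 1
tableProperties.put("commit.retry.num-retries", "5");  // a standard Iceberg commit property
// tableProperties.put("location", "...");             // rejected: "location" is reserved
```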
Table indexes
- Doesn't support table indexes.
Table operations
Please refer to Manage Relational Metadata Using Gravitino for more details.
Alter table operations
Supports operations:
- `RenameTable`
- `SetProperty`
- `RemoveProperty`
- `UpdateComment`
- `AddColumn`
- `DeleteColumn`
- `RenameColumn`
- `UpdateColumnType`
- `UpdateColumnPosition`
- `UpdateColumnNullability`
- `UpdateColumnComment`
When you add a column, the default column position is `LAST`. Adding a non-nullable column may cause compatibility issues.
Updating a nullable column to non-nullable may also cause compatibility issues.
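A hedged sketch of a few of these operations using Gravitino's `TableChange` factory methods; the signatures may vary by version, and the table and column names are hypothetical:

```java
import org.apache.gravitino.rel.TableChange;
import org.apache.gravitino.rel.types.Types;

TableChange[] changes = new TableChange[] {
    TableChange.rename("renamed_table"),                       // RenameTable
    TableChange.setProperty("commit.retry.num-retries", "5"),  // SetProperty
    TableChange.updateComment("updated table comment"),        // UpdateComment
    // AddColumn: the new column is placed LAST by default.
    TableChange.addColumn(new String[] {"new_col"}, Types.StringType.get()),
    // UpdateColumnNullability: making a column non-nullable may break readers.
    TableChange.updateColumnNullability(new String[] {"new_col"}, false)
};
```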
HDFS configuration
You can place `core-site.xml` and `hdfs-site.xml` in the `catalogs/lakehouse-iceberg/conf` directory; they are automatically loaded as the default HDFS configuration.
When writing to HDFS, the Gravitino Iceberg REST server can only operate as the specified HDFS user and doesn't support proxying to other HDFS users. See How to access Apache Hadoop for more details.