# Apache Hive catalog

## Introduction
Gravitino can use Apache Hive as a catalog for metadata management.
## Requirements and limitations
- The Hive catalog requires a Hive Metastore Service (HMS), or a compatible implementation of the HMS, such as AWS Glue.
- Gravitino must have network access to the Hive metastore service using the Thrift protocol.
- The Hive catalog is available for Apache Hive 2.x only. Support for Apache Hive 3.x is under development.
## Catalog

### Catalog capabilities
The Hive catalog supports creating, updating, and deleting databases and tables in the HMS.
### Catalog properties
| Property Name | Description | Default Value | Required | Since Version |
|---|---|---|---|---|
| `metastore.uris` | The Hive metastore service URIs; separate multiple addresses with commas, such as `thrift://127.0.0.1:9083`. | (none) | Yes | 0.2.0 |
| `client.pool-size` | The maximum number of Hive metastore clients in the pool for Gravitino. | 1 | No | 0.2.0 |
| `gravitino.bypass.` | Property names with this prefix are passed down to the underlying HMS client. For example, `gravitino.bypass.hive.metastore.failure.retries = 3` indicates 3 retries upon failure of Thrift metastore calls. | (none) | No | 0.2.0 |
| `client.pool-cache.eviction-interval-ms` | The client pool cache eviction interval, in milliseconds. | 300000 | No | 0.4.0 |
| `impersonation-enable` | Whether to enable user impersonation for the Hive catalog. | false | No | 0.4.0 |
| `kerberos.principal` | The Kerberos principal for the catalog. To use Kerberos, you should also configure `gravitino.bypass.hadoop.security.authentication`, `gravitino.bypass.hive.metastore.kerberos.principal`, and `gravitino.bypass.hive.metastore.sasl.enabled`. | (none) | Required if you use Kerberos | 0.4.0 |
| `kerberos.keytab-uri` | The URI of the keytab for the catalog. Supported protocols are `https`, `http`, `ftp`, and `file`. | (none) | Required if you use Kerberos | 0.4.0 |
| `kerberos.check-interval-sec` | The interval, in seconds, to check the validity of the principal. | 60 | No | 0.4.0 |
| `kerberos.keytab-fetch-timeout-sec` | The timeout, in seconds, to fetch the keytab. | 60 | No | 0.4.0 |
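As a sketch of how these properties fit together, the following shows a hypothetical property map for a Kerberos-enabled Hive catalog. All values here are placeholders chosen for illustration (the metastore address, realm, and keytab path are assumptions, not values from this document):

```json
{
  "metastore.uris": "thrift://127.0.0.1:9083",
  "client.pool-size": "4",
  "impersonation-enable": "true",
  "kerberos.principal": "gravitino@EXAMPLE.COM",
  "kerberos.keytab-uri": "file:///etc/security/keytabs/gravitino.keytab",
  "gravitino.bypass.hadoop.security.authentication": "kerberos",
  "gravitino.bypass.hive.metastore.kerberos.principal": "hive/_HOST@EXAMPLE.COM",
  "gravitino.bypass.hive.metastore.sasl.enabled": "true"
}
```

Note how the `gravitino.bypass.` entries carry Hadoop and Hive client settings through to the underlying HMS client, while the `kerberos.*` entries are interpreted by Gravitino itself.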
When using Gravitino with Trino, you can pass Trino Hive connector configuration properties using the prefix `trino.bypass.`. For example, use `trino.bypass.hive.config.resources` to pass `hive.config.resources` to the Gravitino Hive catalog in the Trino runtime.
When using Gravitino with Spark, you can pass Spark Hive connector configuration properties using the prefix `spark.bypass.`. For example, use `spark.bypass.hive.exec.dynamic.partition.mode` to pass `hive.exec.dynamic.partition.mode` to the Spark Hive connector in the Spark runtime.
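For illustration, a catalog's properties might include bypass entries like the following. The property names to the right of each prefix are the real Trino and Spark settings named above; the values are placeholders:

```properties
# Forwarded to the Trino Hive connector as hive.config.resources
trino.bypass.hive.config.resources = /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
# Forwarded to the Spark Hive connector as hive.exec.dynamic.partition.mode
spark.bypass.hive.exec.dynamic.partition.mode = nonstrict
```

Properties without a recognized prefix are not forwarded to the respective engine.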
### Catalog operations
Refer to Manage Relational Metadata Using Gravitino for more details.
## Schema

### Schema capabilities
The Hive catalog supports creating, updating, and deleting databases in the HMS.
### Schema properties
Schema properties supply or set metadata for the underlying Hive database. The following table lists the predefined schema properties for a Hive database. You can also define your own key-value properties, which are passed down to the underlying Hive database.
| Property name | Description | Default value | Required | Since Version |
|---|---|---|---|---|
| `location` | The directory for Hive database storage, such as `/user/hive/warehouse`. | By default, HMS uses the value of `hive.metastore.warehouse.dir` in `hive-site.xml`. | No | 0.1.0 |
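As an example, a schema's property map could combine the predefined `location` property with a user-defined key. Both the path and the `owner-team` key below are illustrative placeholders, not values defined by Gravitino:

```json
{
  "location": "/user/hive/warehouse/sales.db",
  "owner-team": "data-platform"
}
```

Here `location` overrides the HMS default storage directory for this database, while `owner-team` is simply passed through to the Hive database's properties.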
### Schema operations
See Manage Relational Metadata Using Gravitino for more details.