
Spark connector Hive catalog

With the Gravitino Spark connector, accessing data or managing metadata in Hive catalogs becomes straightforward, enabling seamless federation queries across different Hive catalogs.

Capabilities

Supports most DDL and DML operations in SparkSQL, except for the following operations:

  • Function operations
  • Partition operations
  • View operations
  • Querying UDFs
  • LOAD clause
  • CREATE TABLE LIKE clause
  • TRUNCATE TABLE clause

Requirements

  • Hive metastore 2.x
  • HDFS 2.x or 3.x

SQL example


-- Suppose hive_a is the Hive catalog name managed by Gravitino
USE hive_a;

CREATE DATABASE IF NOT EXISTS mydatabase;
USE mydatabase;

-- Create a partitioned table
CREATE TABLE IF NOT EXISTS employees (
  id INT,
  name STRING,
  age INT
)
PARTITIONED BY (department STRING)
STORED AS PARQUET;
DESC TABLE EXTENDED employees;

INSERT OVERWRITE TABLE employees PARTITION(department='Engineering') VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 28);
INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'Mike Brown', 32);

SELECT * FROM employees WHERE department = 'Engineering';
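
The same setup supports the federation queries mentioned in the introduction. Below is a minimal Java sketch that joins tables across two Gravitino-managed Hive catalogs by their fully qualified names; the plugin class and the spark.sql.gravitino.* settings reflect a typical 0.5.0 connector configuration but should be verified against your deployment, and the second catalog hive_b with its sales.orders table is purely hypothetical.

import org.apache.spark.sql.SparkSession;

public class FederationQueryExample {
  public static void main(String[] args) {
    // Assumed connector wiring; verify the plugin class, Gravitino URI,
    // and metalake name against your deployment.
    SparkSession spark = SparkSession.builder()
        .appName("gravitino-hive-federation")
        .config("spark.plugins",
            "com.datastrato.gravitino.spark.connector.plugin.GravitinoSparkPlugin")
        .config("spark.sql.gravitino.uri", "http://localhost:8090")
        .config("spark.sql.gravitino.metalake", "my_metalake")
        .getOrCreate();

    // hive_a is the catalog from the SQL example above; hive_b and its
    // sales.orders table are hypothetical, shown only to illustrate a
    // join across two Hive catalogs in one query.
    spark.sql(
        "SELECT e.name, o.order_id "
            + "FROM hive_a.mydatabase.employees e "
            + "JOIN hive_b.sales.orders o ON e.id = o.employee_id "
            + "WHERE e.department = 'Engineering'")
        .show();
  }
}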

Catalog properties

The Gravitino Spark connector transforms the property names below, defined in the Gravitino catalog properties, into Spark Hive connector configuration; a sketch of how such a catalog is created follows the table.

Property name in Gravitino catalog properties | Spark Hive connector configuration | Description | Since Version
metastore.uris | hive.metastore.uris | Hive metastore URI address | 0.5.0
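
On the Gravitino side, metastore.uris is supplied when the Hive catalog is created. The Java sketch below issues that request over Gravitino's REST interface; the endpoint path, request body shape, metalake name, and thrift address are assumptions for illustration, so confirm them against the REST API documentation for your Gravitino version.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateHiveCatalogExample {
  public static void main(String[] args) throws Exception {
    // Assumed payload shape for creating a Hive catalog; the
    // metastore.uris property here surfaces in Spark as
    // hive.metastore.uris per the table above.
    String body = "{"
        + "\"name\": \"hive_a\","
        + "\"type\": \"relational\","
        + "\"provider\": \"hive\","
        + "\"comment\": \"Hive catalog managed by Gravitino\","
        + "\"properties\": {\"metastore.uris\": \"thrift://hive-metastore:9083\"}"
        + "}";

    // Assumed endpoint; adjust host, port, and metalake name.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8090/api/metalakes/my_metalake/catalogs"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}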

Gravitino catalog property names with the prefix spark.bypass. are passed through to the Spark Hive connector with the prefix stripped. For example, set spark.bypass.hive.exec.dynamic.partition.mode to pass hive.exec.dynamic.partition.mode to the Spark Hive connector, as sketched below.
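
For instance, a catalog properties map that combines the documented metastore.uris property with one bypass property might look like the sketch below; the metastore address and the nonstrict value are placeholders.

import java.util.Map;

public class BypassPropertiesExample {
  public static void main(String[] args) {
    // Catalog properties as stored in Gravitino. The spark.bypass.
    // prefix is stripped, so the Spark Hive connector receives
    // hive.exec.dynamic.partition.mode = nonstrict.
    Map<String, String> catalogProperties = Map.of(
        "metastore.uris", "thrift://hive-metastore:9083",
        "spark.bypass.hive.exec.dynamic.partition.mode", "nonstrict");

    catalogProperties.forEach((key, value) ->
        System.out.println(key + " = " + value));
  }
}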

caution

When using the spark-sql shell client, you must explicitly set spark.bypass.spark.sql.hive.metastore.jars in the Gravitino Hive catalog properties. Replace the default builtin value with a setting appropriate for your environment, as sketched below.
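
A sketch of that override follows, assuming your Hive 2.x client jars live on a local path. The path value and the companion spark.sql.hive.metastore.jars.path key follow standard Spark configuration (maven is another valid value that downloads matching jars instead), but verify both against your Spark version.

import java.util.Map;

public class MetastoreJarsExample {
  public static void main(String[] args) {
    // Override Spark's default "builtin" metastore jars through the
    // bypass mechanism when running the spark-sql shell client.
    Map<String, String> catalogProperties = Map.of(
        "spark.bypass.spark.sql.hive.metastore.jars", "path",
        // Placeholder location; point it at your Hive 2.x client jars.
        "spark.bypass.spark.sql.hive.metastore.jars.path",
            "file:///opt/hive/lib/*.jar");

    catalogProperties.forEach((key, value) ->
        System.out.println(key + " = " + value));
  }
}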