Introducing Gravitino 0.4.0

Today, we are pleased to announce the release of Gravitino 0.4.0. This version is a stable release that includes more than 280 bug fixes as well as a number of new features.

In this blog post, we will walk you through the highlights of Gravitino 0.4.0, giving you a quick overview of features and enhancements. To learn more about the nitty-gritty details, we recommend going through the comprehensive Gravitino 0.4.0 release notes, which include a full list of major features and resolved issues across all Gravitino components.

Public preview of Gravitino web UI

With the release of Gravitino 0.4.0, we are excited to announce the public preview of Gravitino’s web UI. This greatly improves the user experience of Gravitino.

Gravitino web UI supports the creation, updating, and deletion of metadata such as metalakes and catalogs. Additionally, it can list and display schemas, tables, columns, and their detailed information. You can access the UI by visiting the URL http://{gravitino-host}:8090 in your browser.

Here is a screenshot of Gravitino’s UI. You can manage metalakes in the UI as shown below:

[Screenshot: metalakes]

Within each metalake, you can also manage catalogs. The UI lists all the catalogs in a tree structure, with schemas and tables under them.

[Screenshot: catalogs]

[Screenshot: tables]

Unified support of partition management

One of the new features in Gravitino 0.4.0 is unified partition management for tables. With Gravitino, you can create, list, get, and delete partitions from different sources in a unified way via the REST API and the Java API.

Gravitino provides a generic representation of partition definition. It can support Identity Partition (Hive’s partition definition), as well as List Partition and Range Partition supported by other engines.
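
To make the identity case concrete, here is a minimal sketch of how an identity partition's field names and values line up with a Hive-style partition name. `IdentityPartitionName` is a hypothetical helper written for this post, not part of the Gravitino API, and rendering nested fields with a dot is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Gravitino API. */
final class IdentityPartitionName {

  private IdentityPartitionName() {}

  /**
   * Joins field names and values into a Hive-style partition name.
   * Each inner array of fieldNames is the path to one (possibly nested) field.
   */
  static String of(String[][] fieldNames, String[] values) {
    if (fieldNames.length != values.length) {
      throw new IllegalArgumentException("each partition field needs exactly one value");
    }
    List<String> parts = new ArrayList<>();
    for (int i = 0; i < fieldNames.length; i++) {
      // Assumption: a nested field such as {"address", "city"} is rendered as "address.city".
      parts.add(String.join(".", fieldNames[i]) + "=" + values[i]);
    }
    return String.join("/", parts);
  }

  public static void main(String[] args) {
    String name = of(new String[][] {{"dt"}, {"country"}}, new String[] {"2008-08-08", "us"});
    System.out.println(name); // prints dt=2008-08-08/country=us
  }
}
```

List and range partitions carry value sets or bounds instead of one value per field, so they would need a different shape than this sketch.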

Here is a brief example of how to use Gravitino to manage partitions.

Shell:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "partitions": [
    {
      "type": "identity",
      "fieldNames": [
        [
          "dt"
        ],
        [
          "country"
        ]
      ],
      "values": [
        {
          "type": "literal",
          "dataType": "date",
          "value": "2008-08-08"
        },
        {
          "type": "literal",
          "dataType": "string",
          "value": "us"
        }
      ]
    }
  ]
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions

Java:

GravitinoClient gravitinoClient = GravitinoClient
    .builder("http://localhost:8090")
    .build();

// Assumes that you have a partitioned table named "metalake.catalog.schema.table".
Partition addedPartition =
    gravitinoClient
        .loadMetalake(NameIdentifier.of("metalake"))
        .loadCatalog(NameIdentifier.of("metalake", "catalog"))
        .asTableCatalog()
        .loadTable(NameIdentifier.of("metalake", "catalog", "schema", "table"))
        .supportPartitions()
        .addPartition(
            Partitions.identity(
                new String[][] {{"dt"}, {"country"}},
                new Literal[] {
                  Literals.dateLiteral(LocalDate.parse("2008-08-08")),
                  Literals.stringLiteral("us")
                },
                Maps.newHashMap()));

For more details, you can refer to Gravitino’s partition management documentation.
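
The examples above only add a partition. As a rough sketch of how the list, get, and delete calls could be addressed over REST, the helper below builds the URLs from the same table path. `PartitionEndpoints` is a hypothetical class, and both the per-partition path `/partitions/{name}` and the way the name is encoded are assumptions. Please check the partition management documentation for the actual endpoints.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical URL builder for illustration only; not part of the Gravitino client. */
final class PartitionEndpoints {
  private final String tableUrl;

  PartitionEndpoints(
      String baseUrl, String metalake, String catalog, String schema, String table) {
    // Same path layout as the curl example in this post.
    this.tableUrl =
        baseUrl + "/api/metalakes/" + metalake + "/catalogs/" + catalog
            + "/schemas/" + schema + "/tables/" + table;
  }

  /** URL of the partition collection: POST to add (as in the curl example), GET assumed to list. */
  String collection() {
    return tableUrl + "/partitions";
  }

  /**
   * Assumed URL of a single partition (GET to load, DELETE to drop).
   * URLEncoder applies form-style encoding, which is adequate for names without spaces.
   */
  String single(String partitionName) {
    try {
      return collection() + "/" + URLEncoder.encode(partitionName, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on the JVM, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    PartitionEndpoints endpoints =
        new PartitionEndpoints("http://localhost:8090", "metalake", "catalog", "schema", "table");
    System.out.println(endpoints.collection());
    System.out.println(endpoints.single("dt=2008-08-08/country=us"));
  }
}
```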

Support column default values, auto increment, and table indexes

As a unified metadata lake, Gravitino aims to provide a unified representation of different kinds of metadata. Version 0.4.0 adds support for default values and auto increment in column definitions, as well as indexes in table definitions.

Users can now create tables with default values and auto increment specified in the column definitions and indexes specified in the table definition.

Here is a brief example of how to use them when creating a table:

Shell:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "table",
  "columns": [
    {
      "name": "id",
      "type": "integer",
      "nullable": false,
      "autoIncrement": true,
      "comment": "Id of the user"
    },
    {
      "name": "name",
      "type": "varchar(1000)",
      "nullable": true,
      "comment": "Name of the user"
    },
    {
      "name": "age",
      "type": "integer",
      "nullable": false,
      "comment": "Age of the user",
      "defaultValue": {
        "type": "literal",
        "dataType": "integer",
        "value": "-1"
      }
    },
    {
      "name": "score",
      "type": "double",
      "nullable": true,
      "comment": "Score of the user"
    }
  ],
  "comment": "A user table with detailed information",
  "indexes": [
    {
      "indexType": "PRIMARY_KEY",
      "name": "PRIMARY",
      "fieldNames": [["id"]]
    },
    {
      "indexType": "UNIQUE_KEY",
      "name": "name_age_score_uk",
      "fieldNames": [["name"],["age"],["score"]]
    }
  ]
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables

Java:

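// Assumes tableCatalog comes from catalog.asTableCatalog() and
// tablePropertiesMap is a Map<String, String> of table properties.
tableCatalog.createTable(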
    NameIdentifier.of("metalake", "hive_catalog", "schema", "table"),
    new Column[] {
      Column.of("id", Types.IntegerType.get(), "Id of the user", false, true, null),
      Column.of("name", Types.VarCharType.of(1000), "Name of the user", true, false, null),
      Column.of("age", Types.IntegerType.get(), "Age of the user", false, false, Literals.integerLiteral(-1)),
      Column.of("score", Types.DoubleType.get(), "Score of the user", true, false, null)
    },
    "A user table with detailed information",
    tablePropertiesMap,
    Transforms.EMPTY_TRANSFORM,
    Distributions.NONE,
    new SortOrder[0],
    new Index[] {
      Indexes.of(IndexType.PRIMARY_KEY, "PRIMARY", new String[][]{{"id"}}),
      Indexes.of(IndexType.UNIQUE_KEY, "name_age_score_uk", new String[][]{{"name"}, {"age"}, {"score"}})
    });

For more details, you can refer to Gravitino’s documentation on how to specify default values, auto increment, and indexes.

Security enhancements

Security is always our top priority at Gravitino. Version 0.3.0 introduced OAuth2 authentication support, and this release adds further security features.

Kerberos is widely supported in the big data field. In response to user demand, we have implemented Kerberos authentication for client-server communication. With SPNEGO enabled, users can send Kerberos-authenticated headers to communicate with the server.
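
For readers unfamiliar with SPNEGO, the sketch below shows the general shape of such a header: the standard HTTP Negotiate scheme carries a Base64-encoded token in the `Authorization` header. `SpnegoHeader` is a hypothetical helper, and the token bytes are a placeholder. A real token comes from a Kerberos login through GSS-API, which is left out here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for illustration only; not part of the Gravitino client. */
final class SpnegoHeader {
  /** Header name used by the HTTP Negotiate scheme. */
  static final String NAME = "Authorization";

  private SpnegoHeader() {}

  /** Formats a SPNEGO token as the value of the Authorization header. */
  static String value(byte[] token) {
    return "Negotiate " + Base64.getEncoder().encodeToString(token);
  }

  public static void main(String[] args) {
    // Placeholder bytes; a real client would obtain the token from GSS-API.
    byte[] token = "token".getBytes(StandardCharsets.UTF_8);
    System.out.println(NAME + ": " + value(token)); // prints Authorization: Negotiate dG9rZW4=
  }
}
```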

This version also adds support for user impersonation, ensuring that each request uses a real user to communicate with the underlying sources. For the Hive catalog, we have added Kerberos support for communicating with the Hive MetaStore (HMS). By simply configuring the principal and keytab, Gravitino can now communicate with HMS using Kerberos authentication.

For more details on how to enable security-related features, see the security documentation.

Support completed operator pushdown for Trino connector

In version 0.4.0, we implemented the completed operator pushdown for the Trino connector, which has improved performance. We also ran a TPC-H benchmark. Here is the performance comparison between version 0.4.0 and the previous version (lower is better):

[Figure: TPC-H benchmark]

As you can see, the Gravitino 0.4.0 Trino connector gives better results than the previous version for most of the TPC-H queries. The performance gain is up to 38%, and about 7% on average.

Java 8, 11, and 17 support

Gravitino can now run on a variety of Java environments, including Java 8, 11, and 17. This enhancement offers increased flexibility and compatibility for users.
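
If you want to confirm which Java version your environment provides before starting Gravitino, a minimal sketch like the following can help. `JavaVersionCheck` is a hypothetical helper, not something shipped with Gravitino. It only parses the standard `java.specification.version` system property and compares it with the versions listed above.

```java
/** Hypothetical helper for illustration only; not shipped with Gravitino. */
final class JavaVersionCheck {

  private JavaVersionCheck() {}

  /** Extracts the major version from a value of java.specification.version. */
  static int majorVersion(String specVersion) {
    // Java 8 and earlier report "1.8", "1.7", ...; Java 9 and later report "9", "11", "17", ...
    String[] parts = specVersion.split("\\.");
    if (parts.length > 1 && parts[0].equals("1")) {
      return Integer.parseInt(parts[1]);
    }
    return Integer.parseInt(parts[0]);
  }

  /** True for the Java versions named in this post: 8, 11, and 17. */
  static boolean isListedVersion(int major) {
    return major == 8 || major == 11 || major == 17;
  }

  public static void main(String[] args) {
    int major = majorVersion(System.getProperty("java.specification.version"));
    System.out.println("Java " + major + (isListedVersion(major) ? " is" : " is not")
        + " one of the versions listed for Gravitino 0.4.0");
  }
}
```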

More than just features in Gravitino 0.4.0

While the spotlight often falls on the new features, the true importance of the project is its focus on usability, stability, and incremental improvements. To that end, Gravitino 0.4.0 has tackled and resolved over 280 issues, thanks to the collaborative efforts of all the contributors. To learn more, read the release notes for the full list of improvements.

Get started with Gravitino 0.4.0 today

If you want to experiment with Gravitino 0.4.0, you can simply launch the provided Docker playground; see the playground documentation. If you want to install and run Gravitino from the ground up, see the documentation on how to install Gravitino.

Please let us know if you have any questions. You can contact us via our GitHub repository, or join our Discourse community and Slack group.
