Skip to main content
Version: 0.3.0

How to use the playground

Playground introduction

Playground is a complete Gravitino Docker runtime environment with Hive, Hdfs, Trino, MySQL, PostgreSQL, and Gravitino server.

Depending on your network, the startup may take 3-5 minutes.

Once the playground environment has started, you can open http://localhost:8090 to access the Gravitino Web UI.

Prerequisite

You should install git and docker-compose.

Start playground

git clone git@github.com:datastrato/gravitino-playground.git
cd gravitino-playground
./launch-playground.sh

Experience Gravitino with Trino SQL

  1. Login to Gravitino playground Trino Docker container using the following command.
docker exec -it playground-trino bash
  1. Open Trino CLI in the container.
trino@d2bbfccc7432:/$ trino

Example

Simple queries

Use simple queries to test in the Trino CLI.

SHOW CATALOGS;

CREATE SCHEMA "metalake_demo.catalog_hive".db1
WITH (location = 'hdfs://hive:9000/user/hive/warehouse/db1.db');

SHOW CREATE SCHEMA "metalake_demo.catalog_hive".db1;

CREATE TABLE "metalake_demo.catalog_hive".db1.table_001
(
name varchar,
salary varchar
)
WITH (
format = 'TEXTFILE'
);

INSERT INTO "metalake_demo.catalog_hive".db1.table_001 (name, salary) VALUES ('sam', '11');

SELECT * FROM "metalake_demo.catalog_hive".db1.table_001;

SHOW SCHEMAS from "metalake_demo.catalog_hive";

DESCRIBE "metalake_demo.catalog_hive".db1.table_001;

SHOW TABLES from "metalake_demo.catalog_hive".db1;

Cross-catalog queries

In companies, there may be different departments using different data stacks. In this example, HR department uses Apache Hive to store its data. Sales department uses PostgreSQL to store its data. This example has generated some data for two departments. You can query some interesting results with Gravitino.

If you want to know which employee has the largest sales amount. You can run the SQL.

WITH totalsales AS (
SELECT
employee_id,
SUM(total_amount) AS sales_amount
FROM "metalake_demo.catalog_hive".sales.sales
GROUP BY
employee_id
), rankedemployees AS (
SELECT
employee_id,
sales_amount,
RANK() OVER (ORDER BY sales_amount DESC) AS sales_rank
FROM totalsales
)
SELECT
e.employee_id,
given_name,
family_name,
job_title,
sales_amount
FROM rankedemployees AS r
JOIN "metalake_demo.catalog_postgres".hr.employees AS e
ON r.employee_id = e.employee_id
WHERE
sales_rank = 1;

If you want to know top 10 customers who bought the most by state. You run the SQL.

WITH customersales AS (
SELECT
"metalake_demo.catalog_hive".sales.customers.customer_id,
customer_name,
customer_email,
location AS state,
SUM(total_amount) AS total_spent
FROM "metalake_demo.catalog_hive".sales.sales
JOIN "metalake_demo.catalog_hive".sales.customers
ON "metalake_demo.catalog_hive".sales.sales.customer_id = "metalake_demo.catalog_hive".sales.customers.customer_id
JOIN "metalake_demo.catalog_hive".sales.stores
ON "metalake_demo.catalog_hive".sales.sales.store_id = "metalake_demo.catalog_hive".sales.stores.store_id
GROUP BY
"metalake_demo.catalog_hive".sales.customers.customer_id,
customer_name,
customer_email,
location
), rankedcustomersales AS (
SELECT
customer_id,
customer_name,
customer_email,
state,
total_spent,
RANK() OVER (PARTITION BY state ORDER BY total_spent DESC) AS customer_rank
FROM customersales
)
SELECT
customer_id,
customer_name,
customer_email,
state,
total_spent
FROM rankedcustomersales
WHERE
customer_rank <= 10
ORDER BY
state,
customer_rank;

If you want to know that employees average performance rating and total sales. You run the SQL.

set session allow_pushdown_into_connectors=false;
WITH employeeperformance AS (
SELECT
employee_id,
AVG(rating) AS average_rating
FROM "metalake_demo.catalog_postgres".hr.employee_performance
GROUP BY
employee_id
), employeesales AS (
SELECT
employee_id,
SUM(total_amount) AS total_sales
FROM "metalake_demo.catalog_hive".sales.sales
GROUP BY
employee_id
)
SELECT
e.employee_id,
average_rating,
total_sales
FROM employeeperformance AS e
JOIN employeesales AS s
ON e.employee_id = s.employee_id;