[FEATURE] Hive3 Metastore Multi Catalog Support #5912

Open
an-shi-chi-fan opened this issue Dec 19, 2024 · 9 comments
Labels
feature New feature or request

Comments

@an-shi-chi-fan

Describe the feature

Hive 3 metastore catalog support

Motivation

The Hive 3 metastore supports multiple catalogs.

Describe the solution

Similar to the Spark config spark.sql.hive.metastore.version, let the user choose which metastore version to use.

Additional context

No response

@an-shi-chi-fan an-shi-chi-fan added the feature New feature or request label Dec 19, 2024
@jerryshao
Contributor

CC @mchades

@jerryshao
Contributor

I think the main thing to consider here is how to map the Hive 3 namespace to Gravitino's namespace. Unlike Hive 2, which uses a db/table namespace, Hive 3 adds another layer on top: the catalog.
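To make the mapping question concrete, here is a minimal Python sketch. It assumes a Gravitino table is addressed as metalake.catalog.schema.table and a Hive 3 table as catalog.database.table, and that each Gravitino catalog is bound to exactly one Hive catalog. The names HiveTableName and to_hive3_name are hypothetical, and the one-to-one binding is only one possible design, not something decided in this thread.

```python
from typing import NamedTuple


class HiveTableName(NamedTuple):
    """Three-level Hive 3 identifier: catalog, database, table."""
    catalog: str
    database: str
    table: str


def to_hive3_name(gravitino_name: str, hive_catalog: str) -> HiveTableName:
    """Map 'metalake.catalog.schema.table' to a Hive 3 identifier.

    hive_catalog is the Hive catalog that the Gravitino catalog is
    assumed to be bound to (hypothetical one-to-one binding).
    """
    parts = gravitino_name.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected metalake.catalog.schema.table, got {gravitino_name!r}")
    _metalake, _gravitino_catalog, schema, table = parts
    # The Gravitino schema level plays the role of the Hive database.
    return HiveTableName(catalog=hive_catalog, database=schema, table=table)
```

Under this assumption the Gravitino catalog name never reaches Hive; only the bound Hive catalog name does, so two Gravitino catalogs can point at different catalogs of the same metastore.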

@yujiantao

related issue: #5850

@an-shi-chi-fan
Author

an-shi-chi-fan commented Dec 19, 2024

We have the following scenarios.

1. When the catalog already exists in Hive, map the Gravitino catalog to it.
2. When the catalog does not exist in Hive, automatically create the Hive catalog and then map the Gravitino catalog to it.

In summary, Gravitino can create multiple catalogs, and these catalogs can point to different catalogs of the same Hive metastore.
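The two scenarios above amount to a "map if present, otherwise create and map" step. A minimal Python sketch of that logic follows. FakeMetastore is an in-memory stand-in written for this example, not a real HMS client API, and bind_gravitino_catalog is a hypothetical name.

```python
class FakeMetastore:
    """In-memory stand-in for a Hive 3 metastore (illustration only)."""

    def __init__(self, catalogs=()):
        self.catalogs = set(catalogs)

    def has_catalog(self, name: str) -> bool:
        return name in self.catalogs

    def create_catalog(self, name: str) -> None:
        self.catalogs.add(name)


def bind_gravitino_catalog(metastore: FakeMetastore, hive_catalog: str) -> bool:
    """Ensure hive_catalog exists so a Gravitino catalog can map to it.

    Returns True if the Hive catalog had to be created (scenario 2),
    False if it already existed (scenario 1).
    """
    if metastore.has_catalog(hive_catalog):
        return False
    metastore.create_catalog(hive_catalog)
    return True
```

Calling bind_gravitino_catalog twice with the same name is harmless in this sketch: the second call finds the catalog and only maps to it.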

The current version of the HMS client used by Gravitino is 2, and we need to add support for version 3 to cover our scenarios.

I have two ideas on how to proceed.

1. Add version 3 support to the existing catalog-hive module.
2. Add a new module called catalog-hive3.

Do you have any suggestions?
@jerryshao @mchades

@jerryshao
Contributor

Personally, I haven't thought deeply about how to support Hive 3 yet. Namespace mapping may be just one of the problems; there is also access control support (Ranger), engine support (Spark, Trino), and so on. I think we need a deeper investigation to gain a better understanding.

@an-shi-chi-fan
Author

an-shi-chi-fan commented Dec 20, 2024

Hive 3 support draft

We had a brief discussion about this feature, and here are the results.

For catalog

Add a catalog-hive3 module under the catalogs module, and use the IsolatedClassLoader from CatalogManager to isolate the hive2 and hive3 classes from each other.
When creating a schema within the module, if the Hive catalog exists, map it to Gravitino; if it does not exist, create the catalog.
We looked at the hive3 client, which takes the catalog as a configuration parameter, so we will expose this setting through the properties attribute when creating the Gravitino catalog. It will then be used to set metastore.catalog.default when creating the Hive client.
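The last point can be sketched as follows in Python. The property key hive.catalog.name is a hypothetical placeholder, since the draft does not name the Gravitino property; only metastore.catalog.default comes from the text above. When the property is absent the sketch sets nothing, so the client falls back to whatever default the metastore configuration provides.

```python
# Hypothetical Gravitino catalog property; the draft does not fix a name.
HIVE_CATALOG_PROPERTY = "hive.catalog.name"
# Hive 3 client setting named in the draft above.
CATALOG_DEFAULT_KEY = "metastore.catalog.default"


def build_client_conf(catalog_properties: dict) -> dict:
    """Derive Hive client configuration from Gravitino catalog properties."""
    conf = {}
    hive_catalog = catalog_properties.get(HIVE_CATALOG_PROPERTY)
    if hive_catalog:
        # Point the client at the requested Hive catalog.
        conf[CATALOG_DEFAULT_KEY] = hive_catalog
    return conf
```

For example, build_client_conf({"hive.catalog.name": "sales"}) yields a configuration with metastore.catalog.default set to sales, while an empty properties map yields an empty configuration.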

For Engine

spark

Just add the Spark config spark.bypass.metastore.catalog.default to pass the setting through to the hive3 catalog provider.
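A rough Python sketch of the pass-through idea, assuming the connector forwards every key that starts with spark.bypass. to the catalog provider after stripping that prefix. The function name extract_bypass_options is hypothetical, and this is an illustration of the mechanism rather than the connector's actual code.

```python
BYPASS_PREFIX = "spark.bypass."


def extract_bypass_options(spark_conf: dict) -> dict:
    """Return the options to forward, with the bypass prefix removed.

    Keys without the prefix are ignored.
    """
    return {
        key[len(BYPASS_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(BYPASS_PREFIX)
    }
```

With this behavior, spark.bypass.metastore.catalog.default=sales would reach the provider as metastore.catalog.default=sales, and unrelated Spark settings would not be forwarded.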

trino

Hive 3 support needs to be added for Trino, but there is already an issue about Trino supporting Hive 3 multi-catalog: detail

flink

Passing the catalog parameter to BaseCatalog may take effect, but I'm not sure about Flink.

For Ranger

We plan to support it in the future, but there is no conclusion yet.

@an-shi-chi-fan
Author

@jerryshao @yujiantao CC

@mchades
Contributor

mchades commented Dec 23, 2024

Hi @an-shi-chi-fan , thanks for your draft proposal about Hive3 support!

Could you kindly transfer the design to a Google document and provide the link for us to engage in discussions more conveniently, allowing for easier refinement of the design?

@an-shi-chi-fan
Author


ok

click here to see it
