Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#6366] improvement(hadoop, authorization): support getting location of schema for gcs hadoop catalog. #6367

Closed
wants to merge 3 commits into from

Conversation

yuqi1129
Copy link
Contributor

What changes were proposed in this pull request?

support getting location of schema for gcs hadoop catalog.

Why are the changes needed?

it's an improvement.

Fix: #6366

Does this PR introduce any user-facing change?

N/A.

How was this patch tested?

UT

@yuqi1129 yuqi1129 requested review from xunliu and jerqi and removed request for xunliu January 24, 2025 13:09
@yuqi1129 yuqi1129 self-assigned this Jan 24, 2025
@@ -445,7 +447,7 @@ public static List<String> getMetadataObjectLocation(
// The Hive default schema location is Hive warehouse directory
String defaultSchemaLocation =
getHiveDefaultLocation(ident.name(), catalogObj.name());
if (defaultSchemaLocation != null && !defaultSchemaLocation.isEmpty()) {
if (StringUtils.isNotBlank(defaultSchemaLocation)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@@ -462,6 +464,13 @@ public static List<String> getMetadataObjectLocation(
if (defaultSchemaLocation != null && !defaultSchemaLocation.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Comment on lines +486 to +487
if (Objects.equals(catalogProvider, "hive")
|| Objects.equals(catalogProvider, "hadoop")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just do string comparison?

Copy link
Contributor Author

@yuqi1129 yuqi1129 Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to get NPE for a new starter like stringVariable.equals("value") if stringVariable is a null value.

jerqi
jerqi previously approved these changes Jan 26, 2025
locations.add(defaultSchemaLocation);
}
} else if (catalogObj.provider().equals("hadoop")) {
String catalogLocation = catalogObj.properties().get(HiveConstants.LOCATION);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised that our core module relies on catalog-common module, which is not what I want, catalog-common as its name, should always be used by catalog-* modules and other modules rely on catalog. But for core module, the dependency relation is the opposite, we should fix this.

I saw it is introduced by @FANNG1 , so we really should fix it.

Besides, why a hadoop catalog needs to get a Hive property definition? It's weird to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, why a hadoop catalog needs to get a Hive property definition? It's weird to me.

I just want to reuse the string constant location, indeed it's not so elegant to use constants in HiveConstants, I will introduce a similar value in the HadoopConstants

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a common constant

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name for catalog-common is a little confusing, by design, it contains the constants shared by catalogs and connectors and Iceberg REST server, etc.

Copy link
Contributor

@jerryshao jerryshao Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, these constants should be moved to common module. The misuse of the modules makes the dependency really not a tree any more, it's more like a graph, which is really not good. We shoud bear in mind to avoid such cyclic dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, for catalog/connector, the dependency of catalog-common is reasonable, because it is the downstream of the catalog-common module. But core module is the upstream of the catalog-common model, so it is not a good implementation.

@@ -472,19 +480,20 @@ public static List<String> getMetadataObjectLocation(
.catalogDispatcher()
.loadCatalog(
NameIdentifier.of(ident.namespace().level(0), ident.namespace().level(1)));
LOG.info("Catalog provider is %s", catalogObj.provider());
if (catalogObj.provider().equals("hive")) {
LOG.info("Catalog provider is {}", catalogObj.provider());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LOG is useless, it is only for debug, better to remove it.

@jerqi
Copy link
Contributor

jerqi commented Jan 26, 2025

Is this PR duplicated with #6211 ?

@jerqi jerqi dismissed their stale review January 26, 2025 09:45

Add other changes.

@yuqi1129
Copy link
Contributor Author

Is this PR duplicated with #6211 ?

Indeed, I checked and it duplicated with #6211, Let's wait for it to merge and test authorization logic for GCS fileset.

@yuqi1129
Copy link
Contributor Author

Close it temporarily as it's duplicated with #6211

@yuqi1129 yuqi1129 closed this Jan 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Optimize logic getMetadataObjectLocation to support gcs fileset
5 participants