[#5973] feat(hadoop-catalog): Support credential when using fileset catalog with cloud storage #5974

Merged: 75 commits merged on Jan 10, 2025


@yuqi1129 yuqi1129 commented Dec 24, 2024

What changes were proposed in this pull request?

Support dynamic credentials when accessing filesets on cloud storage.

Why are the changes needed?

Static keys are not very safe, so we need to improve on them.

Fix: #5973

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

ITs

@yuqi1129 yuqi1129 marked this pull request as draft December 24, 2024 13:13
@yuqi1129
Contributor Author

This PR depends on #5620, #5806 and #5971

@yuqi1129 yuqi1129 marked this pull request as ready for review December 27, 2024 11:51
@yuqi1129
Contributor Author

@FANNG1,
Please help solve the token permission problem for OSS and S3. Also, the GCS token seems to have some problems with the Java client; please see #6028.

@yuqi1129 yuqi1129 self-assigned this Dec 27, 2024
@yuqi1129 yuqi1129 requested a review from FANNG1 December 28, 2024 01:19
@FANNG1
Contributor

FANNG1 commented Dec 29, 2024

Do you plan to support the static credential like S3SecretKey and some storage properties not included in the credential like s3-region in the new PR?

@yuqi1129
Contributor Author

@FANNG1, Please help solve the token permission problem for OSS and S3. Also, the GCS token seems to have some problems with the Java client; please see #6028.

solved

@yuqi1129
Contributor Author

yuqi1129 commented Dec 30, 2024

Do you plan to support the static credential like S3SecretKey and some storage properties not included in the credential like s3-region in the new PR?

The current PR also supports static credentials, since the S3 endpoint (which can replace s3-region) is a required parameter.

@yuqi1129
Contributor Author

yuqi1129 commented Jan 9, 2025

There are two options; I prefer Option 2. @yuqi1129 @jerryshao WDYT?

Option 1: use a helper class to translate the configuration.

      totalProperty.putAll(getCredentialConfigs(fileset));
      return provider.getFileSystem(filePath, totalProperty);


  Map<String, String> getCredentialConfigs(Fileset fileset) {
    Credential[] credentials = fileset.supportsCredentials().getCredentials();
    if (credentials.length == 0) {
      return ImmutableMap.of();
    }
    
    Map<String, String> maps = Maps.newHashMap();
    // common configurations for credential vending
    maps.put(xx, xx);
    
    // specific configurations
    Arrays.stream(credentials).forEach(
        credential -> {
          if (credential instanceof ADLSTokenCredential) {
            // add azure token credential configurations
          } else if (credential instanceof AzureAccountCredential) {
            // add azure account configurations
          } 
        }
    );   
    return maps;
  }

Option 2: add a new SupportsCredentialVending interface to translate the configuration.

 if (provider instanceof SupportsCredentialVending) {
     Credential[] credentials = fileset.getCredentials();
     // The cast is needed because provider is declared as a FileSystemProvider.
     totalProperty.putAll(
         ((SupportsCredentialVending) provider).getCredentialConfig(credentials));
 }

 return provider.getFileSystem(filePath, totalProperty);


interface SupportsCredentialVending {
    default Map<String, String> getCredentialConfig(Credential[] credentials) {
        // common logic
    }
}

class AzureFileSystemProvider implements SupportsCredentialVending {
    public Map<String, String> getCredentialConfig(Credential[] credentials) {
        // specific logic
    }
}

For Option 1, GVFS would need to know the concrete credential types, but GVFS should be free of specific FileSystem implementations. Cloud-storage-specific credentials should be handled in their own modules.

If I'm not mistaken, for the latter:

  1. We should add an interface named SupportsCredentialVending alongside FileSystemProvider, and have each concrete file system provider implement it.
  2. Compared to Option 1, the structure is relatively clear. The biggest problem is that all file system providers need to change.

If I had to pick one, I would prefer the latter, but I doubt the ROI of this change.
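
For reference, a minimal self-contained sketch of the Option 2 dispatch discussed above. Everything in it is a simplified stand-in rather than the actual Gravitino code: `Credential`, `FileSystemProvider`, `PlainProvider`, `VendingProvider`, `buildProperties`, the `demo-token` type and the `demo.credentials.provider` key are all hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class Option2Sketch {
  /** Stand-in for a vended credential; the real interface is richer. */
  interface Credential {
    String type();
  }

  /** Stand-in for the existing provider abstraction. */
  interface FileSystemProvider {
    String scheme();
  }

  /** Optional capability: translate credentials into file system properties. */
  interface SupportsCredentialVending {
    default Map<String, String> getCredentialConfig(Credential[] credentials) {
      // Common logic: by default a provider contributes no extra properties.
      return Collections.emptyMap();
    }
  }

  /** A provider that knows nothing about credential vending. */
  static final class PlainProvider implements FileSystemProvider {
    public String scheme() {
      return "file";
    }
  }

  /** A provider that translates its own credential types. */
  static final class VendingProvider implements FileSystemProvider, SupportsCredentialVending {
    public String scheme() {
      return "demo";
    }

    @Override
    public Map<String, String> getCredentialConfig(Credential[] credentials) {
      Map<String, String> conf = new HashMap<>();
      for (Credential c : credentials) {
        if ("demo-token".equals(c.type())) {
          conf.put("demo.credentials.provider", "vended"); // hypothetical key
        }
      }
      return conf;
    }
  }

  /** Caller side: only the capability is checked, never a concrete credential type. */
  static Map<String, String> buildProperties(
      FileSystemProvider provider, Map<String, String> base, Credential[] credentials) {
    Map<String, String> total = new HashMap<>(base);
    if (provider instanceof SupportsCredentialVending) {
      total.putAll(((SupportsCredentialVending) provider).getCredentialConfig(credentials));
    }
    return total;
  }
}
```

With this shape the caller stays free of storage-specific types, at the cost of touching every provider that wants to take part in credential vending.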

@FANNG1
Contributor

FANNG1 commented Jan 9, 2025

Is changing all the filesystem providers a big problem? I think it's necessary and makes the code clearer.

For S3FileSystemProvider

New implementation:

    Map<String, String> getCredentialConfig(Credential[] credentials) {
        Credential credential = getS3Credential(credentials);
        if (credential instanceof S3TokenCredential || credential instanceof S3SecretKeyCredential) {
            return ImmutableMap.of(
                Constants.AWS_CREDENTIALS_PROVIDER, S3CredentialsProvider.class.getCanonicalName());
        }
        // No S3 credential was found, so there is nothing to add.
        return ImmutableMap.of();
    }

Current implementation. This seems buggy: it only checks whether the credentials array is empty, not whether it contains any S3 credentials.

    if (enableGravitinoCredentialVending(config)) {
      configuration.set(
          Constants.AWS_CREDENTIALS_PROVIDER, S3CredentialsProvider.class.getCanonicalName());
    }
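
To illustrate the concern above, here is a small sketch that selects the credential by type instead of only testing whether the array is non-empty. The credential classes are empty stand-ins for the Gravitino classes named in this thread, `selectS3Credential` is a hypothetical helper, and `example.S3CredentialsProvider` is a placeholder class name. `fs.s3a.aws.credentials.provider` is the Hadoop S3A key behind `Constants.AWS_CREDENTIALS_PROVIDER`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

final class S3CredentialSelectionSketch {
  interface Credential {}

  // Empty stand-ins; the real classes carry keys, tokens and expiry times.
  static final class S3TokenCredential implements Credential {}

  static final class S3SecretKeyCredential implements Credential {}

  static final class OSSTokenCredential implements Credential {}

  static final String AWS_CREDENTIALS_PROVIDER = "fs.s3a.aws.credentials.provider";
  static final String PROVIDER_CLASS = "example.S3CredentialsProvider"; // placeholder

  /** Returns the first S3 credential, ignoring credentials for other storages. */
  static Optional<Credential> selectS3Credential(Credential[] credentials) {
    return Arrays.stream(credentials)
        .filter(c -> c instanceof S3TokenCredential || c instanceof S3SecretKeyCredential)
        .findFirst();
  }

  /** Adds the provider setting only when an S3 credential is actually present. */
  static Map<String, String> getCredentialConfig(Credential[] credentials) {
    return selectS3Credential(credentials)
        .map(c -> Collections.singletonMap(AWS_CREDENTIALS_PROVIDER, PROVIDER_CLASS))
        .orElse(Collections.emptyMap());
  }
}
```

A non-empty array that holds only, say, an OSS credential then yields an empty map, which is the case the emptiness check alone would get wrong.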

For AzureFileSystemProvider

New:

    Map<String, String> getCredentialConfig(Credential[] credentials) {
        Credential credential = getAzureCredential(credentials);
        if (credential instanceof ADLSTokenCredential) {
            // return specific map
        } else if (credential instanceof AzureAccountCredential) {
            // return specific map
        }
    }

Current:

    if (enableGravitinoCredentialVending(hadoopConfMap)) {
      try {
        AzureSasCredentialsProvider azureSasCredentialsProvider = new AzureSasCredentialsProvider();
        azureSasCredentialsProvider.initialize(configuration, null);
        String sas = azureSasCredentialsProvider.getSASToken(null, null, null, null); 
        if (sas != null) {
          String accountName =
              String.format(
                  "%s.dfs.core.windows.net",
                  config.get(AzureProperties.GRAVITINO_AZURE_STORAGE_ACCOUNT_NAME));

          configuration.set(
              FS_AZURE_ACCOUNT_AUTH_TYPE_PROPERTY_NAME + "." + accountName, AuthType.SAS.name());
          configuration.set(
              FS_AZURE_SAS_TOKEN_PROVIDER_TYPE + "." + accountName,
              AzureSasCredentialsProvider.class.getName());
          configuration.set(FS_AZURE_ACCOUNT_IS_HNS_ENABLED, "true");
        } else if (azureSasCredentialsProvider.getAzureStorageAccountKey() != null
            && azureSasCredentialsProvider.getAzureStorageAccountName() != null) {
          configuration.set(
              String.format(
                  "fs.azure.account.key.%s.dfs.core.windows.net",
                  azureSasCredentialsProvider.getAzureStorageAccountName()),
              azureSasCredentialsProvider.getAzureStorageAccountKey());
        }
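
A rough sketch of how the two branches in the current Azure code could collapse into a `getCredentialConfig`-style method. The credential classes are simplified stand-ins whose fields and constructors are assumptions, `find` is a hypothetical helper, and `example.AzureSasCredentialsProvider` is a placeholder class name. The literal keys are assumed to be the values behind the Hadoop ABFS constants used above; only the `fs.azure.account.key.%s.dfs.core.windows.net` pattern appears verbatim in this thread.

```java
import java.util.HashMap;
import java.util.Map;

final class AzureCredentialConfigSketch {
  interface Credential {}

  /** Stand-in for a SAS-token credential; the field is an assumption. */
  static final class ADLSTokenCredential implements Credential {
    final String accountName;

    ADLSTokenCredential(String accountName) {
      this.accountName = accountName;
    }
  }

  /** Stand-in for an account-key credential; the fields are assumptions. */
  static final class AzureAccountCredential implements Credential {
    final String accountName;
    final String accountKey;

    AzureAccountCredential(String accountName, String accountKey) {
      this.accountName = accountName;
      this.accountKey = accountKey;
    }
  }

  static final String SAS_PROVIDER_CLASS = "example.AzureSasCredentialsProvider"; // placeholder

  /** Returns the first credential of the given type, or null if there is none. */
  static <T extends Credential> T find(Credential[] credentials, Class<T> type) {
    for (Credential c : credentials) {
      if (type.isInstance(c)) {
        return type.cast(c);
      }
    }
    return null;
  }

  static Map<String, String> getCredentialConfig(Credential[] credentials) {
    Map<String, String> conf = new HashMap<>();
    ADLSTokenCredential token = find(credentials, ADLSTokenCredential.class);
    AzureAccountCredential account = find(credentials, AzureAccountCredential.class);
    if (token != null) {
      // Prefer the SAS token, mirroring the order of the branches above.
      String host = token.accountName + ".dfs.core.windows.net";
      conf.put("fs.azure.account.auth.type." + host, "SAS");
      conf.put("fs.azure.sas.token.provider.type." + host, SAS_PROVIDER_CLASS);
      conf.put("fs.azure.account.hns.enabled", "true");
    } else if (account != null) {
      String host = account.accountName + ".dfs.core.windows.net";
      conf.put("fs.azure.account.key." + host, account.accountKey);
    }
    return conf;
  }
}
```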

@yuqi1129
Contributor Author

yuqi1129 commented Jan 9, 2025

AzureSasCredentialsProvider azureSasCredentialsProvider = new AzureSasCredentialsProvider();
azureSasCredentialsProvider.initialize(configuration, null);
String sas = azureSasCredentialsProvider.getSASToken(null, null, null, null);

  1. SupportsCredentialVending

AzureSasCredentialsProvider azureSasCredentialsProvider = new AzureSasCredentialsProvider();
azureSasCredentialsProvider.initialize(configuration, null);
String sas = azureSasCredentialsProvider.getSASToken(null, null, null, null);

cc @jerryshao

@FANNG1
Contributor

FANNG1 commented Jan 10, 2025

LGTM

@FANNG1 FANNG1 merged commit 78447ce into apache:main Jan 10, 2025
28 checks passed
Abyss-lord pushed a commit to Abyss-lord/gravitino that referenced this pull request Jan 10, 2025
…eset catalog with cloud storage (apache#5974)

Successfully merging this pull request may close these issues.

[FEATURE] Support using credential when using fileset with cloud storage in Java GVFS
3 participants