presto connector (remote hive,hdfs) #995

Open
JooyoungJeong opened this issue Oct 18, 2019 · 3 comments

Comments

@JooyoungJeong

Hi.

I have successfully connected to a remote Hive using Presto's Hive connector, and I can list the tables with a Presto query.
But when I select from a table, the HDFS host cannot be found, as shown below.

My Kubernetes Cluster_A: Hive and HDFS [ metering v0.12 ]
My Kubernetes Cluster_B: Presto [ metering v4.2 ]

Presto Hive connector configuration:

connector.name=hive-hadoop2
hive.allow-drop-table=false
hive.allow-rename-table=false
hive.storage-format=ORC
hive.compression-codec=SNAPPY
hive.hdfs.authentication.type=NONE
hive.metastore.authentication.type=NONE
hive.collect-column-statistics-on-write=false
hive.metastore.uri=thrift://10.xxxxx:9083

Query:

SELECT * FROM d1_hive.default.datasource_mymy

Result:

Error running query: java.net.UnknownHostException: hdfs-namenode-proxy

I think the Hive metastore URI is used to fetch the table information, but hdfs-namenode-proxy is a service address that only exists inside Cluster_A.
Is there a way to set the URL of hdfs-namenode-proxy?
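
One way to confirm that this is a name-resolution problem is to check whether the hostname resolves from where Presto runs. The following is only a minimal sketch, assuming a Python interpreter is available in that environment; `can_resolve` is a hypothetical helper name, not part of Presto or metering.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can map hostname to an address."""
    try:
        # getaddrinfo raises socket.gaierror when the name cannot be resolved.
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

Calling `can_resolve("hdfs-namenode-proxy")` from Cluster_B would be expected to return False if the name is only known inside Cluster_A, which would match the UnknownHostException above.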

Thank you.

@chancez
Contributor

chancez commented Oct 18, 2019

I'm not entirely sure what you're trying to do, but Presto and Hive both need access to HDFS, and must access it the same way.

The URIs for table locations are stored in the Hive metastore, and they use internal cluster hostnames, so it's unlikely you'll have much success using Hive and HDFS from another cluster. For tables that already exist, you would need to change their locations to an HDFS hostname that is accessible to all instances of Presto/Hive.
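
To illustrate what changing a location involves, here is a rough sketch that swaps the host part of an HDFS URI and formats a Hive `ALTER TABLE ... SET LOCATION` statement for it. The port, warehouse path, and the `namenode.example.com` hostname are made-up assumptions, and both function names are hypothetical; partitioned tables also store a location per partition, which this sketch does not cover.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_hdfs_authority(location: str, new_authority: str) -> str:
    """Replace the host[:port] of an hdfs:// URI, keeping the path unchanged."""
    parts = urlsplit(location)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an HDFS location: {location!r}")
    return urlunsplit(parts._replace(netloc=new_authority))


def alter_location_statement(table: str, new_location: str) -> str:
    """Format the Hive DDL that points an existing table at new_location."""
    return f"ALTER TABLE {table} SET LOCATION '{new_location}'"


# Hypothetical values for illustration only.
old = "hdfs://hdfs-namenode-proxy:8020/user/hive/warehouse/datasource_mymy"
new = rewrite_hdfs_authority(old, "namenode.example.com:8020")
print(alter_location_statement("default.datasource_mymy", new))
```

The statement would have to be run in Hive against the metastore that owns the table, and the new hostname would need to resolve from every Presto and Hive instance that reads it.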

We don't directly support HDFS, as it's only for development, and it has no security enabled in our installation, so I can't really recommend many alternatives. Making it public would be a bad choice.

What are you trying to accomplish here?

@JooyoungJeong
Author

@chancez
Thank you for your feedback.

I am currently running v0.12 on cluster A.
My goal is to back up the datasources stored in Hive on HDFS.

So I installed v4.2 on cluster B and connected to Hive on cluster A using Presto, with the aim of backing the data up to S3 through Hive on cluster B.

Is there an effective way to back up datasources from metering (v0.12) that are stored in Hive on HDFS?
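
For reference, the kind of copy I have in mind is a Presto `CREATE TABLE ... AS SELECT` from the remote catalog into an S3-backed one. This is only a sketch: `s3_hive` is a hypothetical catalog name, `backup_statement` is a hypothetical helper, and the statement can only work once Presto on cluster B is able to reach the HDFS of cluster A.

```python
def backup_statement(source: str, target: str, fmt: str = "ORC") -> str:
    """Build a Presto CTAS statement that copies source into target."""
    return (
        f"CREATE TABLE {target} WITH (format = '{fmt}') "
        f"AS SELECT * FROM {source}"
    )


# d1_hive is the catalog from my configuration; s3_hive is an assumed name.
print(backup_statement(
    "d1_hive.default.datasource_mymy",
    "s3_hive.default.datasource_mymy",
))
```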

@chancez
Contributor

chancez commented Oct 23, 2019

There have been so many changes between 0.12 and now that it's hard to say. The underlying data schemas could have changed, among other things. I'm not really sure how much we can do to assist with that particular goal.
