Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Globbing on dfs (abfss) endpoint for Azure Data Lake Storage Gen2 does not work properly #80

Open
keen85 opened this issue Sep 20, 2024 · 1 comment

Comments

@keen85
Copy link

keen85 commented Sep 20, 2024

I noticed some unexpected behavior when using globbing on an ADLSGen2 (Azure Data Lake Storage Gen2; Azure Storage Account).
ADLSGen2 supports blob and dfs endpoint.
According to documentation, globbing should be supported for both.
However, I noticed some queries that work for blob but not for the dfs endpoint:

endpoint executable? statement
blob (az) OK SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/foo')
blob (az) OK SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/*.json')
blob (az) OK SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/*')
blob (az) OK SELECT * FROM glob('az://some_folder/DAY=2024-09-20/HOUR=9/**')
blob (az) OK SELECT * FROM glob('az://some_folder/DAY=2024-09-20/**')
dfs (abfss) OK SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/foo.json')
dfs (abfss) ERROR SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/*.json')
dfs (abfss) ERROR SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/*')
dfs (abfss) ERROR SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/HOUR=9/**')
dfs (abfss) OK SELECT * FROM glob('abfss://some_folder/DAY=2024-09-20/**')

ERROR: SQL Error: java.sql.SQLException: Invalid Error: 404 The specified path does not exist. The specified path does not exist.

@olsgaard
Copy link

I'm having similar issues:

After starting duckdb from the commandline and setting up secrets, I get the following errors when using abfss, but not when using az protocol:

D SELECT * FROM 'abfss://<container>/orders/**/*.parquet' limit 10;
Not implemented Error: abfss do not manage recursive lookup patterns, daily-vessel-consumption/**/*.parquet is therefor illegal, only pattern ending by ** are allowed.

But if I try that, I get:

SELECT * FROM 'abfss://<container>/orders/**' limit 10;
Catalog Error: Table with name abfss://<container>/orders/** does not exist!
Did you mean "pg_tablespace"?
LINE 1: SELECT * FROM 'abfss://<container>/orders/...

I should be running the latest release version and am running on ubuntu linux.

$ duckdb --version
v1.1.2 f680b7d08f
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04 LTS
Release:        20.04
Codename:       focal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants