Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(filesystem): UNC paths are not supported #1209

Merged
merged 27 commits into from
Apr 29, 2024
Merged
Changes from 2 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
8e111c1
fix(filesystem): UNC paths are not supported
IlyaFaer Apr 10, 2024
45d1d7b
small fix
IlyaFaer Apr 11, 2024
e39b8bc
use glob for UNC paths
IlyaFaer Apr 17, 2024
de82bb5
Merge branch 'devel' into windows_path_fix
IlyaFaer Apr 17, 2024
7c465bc
add the third slash
IlyaFaer Apr 17, 2024
6fee728
fix slahes
IlyaFaer Apr 17, 2024
f43a13a
fix if
IlyaFaer Apr 17, 2024
e813a3c
Merge branch 'devel' into windows_path_fix
rudolfix Apr 17, 2024
95edfaf
less invasive switch to Python glob
rudolfix Apr 17, 2024
50c6f2b
moves unc test to local so it runs on windows
rudolfix Apr 17, 2024
c045539
Merge branch 'devel' of https://github.com/dlt-hub/dlt into windows_p…
IlyaFaer Apr 18, 2024
57a6b47
Merge branch 'windows_path_fix' of https://github.com/dlt-hub/dlt int…
IlyaFaer Apr 18, 2024
2f1346c
review fixes
IlyaFaer Apr 18, 2024
0b8e04e
test UNC path reading
IlyaFaer Apr 22, 2024
7c1bdfb
add docs
IlyaFaer Apr 23, 2024
e7a893f
lint ignores
IlyaFaer Apr 23, 2024
fbcffbb
lint fix
IlyaFaer Apr 23, 2024
f111714
lint fix
IlyaFaer Apr 23, 2024
7c0a918
excess ignore
IlyaFaer Apr 23, 2024
5232ce6
lint fix
IlyaFaer Apr 23, 2024
98b985f
Merge branch 'devel' into windows_path_fix
rudolfix Apr 27, 2024
e0ef3a9
adds utils to convert from file to local path to filesystem config
rudolfix Apr 27, 2024
d983976
offloads globbing to Python glob for locat paths, allows UNC and nati…
rudolfix Apr 27, 2024
3810f80
allows the test file bucket to be relative path, fixes tests, adds se…
rudolfix Apr 27, 2024
44cca36
bumps dev adlfs version up to have working glob
rudolfix Apr 27, 2024
23aee6d
updates filesystem docs
rudolfix Apr 27, 2024
556a256
fixes UNC to file uri
rudolfix Apr 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 18 additions & 6 deletions dlt/common/storages/fsspec_filesystem.py
Original file line number Diff line number Diff line change
Expand Up @@ -210,8 +210,10 @@ def open( # noqa: A003
elif compression == "disable":
compression_arg = None
else:
raise ValueError("""The argument `compression` must have one of the following values:
"auto", "enable", "disable".""")
raise ValueError(
"""The argument `compression` must have one of the following values:
"auto", "enable", "disable"."""
)

opened_file: IO[Any]
# if the user has already extracted the content, we use it so there is no need to
Expand Down Expand Up @@ -282,12 +284,16 @@ def glob_files(
# this is a file so create a proper file url
bucket_url = pathlib.Path(bucket_url).absolute().as_uri()
bucket_url_parsed = urlparse(bucket_url)

bucket_url_no_schema = bucket_url_parsed._replace(scheme="", query="").geturl()
bucket_url_no_schema = (
bucket_url_no_schema[2:] if bucket_url_no_schema.startswith("//") else bucket_url_no_schema
)
filter_url = posixpath.join(bucket_url_no_schema, file_glob)

if filter_url.startswith(r"/\\"):
filter_url = filter_url[1:]

glob_result = fs_client.glob(filter_url, detail=True)
if isinstance(glob_result, list):
raise NotImplementedError(
Expand All @@ -302,10 +308,16 @@ def glob_files(
# make that absolute path on a file://
if bucket_url_parsed.scheme == "file" and not file.startswith("/"):
file = f"/{file}"
file_name = posixpath.relpath(file, bucket_url_no_schema)
file_url = bucket_url_parsed._replace(
path=posixpath.join(bucket_url_parsed.path, file_name)
).geturl()

if file.startswith("//"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when do we get files starting with //? is this UNC path? please comment on that

Copy link
Contributor Author

@IlyaFaer IlyaFaer Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's legacy, but I think we should consider it as equal to \\. Because... well, dealing with slash direction is very unpleasant for users.

Anyway, I don't think it's needed anymore. glob seems to be handling fine all the possible cases.

file_name = file.replace("//", "\\\\")
file_url = "file://" + file_name
IlyaFaer marked this conversation as resolved.
Show resolved Hide resolved
else:
file_name = posixpath.relpath(file, bucket_url_no_schema)

file_url = bucket_url_parsed._replace(
path=posixpath.join(bucket_url_parsed.path, file_name)
).geturl()

mime_type, encoding = guess_mime_type(file_name)
yield FileItem(
Expand Down
Loading