Use of '::' in file's name #1782

Open
maxgalli opened this issue Jan 28, 2025 · 0 comments

I am investigating this issue in uproot, where we try to open a file whose name contains ::. As shown there, the expected behavior is to create the file named by the string, but that is not what happens. The cause is in this function:

def _un_chain(path, kwargs):
    # Avoid a circular import
    from fsspec.implementations.cached import CachingFileSystem

    if "::" in path:
        x = re.compile(".*[^a-z]+.*")  # test for non protocol-like single word
        bits = []
        for p in path.split("::"):
            if "://" in p or x.match(p):
                bits.append(p)
            else:
                bits.append(p + "://")
    else:
        bits = [path]
    # [[url, protocol, kwargs], ...]
    out = []
    previous_bit = None
    kwargs = kwargs.copy()
    for bit in reversed(bits):
        protocol = kwargs.pop("protocol", None) or split_protocol(bit)[0] or "file"
        cls = get_filesystem_class(protocol)
        extra_kwargs = cls._get_kwargs_from_urls(bit)
        kws = kwargs.pop(protocol, {})
        if bit is bits[0]:
            kws.update(kwargs)
        kw = dict(
            **{k: v for k, v in extra_kwargs.items() if k not in kws or v != kws[k]},
            **kws,
        )
        bit = cls._strip_protocol(bit)
        if "target_protocol" not in kw and issubclass(cls, CachingFileSystem):
            bit = previous_bit
        out.append((bit, protocol, kw))
        previous_bit = bit
    out.reverse()
    return out

Here, it seems to me, the case of :: being part of the file name is not considered: every :: is treated as a protocol separator.
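To make the effect concrete, here is a minimal sketch that copies only the splitting step of the function quoted above into a standalone helper. The name split_chain and the example paths are hypothetical, chosen for illustration; they are not part of fsspec or of the uproot reproducer.

```python
import re


def split_chain(path):
    """Standalone copy of the '::' splitting step from _un_chain (illustration only)."""
    if "::" not in path:
        return [path]
    x = re.compile(".*[^a-z]+.*")  # matches anything that is not a bare lowercase word
    bits = []
    for p in path.split("::"):
        if "://" in p or x.match(p):
            bits.append(p)  # already a URL, or not protocol-like
        else:
            bits.append(p + "://")  # bare lowercase word: treated as a protocol
    return bits


# A genuine chained URL is split as intended.
print(split_chain("simplecache::s3://bucket/key"))
# ['simplecache://', 's3://bucket/key']

# A local file name containing '::' is split into two unrelated pieces.
print(split_chain("/tmp/my::file.root"))
# ['/tmp/my', 'file.root']
```

In the second case neither piece carries a protocol, so the rest of the function would presumably resolve each one separately and fall back to "file", and the original file name is lost.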

One idea to adapt the code would be the following:

    if "::" in path:
        x = re.compile(".*[^a-z]+.*")  # test for non protocol-like single word
        bits = []
        for p in path.split("::"):
            # Check if part looks like a protocol or URL
            if "://" in p or x.match(p) or p in known_implementations:
                bits.append(p)
            else:
                # If not, assume it is part of the file name
                bits.append(p + "://")
        
        # If no part matches a known protocol, treat the entire path as a file name
        if not any(b for b in bits if b.strip("://") in known_implementations):
            bits = [path]
    else:
        bits = [path]

This fixes Jim's reproducer but breaks a few tests, which makes me wonder whether the current behavior is intentional.
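The fallback idea can also be tried in isolation, without fsspec. The sketch below is a variant of the snippet above, not the exact patch: KNOWN_PROTOCOLS is a hypothetical stand-in for fsspec's registry of protocol names, and split_chain_guarded is an illustrative name. It only reverts to the whole path when no piece names a known protocol.

```python
import re

# Assumption: a small stand-in for the set of registered protocol names.
KNOWN_PROTOCOLS = {"file", "s3", "simplecache", "zip"}


def split_chain_guarded(path, known=KNOWN_PROTOCOLS):
    """Split on '::' only if at least one piece names a known protocol."""
    if "::" not in path:
        return [path]
    x = re.compile(".*[^a-z]+.*")
    bits = []
    for p in path.split("::"):
        if "://" in p or x.match(p):
            bits.append(p)
        else:
            bits.append(p + "://")
    # Protocol of each piece is the text before '://', when present.
    protocols = [b.split("://", 1)[0] for b in bits if "://" in b]
    if not any(proto in known for proto in protocols):
        return [path]  # no protocol found: treat '::' as part of the file name
    return bits


print(split_chain_guarded("simplecache::s3://bucket/key"))
# ['simplecache://', 's3://bucket/key']
print(split_chain_guarded("/tmp/my::file.root"))
# ['/tmp/my::file.root']
print(split_chain_guarded("zip://inner.txt::/tmp/my::archive.zip"))
# ['zip://inner.txt', '/tmp/my', 'archive.zip']
```

The last call shows a limit of this approach: once a real protocol is present, a :: inside the file name is still split, so the mixed case stays ambiguous.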

The question is thus the following: should fsspec implement logic to handle the case in which :: is part of the file name, or should uproot add a check that raises an error when :: is not used as a protocol separator?
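For the second option, a caller-side check could look roughly like the sketch below. It is only an illustration under assumptions: check_chained_url and the known_protocols argument are hypothetical names, not existing uproot or fsspec API.

```python
def check_chained_url(path, known_protocols):
    """Return path unchanged, or raise if '::' appears without any known protocol."""
    if "::" not in path:
        return path
    for segment in path.split("::"):
        # A segment is either 'proto://rest' or a bare word such as 'simplecache'.
        proto = segment.split("://", 1)[0] if "://" in segment else segment
        if proto in known_protocols:
            return path
    raise ValueError(
        f"'::' in {path!r} is not used as a protocol separator; "
        "file names containing '::' are not supported"
    )


known = {"file", "s3", "simplecache"}  # assumption: illustrative subset
print(check_chained_url("simplecache::s3://bucket/key", known))
# simplecache::s3://bucket/key

try:
    check_chained_url("/tmp/my::file.root", known)
except ValueError as err:
    print("rejected:", err)
```

This fails early with a clear message instead of silently opening a different path, at the cost of not supporting such file names at all.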
