(fix): ensure zip directory store compares key to prefix correctly #2758
@@ -271,7 +271,7 @@ async def list_dir(self, prefix: str) -> AsyncIterator[str]:
                     yield key
         else:
             for key in keys:
-                if key.startswith(prefix + "/") and key != prefix:
+                if key.startswith(prefix + "/") and key.strip("/") != prefix:
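For context, here is a minimal sketch of how the two conditions in this hunk differ. The helper names `matches_old` and `matches_new` and the sample keys are hypothetical, made up for illustration; only the two boolean expressions come from the diff. Any key that starts with `prefix + "/"` is longer than `prefix`, so the old `key != prefix` test can never be false, and a directory entry such as `foo/` slips through.

```python
def matches_old(key: str, prefix: str) -> bool:
    # Condition before the change (copied from the diff).
    return key.startswith(prefix + "/") and key != prefix


def matches_new(key: str, prefix: str) -> bool:
    # Condition after the change (copied from the diff).
    return key.startswith(prefix + "/") and key.strip("/") != prefix


# Hypothetical key listing that contains explicit directory entries.
keys = ["foo/", "foo/bar/", "foo/faz"]
prefix = "foo"

old_hits = [k for k in keys if matches_old(k, prefix)]
new_hits = [k for k in keys if matches_new(k, prefix)]

print(old_hits)  # the directory entry "foo/" is still included
print(new_hits)  # "foo/" is filtered out, its children remain
```

With these sample keys, the old condition accepts the `foo/` entry itself, while the new one keeps only the entries below it.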
There is a TON of this kind of thing already in this codebase. We need these things defined in the IO layer generally, not in individual implementations, as they all face similar problems.
Similarly, do we actually need a ZIP implementation when fsspec can do this for you?
there have been a LOT of these string parsing bugs. it's a weak point in the codebase, and we definitely need something more solid!
fsspec knows how you feel.
We can come up with a small number of string normalising functions that live on the Store baseclass, something like os.path, and normalise all user-passed strings at the first opportunity. This is what fsspec's _strip_protocol attempts, but also has problems.
import re

def norm_path(s):
    # this probably adds not insignificant runtime cost
    # strip leading/trailing slashes, then collapse repeated slashes
    return re.sub("/+", "/", s.strip("/"))
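To make the suggestion concrete, here is a small sketch of how such a helper behaves on a few inputs. The sample paths are made up for illustration, and this is only one possible shape for the normaliser, not an existing Store method.

```python
import re


def norm_path(s: str) -> str:
    # Same idea as the snippet above: drop leading and trailing slashes
    # (strip("/") is equivalent to lstrip("/") followed by rstrip("/")),
    # then collapse any run of slashes into a single one.
    return re.sub("/+", "/", s.strip("/"))


# Hypothetical user-supplied paths.
examples = ["/foo/bar/", "foo//bar", "foo/", "/", ""]
normalised = [norm_path(p) for p in examples]
print(normalised)

# Once both sides are normalised, a prefix comparison no longer
# depends on how the caller spelled the path.
same = norm_path("foo/") == norm_path("/foo")
print(same)
```

Normalising once, at the point where a user string enters the store, would let checks like the one in this PR compare plain strings without ad hoc stripping.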
"it's a weak point in the codebase"
I did wonder how widespread this could be. I noticed this sort of thing in other stores, but wasn't sure whether the maintainers already knew those cases were fine (for whatever reason) and therefore didn't need this sort of fix.
The integration with backend IO libraries (fsspec and probably objstore) is particularly thorny, since they do their own path munging!
can you add a test that would have failed on |
Fixes #2757
Since there was code there previously, I presume it was somehow possible to hit the condition although I haven't figured out how (outside of the reproducer file that I have).
Here is how I would in theory go about it:
But it only yields the subkey [faz] instead of ['bar', 'faz'] as in the linked issue/file.

UPDATE: I think the file was created by simply zipping an old zarr store, but I can't be certain. In any case I checked using
unzip -v /Users/ilangold/Projects/Theis/anndata/tests/data/archives/v0.7.0/adata.zarr.zip
and saw that folders are indeed listed there, so I do think this is a real possibility.

TODO:
docs/user-guide/*.rst
changes/
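
As a rough illustration of the `unzip -v` observation above, the sketch below builds a small archive in memory with Python's standard `zipfile` module and lists its entries. The entry names are hypothetical and this is not the reproducer file from the linked issue; it only shows that a zip archive can carry explicit directory entries whose names end in `/`.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    # Names ending in "/" are stored as directory entries, similar to
    # what zipping a directory tree can produce.
    zf.writestr("foo/", "")
    zf.writestr("foo/bar/", "")
    zf.writestr("foo/bar/data", "x")
    zf.writestr("foo/faz", "y")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    dirs = [info.filename for info in zf.infolist() if info.is_dir()]

print(names)  # directory entries appear alongside file entries
print(dirs)   # only the entries whose names end in "/"
```

A store that lists keys straight from `namelist()` therefore sees `foo/` as a key in its own right, which is the case the changed condition has to handle.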