Convenience functions and behavior modification of Folder object #75
Comments
These all sound like useful additions except for the load method. I'm a bit hesitant to have circular dependencies between the _hl and _apps code. I'd suggest adding tests to the existing test_folder for now. We can refactor if it gets unwieldy. I'm afraid the current code base is not very consistent as far as docstrings go. Anyway, let's follow the format that h5py uses.
Ok, I can skip the load method for now until we figure out a better strategy. It might be that _apps can be refactored at some point to move the functional capabilities over to _hl so that _apps largely consists of business logic for the specific scripts.
For the
I've also seen multiple cases where a
Can't the info returned by the GET in the constructor be used for info? I'm thinking about enabling h5pyd to support direct access to Files, but we can deal with that later. I've attempted (with lots of omissions) to use the word "Domain" rather than "File" so that HSDS domains are not confused with HDF5 POSIX files. If we have a subFile method in Folder, that might cause more confusion. Rather than a new method, how about an optional parameter on subdomains (e.g. "include_folders=True")? Re: the root Folder - it's a bit of a special case. It has no settable properties other than its subfolders (there's no JSON object that corresponds to the root Folder). So I think the following will need to be handled in code:
Ok, I've got the constructor storing the relevant information from the GET into hidden attributes that get called from It seems that there are two competing syntax issues - one between HDF5 posix files and HSDS domains/files and another between H5PYD domains (Files) and HSDS domains (Files and Folders). I have the
I'm not sure I understand this paragraph. I was under the assumption that h5py is used for direct POSIX access and h5pyd is used for HTTP. Is the thought that the interfaces/packages may merge in the future? I'm still unsure of whether a Folder should instantiate if the object at the given domain is not of class folder (lines 166-168 in folders.py). I'd have to update some of what I've made to handle this scenario, but I'm unsure of whether this is intended behavior or the use case.
About h5pyd direct access... Currently h5pyd gets data by making requests to an HSDS (or h5serv) server. In turn, the HSDS server reads data stored in the object storage schema (https://github.com/HDFGroup/hsds/blob/master/docs/design/obj_store_schema/obj_store_schema_v2.md). There are a lot of advantages to this type of client-server architecture, but there are some challenges too. E.g. you need to scale the server to match the load from the clients. Direct access would add a capability to h5pyd so that it could read directly from the storage medium. You will (hopefully) get the same result but without having to make any server requests. If you are running a job with hundreds or thousands of workers, it may be more practical to run these using the direct access model vs. accessing via the server. All this is a bit orthogonal to how the schema objects are actually stored (POSIX files vs. object storage objects). HSDS currently requires an object storage system, but it wouldn't be much work to enable the schema to be stored in a POSIX filesystem as well.
Maybe too late for this, but I would prefer dropping "sub" from "subfolders" and "subdomains".
I'm fine with that considering it's hitting the
Considering the naming conventions I'm leaning towards just making a property for
After using h5pyd for a while, I'd like to propose a few new functions and changes to the Folder object. I believe these changes allow for more fluid file management programs and match the h5py syntax better.
New functions/properties:
Folder.create_subfolder(subfolder_name, **kwargs):
Folder.create_subdomain(subdomain_name, **kwargs):
Folder.load(local_filepath/filepaths, **kwargs):
Folder.subfolders: => property
Folder.subdomains: => property
Folder.subfolders: => property
Folder.info: => property, what Folder.__repr__() is currently returning, i.e. a dict of object information
Changes to existing functions:
Folder.__getitem__(path): a Folder x could index into x['path/to/my/data.h5']
Folder.__str__(): Folder.domain or Folder.info, not sure about this one.
Folder.__repr__(): <HSDS folder {full_url} (mode {mode})> to match standard h5py/h5pyd convention
Folder.parent: => property
I can build a branch in my fork incorporating these changes along with general SOH updates like PEP8 compliance and standardized docstrings. Is there a specific docstring style that is preferred?
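To make the proposed behavior more concrete, here is a minimal in-memory sketch of how path indexing, the new `__repr__` format, and the `parent`, `subfolders`, and `subdomains` properties might fit together. `FolderSketch`, its `tree` argument, and the `http://hsds.example` endpoint are hypothetical stand-ins for illustration only; this is not the h5pyd `Folder` class and it makes no server requests. Building `{full_url}` as endpoint plus domain, and returning the parent as a domain string, are assumptions as well.

```python
class FolderSketch:
    """In-memory stand-in for the proposed Folder behavior (hypothetical)."""

    def __init__(self, domain, endpoint="http://hsds.example", mode="r", tree=None):
        # Assumption: folder domains are written with a trailing slash.
        self._domain = domain if domain.endswith("/") else domain + "/"
        self._endpoint = endpoint
        self._mode = mode
        # tree maps a child name to a dict (subfolder) or None (domain).
        self._tree = tree if tree is not None else {}

    @property
    def domain(self):
        return self._domain

    @property
    def parent(self):
        # The root folder is a special case and has no parent.
        if self._domain == "/":
            return None
        head = self._domain.rstrip("/").rsplit("/", 1)[0]
        return head + "/"

    @property
    def subfolders(self):
        return sorted(n for n, c in self._tree.items() if isinstance(c, dict))

    @property
    def subdomains(self):
        return sorted(n for n, c in self._tree.items() if c is None)

    def __getitem__(self, path):
        # Walk a slash-separated path relative to this folder.
        parts = [p for p in path.split("/") if p]
        if not parts:
            raise KeyError(path)
        node = self._tree
        for part in parts:
            if not isinstance(node, dict) or part not in node:
                raise KeyError(path)
            node = node[part]
        full = self._domain + "/".join(parts)
        if isinstance(node, dict):
            return FolderSketch(full, self._endpoint, self._mode, node)
        # A real implementation would presumably open the domain here;
        # the sketch just returns its full path.
        return full

    def __str__(self):
        return self._domain

    def __repr__(self):
        return f"<HSDS folder {self._endpoint}{self._domain} (mode {self._mode})>"


home = FolderSketch(
    "/home/user",
    tree={"path": {"to": {"my": {"data.h5": None}}}, "notes.h5": None},
)
print(repr(home))
print(home.parent)
print(home["path/to/my/data.h5"])
```

Whether indexing should return an opened domain, a nested Folder, or something else for intermediate path segments is left open here; the sketch returns a nested `FolderSketch` for folders simply to keep the example short.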
As for testing, should I continue to add lines into the three existing tests or would you prefer a more granular approach, i.e. a test case for each individual function? The current tests don't necessarily need to be removed, but I would think a more granular approach would highlight errors more clearly.
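For illustration, a more granular layout could use one small test case per function, so that a failure points directly at the behavior that broke. The sketch below uses the standard `unittest` module; `folder_repr` and `parent_of` are hypothetical helpers standing in for the proposed `Folder.__repr__()` and `Folder.parent`, and are not part of h5pyd or its existing test suite.

```python
import unittest


def folder_repr(full_url, mode):
    """Hypothetical helper: the repr format proposed in this issue."""
    return f"<HSDS folder {full_url} (mode {mode})>"


def parent_of(domain):
    """Hypothetical helper: parent folder path, or None for the root."""
    if domain == "/":
        return None
    return domain.rstrip("/").rsplit("/", 1)[0] + "/"


class TestFolderRepr(unittest.TestCase):
    def test_repr_format(self):
        self.assertEqual(
            folder_repr("http://hsds.example/home/", "r"),
            "<HSDS folder http://hsds.example/home/ (mode r)>",
        )


class TestFolderParent(unittest.TestCase):
    def test_nested_folder(self):
        self.assertEqual(parent_of("/home/user/"), "/home/")

    def test_root_has_no_parent(self):
        self.assertIsNone(parent_of("/"))


# Run the cases explicitly so the result object can be inspected.
loader = unittest.defaultTestLoader
suite = unittest.TestSuite(
    [
        loader.loadTestsFromTestCase(TestFolderRepr),
        loader.loadTestsFromTestCase(TestFolderParent),
    ]
)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

With `verbosity=2` each test method is reported by name, which is the main benefit of splitting the checks up this way.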