Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-47325: Add API for parsing butler dataset URIs (butler and ivo) #1113

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

timj
Copy link
Member

@timj timj commented Nov 1, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

Copy link

codecov bot commented Nov 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.38%. Comparing base (e75d1d5) to head (1a3d455).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1113      +/-   ##
==========================================
+ Coverage   89.37%   89.38%   +0.01%     
==========================================
  Files         362      362              
  Lines       48312    48366      +54     
  Branches     5872     5876       +4     
==========================================
+ Hits        43179    43233      +54     
  Misses       3718     3718              
  Partials     1415     1415              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

return label, dataset_id

@classmethod
def get_dataset_from_uri(cls, uri: str) -> DatasetRef | None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhirving I was thinking that this should take a LabeledButlerFactory as an optional parameter (so that a service like vo-cutouts can call it directly with the URI) but do you have any thoughts on how to pass in the token? In the general case the caller won't know which butler the URI is referring to. Can I assume that **kwargs should be forwarded as factory.create_butler(label=label, **kwargs) and assume the token would be shared? Or should there be a dict indexed by label that has the additional parameters that the service can set up once and pass in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your idea of a config object that could be set up by the caller and passed through.

One other idea would be a dependency injection pattern. Maybe declare a protocol like

class ButlerFactoryProtocol(Protocol):
    create(label: str) -> Butler

# (or just `Callable[[str], Butler]

which get_dataset_from_uri could take as a parameter. Then LabeledButlerFactory could grow a method bind(access_token: str) -> ButlerFactoryProtocol to create a factory with the access token already bound in. That would let callers do whatever they want to customize the instance creation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment the access token is in fact shared since it comes from Gafaelfawr and just represents the user's identity. Though no guarantees it will stay they way if we end up needing to delegate authentication between IDF and USDF, or some other unfortunate thing.

Copy link
Member Author

@timj timj Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to think through what you said above. For:

my_factory = factory.bind(access_token=token) 
ref = Butler.get_dataset_from_uri(dataset_uri, my_factory)

doesn't this mean that the access_token is required to be the same for every butler label?

Don't we need:

my_factory = factory.bind("dp02": {"access_token": token}, "dp02-direct": {"access_token": None})

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one access token per HTTP request -- Gafaelfawr generates it to identify the user. It is not associated with the repository in any way. So at least at the moment there's only one access token. If you pass an access token to LabeledButlerFactory for a DirectButler label it just ignores it.

The advantage to the dependency injection pattern is that this code wouldn't have to know anything at all about authentication or where Butlers come from -- it makes that the caller's responsibility. This code just knows that if it has a label, the caller will work out how to get it the matching Butler. So if things get more complicated later, this interface doesn't have to change. (One example might be caching the Butler if someone wants to call get_dataset_from_uri multiple times in the same request -- though that might be better solved by wrapping this function in a separate object.)

Your config object idea would work well too, allowing us to add more options later if LabeledButlerFactory gets more complicated. It's just a little less flexible and it couples this function tightly to LabeledButlerFactory.

can include ``/`` and the leading ``/`` is always stripped. If the
repository label starts with ``/`` then it must be doubled up. e.g.,

ivo://rubin//repo/main/82d79caa-0823-4300-9874-67b737367ee0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gpdf do you have any thoughts as to what the netloc field for our IVO IDs might look like?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants