Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Describe versioned TRS URIs #202

Merged
merged 3 commits into from
Aug 4, 2022
Merged

Describe versioned TRS URIs #202

merged 3 commits into from
Aug 4, 2022

Conversation

uniqueg
Copy link
Collaborator

@uniqueg uniqueg commented Aug 25, 2021

Rationale

TRS URIs allow passing references to TRS content to potential TRS client applications (e.g., WES or TES). This PR adds support for versioned TRS URIs, which are required to specify which version of a specified tool the client application is supposed to use/consume. This is essential to ensure reproducibility for at least some of the main use cases of TRS URIs (e.g., fetching workflows or containers).

Implementation details

  • Extend definition of TRS URIs (trs://<server>/<id>) in TRS Data Model documentation to include versioned TRS URI (trs://<server>/<id>/<version_id>)
  • Add guidelines on defining TRS Tool and TRS Tool Version IDs, based on corresponding description in DRS Data Model documentation
  • Include detailed information on how to encode TRS Tool (Version) IDs containing characters that are invalid during API calls and an example how to construct a valid, verisioned TRS URI from them
  • Moved recommendation to publish deployed TRS services in the centralized GA4GH Service Registry to its own subsection under section Misc

Related issues

Resolves #164 (see issue for additional details and a community discussion of the implementation proposed here)

@uniqueg
Copy link
Collaborator Author

uniqueg commented Aug 25, 2021

@denis-yuen: Please add additional reviewers. Thanks!

@denis-yuen denis-yuen linked an issue Aug 25, 2021 that may be closed by this pull request
For example, if a TRS client was asked to process
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g.,
issue `GET` requests to
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how this will work with versioning?

What happens when TRS v3 comes out and a new product only implements TRS v3 and doesn't implement v2?

That's assuming TRS v3 will use a new path segment v3, which I suppose it doesn't have to. But I assume it would.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially very primitive content negotiation where the user agent would need to try the API versions it would be familiar with?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think versioning APIs is an important topic and I think there's most definitely a need for discussion and harmonization across GA4GH APIs (e.g., Stian mentioned HATEOAS in #160), but as this PR does not introduces the issue nor is designed to address it, I don't think it should be blocked by it.

Besides, I don't think that TRS URIs, as identifiers of resources, should be concerned with API versioning at all: IMO, when bumping API versions, care must be taken (both on the spec- and implementation-side) not to introduce ambiguities in the way that the different responses in, e.g., TRS v2 and v3 can be interpreted by a client. And if changes to the internal representation of resources are required that would break backward compatibility, then I think those resources should be cloned and given new identifiers, such that reproducibility is not negatively affected. As long as these considerations/constraints are honored, it shouldn't matter which API version is used to access a given resource.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I have clarified this/addressed these concerns now in lines 56/57 and 69-72

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was not about API/schema versioning, which is a very complex subject and I'm no expert in. It was about the example that if you have a URI trs://trs.example.org/tool_ABC/v1.2.3, it says you can fetch the content https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC (note the v2 in the path).

If TRS v3 comes along, and a new TRS implementation only supports that v3, then the v2 in the path means they would return a 404. Maybe that's OK, or maybe the requirement is that all TRS implementations need to implement v2 and above. Otherwise a client will need to iterate over paths with all potential versions.

Maybe I'm overthinking this, and it is a real edge case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is an API/schema versioning topic because the API version is precisely what v2 refers to. In any case, there anyway needs to be some sort of negotiation between a service and a client, because not every client will implement every version of the API specs, and so there will be cases where clients and servers won't match, regardless of whether the client will iterate or whether we specify the version in the TRS URI. One possibility to facilitate this negotation would be to define the supported API versions in the service info, but of course, the Service Info API is versioned, too...

I agree this is an important issue and needs to be discussed on a larger scale within the Cloud WS in order to come up with a sensible strategy that covers DRS and TRS URIs as well as other artefacts/problems with API versioning. A related issue I faced is also that the path (e.g., ga4gh/trs/v2) defined in the schema may not always be possible or desirable to implement. For example, as far as I know, Dockstore does not. The path info could perhaps also be defined in the service info for any given service (although the same problem applies: how do you get to the service info in the first place; perhaps via the service registry...).

Still, I don't think this PR makes this problem worse than it is already, given the current definitions of DRS and TRS URIs, so I don't think it should block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this is an API/schema versioning topic

Fair. I was thinking more of different schemas in different versions of the API when I made my comment, but you're right.

Still, I don't think this PR makes this problem worse than it is already, given the current definitions of DRS and TRS URIs

I think the explicitness that you can take a URI and translate it to a specific path is a new wrinkle, at least for TRS. You're right about DRS -- I hadn't looked until now and they do the same thing, with v1 in the path.

so I don't think it should block.

I had already approved, so it is not blocked.

Copy link
Collaborator Author

@uniqueg uniqueg Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @coverbeck

So @denis-yuen, others: shall we bring up this important discussion to the Cloud WS?

Also perhaps to see if people agree that it might be useful to specify TRS URIs as inputs to WES (to access workflows) and inside workflows (to access containers through WES/TES implementations)? Because this might require changes on WES and TES specs as well, without which TRS URIs might not be all that useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal at Cloud WS was to break this discussion into a separate ticket #221

For example, if a TRS client was asked to process
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g.,
issue `GET` requests to
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially very primitive content negotiation where the user agent would need to try the API versions it would be familiar with?

For example, if a TRS client was asked to process
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g.,
issue `GET` requests to
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was not about API/schema versioning, which is a very complex subject and I'm no expert in. It was about the example that if you have a URI trs://trs.example.org/tool_ABC/v1.2.3, it says you can fetch the content https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC (note the v2 in the path).

If TRS v3 comes along, and a new TRS implementation only supports that v3, then the v2 in the path means they would return a 404. Maybe that's OK, or maybe the requirement is that all TRS implementations need to implement v2 and above. Otherwise a client will need to iterate over paths with all potential versions.

Maybe I'm overthinking this, and it is a real edge case.

@mr-c
Copy link
Contributor

mr-c commented Sep 15, 2021

On today's ELIXIR Cloud & AAI call we discussed how this would interact with CWL's container specification (typically the dockerPull attribute of DockerRequirement).

If this proposal is adopted (for which I have nothing against) then I would like it to include specific guidance on implementation, especially in a user-facing (CWL) context.

For CWL, I suggest putting versioned TRS URIs (if needed) under the "specs" field of SoftwarePackage.

However, as SoftwarePackage can already contain tool names and identifiers (such as from bio.tools and others), then a TRS URI could be automatically constructed on behalf of the user in such circumstances. If others agree, then I think this proposal should suggest such an automatic construction on behalf of users when possible and give this specific example.

@jmfernandez
Copy link

Very good point @coverbeck . From my humble point of view, a future iteration of GA4GH TRS should consider either recommending/enforcing a fallback answer for selected v2 routes, or implementing both future and (then old) v2 paths.

@denis-yuen
Copy link
Member

If TRS v3 comes along, and a new TRS implementation only supports that v3, then the v2 in the path means they would return a 404. Maybe that's OK, or maybe the requirement is that all TRS implementations need to implement v2 and above. Otherwise a client will need to iterate over paths with all potential versions.

Thinking this through. In practice, the client would hit https://github.com/ga4gh/tool-registry-service-schemas/blob/2.0.1-beta.0/openapi/openapi.yaml#L24 once for each version of the API that the client supports from highest version to lowest version to see what is active. If it treats all requests as independent, it would indeed need to do this check each time.

It kinda feels like we should lift a page out of something existing like https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation but its not something I've worked with.

@coverbeck
Copy link
Contributor

It kinda feels like we should lift a page out of something existing like https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation but its not something I've worked with.

I've heard of it, but I've actually played with it. I wonder how well that works with Swagger generated code?

Also, you'd still need a url to negotiate with, so maybe there's an endpoint that doesn't include the version?

@denis-yuen
Copy link
Member

denis-yuen commented Sep 21, 2021

Well, the swagger bit is a Dockstore problem, not a TRS problem 👿
Assuming openapi 3.0 supports that kind of idea.

The versionless endpoint is interesting, I do wonder in retrospect whether service-info is supposed to be nested inside the versioned API and implemented once per version or whether there should be one service-info for all versions. Going to ping them

Edit to add: Another idea might be to ping the GA4GH registry to figure out which versions of what APIs are available
https://github.com/ga4gh/ga4gh-registry but that does add a dependency

@uniqueg
Copy link
Collaborator Author

uniqueg commented Sep 21, 2021

@mr-c:

I agree that guidance on implementation in CWL etc. would be nice. However, it seems to me that this would be something that should rather be documented on the implementers' side, because it is their user base that would need to be aware of these.

In any case, I would probably wait a bit with this to see how this is actually being taken up and what problems might arise from using TRS URIs instead of specific container repo URIs or when specified, as you suggest, in the SoftwarePackage field. For what it's worth, specifying a TRS URI in dockerPull worked perfectly fine in cwl-tes and TESK. Neither the CWL nor the TES validators in cwl-tes and TESK complained as the types (as we had checked for CWL at least, during the call) are compatible. However, the use of TRS URIs for allowing TES implementations to pull their container image of choice is something that would need to be explicitly documented in the TES specs since it is of course necessary to implement a TRS URI parser and TRS client to be able to process these (we did this in TESK).

As for clients automatically constructing TRS URIs: the client has no way of knowing resource IDs (tool and version) just by knowing tool names, as TRS implementations are free to choose these identifiers. Mandating TRS implementations to use specific identifiers (e.g., semantic identifiers) would go too far, IMO, although implementations could of course choose to go that route (and deal with the ambiguities that this may bring along). But even then, I think clients guessing container images is potentially dangerous. But even if all of that could be somehow addressed satisfactorily, how would the client know/guess the domain/server address at which the API/resource is hosted?

@denis-yuen
Copy link
Member

@uniqueg
From GA4GH Connect, are there any blockers/updates for this?
Wondering if this can be merged (for a potential 2.0.1)

@uniqueg
Copy link
Collaborator Author

uniqueg commented May 5, 2022

@denis-yuen: I guess one just came in from @patmagee in the corresponding issue: #164 (comment)

For example, if a TRS client was asked to process
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g.,
issue `GET` requests to
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uniqueg This example is a bit awkward and likely points to a deficiency in the current approach for defining trs identifiers (and potentially drs identifiers as well). The TES specification defines the root path of the API as /ga4gh/trs/<version>.

Putting aside specific trs versioning issues (from the conversation above), I am now left with another dilema: TRS identifiers as defined only provide the hostname and a toolId and possibly a version id. This approach does not leave ANY room for path values that may come before the /ga4gh/... prefix. Given this, how am I supposed to resolve the actual trs URL for the example?

By convention, any tooling I create would resolve trs://trs.example.org/tool_ABC/v1.2.3 to https://trs.example.org/ga4gh/trs/v2/tools/tool_ABC/versions/v1.2.3, which is not what is presented in this example (note the missing API). I would not be so concerned over this if it was not already present in the wild in the Dockstore api (@denis-yuen). So my question then is how can we possibly resolve the path components without introducing ambiguity into the trs identifier.

One approach could be just adding the prefix paths prior to the toolId trs://trs.example.org/api/tool_ABC. Unfortuantely this is ambiguous since this could be referring to either a tool with the path prefix api and version tool_ABC or a tool tool_ABC with a path api. If we do not allow the path prefix to be in the identifier, any tooling is limited to the trs implementations that It already knows how to resolve, effectively limiting the usefulness of the trs id in general.

We essentially need a reliable way to extract the toolId and versionId without any ambiguity, while still allowing preceding fragments in the path. Some examples are below:

trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>:<versionId>
trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>/v/<versionId>
trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>/version/<versionId>

Copy link
Member

@denis-yuen denis-yuen May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could probably calculate the prefixes on the fly from https://github.com/ga4gh/tool-registry-service-schemas/blob/develop/registry.json but it probably does make more sense to change the URI format since we're not using it yet(?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denis-yuen I would advise against relying on prefixes in the registry.json file, which automatically enforces that file is the authoratative source of all tool registries. It does not really allow for new registries to dynamically be created (ie in a multitenant system)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my personal preference for approaches would likely be: trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>:<versionId>. The side effect of this is that : is now a restricted character that cannot be used in the identifier itself, but I think that is a reasonable restriction

Copy link
Collaborator Author

@uniqueg uniqueg May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr

I agree with @patmagee's suggestion provided that DRS folks agree that it will work for them as well.


Good catch @patmagee - thanks a lot! I had actually realized this issue before when implementing a TRS client that understands TRS URIs and having to introduce an optional base_path parameter so that it would work with Dockstore. It seems I had forgotten about it though. And I agree, the TRS URI loses its utility if this has to be known beforehand.

From what I can see, @patmagee's suggestion should do the trick.

But thinking this through a little more, maybe there are a couple of things that we should discuss outside of this issue before merging:

  • To make sure that whatever is decided here has the perspective of aligning with whatever is decided for DRS URIs in the future, I think this discussion should include the DRS community. @patmagee's other worry of being able to include basically any character in TRS Tool/Version IDs could be brought up there as well.
  • I am not quite sure whether API versioning could have any effect on a client's ability to properly resolve a TRS URI. I feel that it should not. But without explicit guidelines on API versioning, outlining expected behaviors and discussing their impact on reproducibility, I feel that it is hard to be sure. So perhaps it might be worth to discuss specifying such guidelines somewhere, possibly in a place that includes other work streams as well.

Besides that, it is possibly a good idea to aim for a consensus of how servers are defined in the various Cloud WS APIs in the mid to long term. And whatever is decided here, should not collide with our definitions of TRS and DRS URIs. Currently, the situation is like this:

TRS

servers:
  - url: /ga4gh/trs/v2

DRS

servers:
  - url: https://{serverURL}/ga4gh/drs/v1
    variables:
      serverURL:
        default: drs.example.org
        description: >
          DRS server endpoints MUST be prefixed by the '/ga4gh/drs/v1' endpoint
          path

WES

servers:
- url: https://{defaultHost}/{basePath}/{apiVersion}
  variables:
    defaultHost:
      default: www.example.com
    basePath:
      default: ga4gh/wes
    apiVersion:
      default: v1

TES

servers:
- url: /ga4gh/tes/v1

So, it looks like TRS and TES do not care where the API is deployed, as long as endpoint paths are immediately preceded by /ga4gh/trs/v2 and /ga4gh/tes/v1, respectively. DRS is very similar but a little more prescriptive in that it requires the use of the https schema. WES, on the other hand, requires https as well, but otherwise only seems to require that there be at least two slashes included in the URL before the endpoint path (weird?). In particular, it does not require a specific base path suffix like /ga4gh/wes/v1 (though the default does include it).

Now, there is currently no consensus across those definitions. But perhaps it seems to be going in the direction of the DRS definition, i.e., start with https and end with a specific base path suffix of the form /ga4gh/{api_short_name}/{api_version}. @patmagee's definition would account for that (having the client fill in the appropriate base path suffix). But even with a more relaxed definition that does not prescribed a specific base path for GA4GH Cloud APIs, the suggested solution should still work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal at Cloud WS was to also break this into a separate ticket, but to prioritize it for discussion before 2.0.1 release and/or the GA4GH plenary
#220

Copy link
Member

@denis-yuen denis-yuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposing wrapping this specific PR so we can get the language about versioned TRS URIs in (describing versions of tools described by TRS), but continuing the discussion for the three other issues the the following tickets since the discussion seems to be sprawling out:

For example, if a TRS client was asked to process
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g.,
issue `GET` requests to
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal at Cloud WS was to break this discussion into a separate ticket #221

For example, if a TRS client was asked to process
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g.,
issue `GET` requests to
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal at Cloud WS was to also break this into a separate ticket, but to prioritize it for discussion before 2.0.1 release and/or the GA4GH plenary
#220

@uniqueg
Copy link
Collaborator Author

uniqueg commented Jun 20, 2022

Fine with that :)

@denis-yuen denis-yuen merged commit 892f743 into gh-pages Aug 4, 2022
@denis-yuen denis-yuen deleted the versioned_trs_uris branch August 4, 2022 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add versioned TRS URI to documentation
7 participants