Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support multiple providers for Remote Taskfiles and support directories #1774

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

pbitty
Copy link
Contributor

@pbitty pbitty commented Aug 27, 2024

This PR aims to support multiple providers for Remote Taskfiles using go-getter, as I proposed in this comment in [#1317]. It is currently marked as Draft as I aim to get some early feedback, and iterate on it if we want to go forward with this approach or something similar.

This would support the following shapes of URLs as provided by go-getter (non-exhaustive):

URL Description
https://example.com/Taskfile.yaml File over HTTP
https://example.com/some-archive.zip Directory from ZIP archive over HTTP
github.com/go-task/task//testdata/includes/Taskfile.yml?ref=some_branch Directory from Github ref
[email protected]:repo/path Directory from any Git repository (via local git executiy using SSH)
my-bucket.s3.amazonaws.com/path/to/Taskfile.yml File from S3
my-bucket.s3.amazonaws.com/path/to/folder/ Directory from S3
my-bucket.s3.amazonaws.com/path/to/archive.zip ZIP file from S3
my-bucket.s3.amazonaws.com/path/to/archive.zip?taskfile=AnotherTaskfile.yml ZIP file from S3 with non-standard entrypointTaskfile

The aims of the change are as follows:

  • To support multiple source providers for remote taskfiles
  • To provide a well-defined syntax for source URLs that can support future providers if desired
  • To support downloading directories containing remote taskfiles and supporting files, not only single taskfiles
  • As an extension, we support downloading directories in the form of compressed archives (eg. zip, tgz, etc.) thanks to go-getter

These are not directly tied to go-getter but they are satisfied by the library fairly well.


Some points of note with the current implementation:

  • There doesn't seem to be a need for multiple remote nodes - as long as one satisfies the Getter and Detector interfaces, one can plug a new provider into RemoteNode.

  • The contract between Node and the rest of the code is that the Node places its content in a directory and provides FileDirectory and Filename, which are used for accessing the Taskfile and Taskfile directory and to set variables like .TASKFILE_DIR.

  • FileNode could eventually be replaced by logic in RemoteNode as well, if we'd like. go-getter supports local files by using symlinks and we could leverage this. I'm just not sure how well this would on Windows - I would be glad to test it.

  • The interaction between Node, Reader and Cache feels a bit confusing right now in terms of their lifecycle. It feels like we read the node contents a bit too late in the game, and that we could read the contents and deal with caching before we return a Node to the Reader. I didn't try to change this yet - I wanted to get some eyes on the overall idea, then make a plan to refactor it - either in this PR or a future one.

    ie: instead of calling NewNode() in the reader and then later call node.Read(), we would call something like r.loadNode() which already returns a fully loaded Node either from the cache or from remote. In other words, node initialization would include "reading". This would allow us to return a node with an immutable FileDirectory. Currently we have to replace the node if we end up loading its content from the cache because the directory changes.

This is also a big PR and I'd be happy to break it up into more manageable pieces if there's consensus. For example, some test and node/cache changes could be done first before adding go-getter.

What do you all think about the above?

@pbitty
Copy link
Contributor Author

pbitty commented Aug 27, 2024

I realize this PR is pretty big and I am making many decisions here. If you'd like to talk about this in real-time perhaps on Discord, let me know!

@pbitty pbitty marked this pull request as draft September 9, 2024 19:33
@pbitty
Copy link
Contributor Author

pbitty commented Sep 9, 2024

This was meant to be marked as Draft from the start - I only noticed it now.

@mgbowman
Copy link

+1 from me as I desperately need GitLab support. Currently there’s no way to provide a GITLAB_TOKEN for repos that require authentication when using remote taskfiles.

@pbitty pbitty mentioned this pull request Sep 10, 2024
15 tasks
Comment on lines +127 to +130
// NOTE: Uses the directory of the entrypoint (Taskfile), not the current working directory
// This means that files are included relative to one another
entrypointDir := filepath.Dir(r.Dir())
return filepathext.SmartJoin(entrypointDir, path), nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is correct. I am asking about it in #1757 (comment).

This type implements `error` as a pointer.  If the type passed in is
not a pointer to a pointer, `errors.As()` panics.
Without this, FileNode things that `github.com/org/repo` is a file
location and it attempts to map it relative to itself.

Something tells me we should integrate both node types into some kind
of shared type, but for now this works.
@pbitty
Copy link
Contributor Author

pbitty commented Sep 12, 2024

I don't love the way this PR turned out. I see two areas that are awkward right now:

How the FileNode and RemoteNode integrate with each other

Each node needs to be able to handle relative include paths in its own includes via ResolveEntrypoint. This requires that the FileNode be aware of remote nodes, and vice-verse. I have to investigate this a bit.

One idea is to have a single node type that handles file and remote. Another is to refactor this more and separate the path resolution aspect from the download aspect - although I'm not sure how much they can be separate since they depend on each other.

The sequence between node loading and caching

Currently I am storing remote node data in a temp directory until the user accepts it, then we move it to the cache. This means the node's "source" data points to the temp directory and we must reload it if we end up using the cached version (here). I'm trying to think if there's simpler logic there.

Caching behaviour in general

With the ability to download directories, this opens the door to larger downloads. The behaviour of automatically downloading every time feels a bit heavy. I wonder if we could have an option to change the default so that once a remote include is cached, we will always use the cache. For people using versioned refs (eg. v1.2.3) this shouldn't be an issue since the ref content is expected to be immutable.

In this case two options I see are:
a) Decide to change the default to always use the cache if it's there and rely on --download to force a refresh
b) Add an env var (or CLI arg or both) that allows us to invert the behaviour

@c-ameron
Copy link

Thanks for your work on this! I would definitely be keen for an option to use the cache by default. To me it ties in nicely to this other feature request I created #1402

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants