feat: Support multiple providers for Remote Taskfiles and support directories #1774

pbitty · 2024-08-27T18:20:59Z

This PR aims to support multiple providers for Remote Taskfiles using go-getter, as I proposed in this comment in [#1317]. It is currently marked as Draft as I aim to get some early feedback, and iterate on it if we want to go forward with this approach or something similar.

This would support the following shapes of URLs as provided by go-getter (non-exhaustive):

URL	Description
`https://example.com/Taskfile.yaml`	File over HTTP
`https://example.com/some-archive.zip`	Directory from ZIP archive over HTTP
`github.com/go-task/task//testdata/includes/Taskfile.yml?ref=some_branch`	Directory from Github ref
`[email protected]:repo/path`	Directory from any Git repository (via local `git` executiy using SSH)
`my-bucket.s3.amazonaws.com/path/to/Taskfile.yml`	File from S3
`my-bucket.s3.amazonaws.com/path/to/folder/`	Directory from S3
`my-bucket.s3.amazonaws.com/path/to/archive.zip`	ZIP file from S3
`my-bucket.s3.amazonaws.com/path/to/archive.zip?taskfile=AnotherTaskfile.yml`	ZIP file from S3 with non-standard entrypointTaskfile

The aims of the change are as follows:

To support multiple source providers for remote taskfiles
To provide a well-defined syntax for source URLs that can support future providers if desired
To support downloading directories containing remote taskfiles and supporting files, not only single taskfiles
As an extension, we support downloading directories in the form of compressed archives (eg. zip, tgz, etc.) thanks to go-getter

These are not directly tied to go-getter but they are satisfied by the library fairly well.

Some points of note with the current implementation:

There doesn't seem to be a need for multiple remote nodes - as long as one satisfies the Getter and Detector interfaces, one can plug a new provider into RemoteNode.
The contract between Node and the rest of the code is that the Node places its content in a directory and provides FileDirectory and Filename, which are used for accessing the Taskfile and Taskfile directory and to set variables like .TASKFILE_DIR.
FileNode could eventually be replaced by logic in RemoteNode as well, if we'd like. go-getter supports local files by using symlinks and we could leverage this. I'm just not sure how well this would on Windows - I would be glad to test it.
The interaction between Node, Reader and Cache feels a bit confusing right now in terms of their lifecycle. It feels like we read the node contents a bit too late in the game, and that we could read the contents and deal with caching before we return a Node to the Reader. I didn't try to change this yet - I wanted to get some eyes on the overall idea, then make a plan to refactor it - either in this PR or a future one.

ie: instead of calling NewNode() in the reader and then later call node.Read(), we would call something like r.loadNode() which already returns a fully loaded Node either from the cache or from remote. In other words, node initialization would include "reading". This would allow us to return a node with an immutable FileDirectory. Currently we have to replace the node if we end up loading its content from the cache because the directory changes.

This is also a big PR and I'd be happy to break it up into more manageable pieces if there's consensus. For example, some test and node/cache changes could be done first before adding go-getter.

What do you all think about the above?

This data will be used by tests in the following commit, which will clone this branch to test git remotes.

pbitty · 2024-08-27T21:42:29Z

I realize this PR is pretty big and I am making many decisions here. If you'd like to talk about this in real-time perhaps on Discord, let me know!

pbitty · 2024-09-09T19:33:49Z

This was meant to be marked as Draft from the start - I only noticed it now.

mgbowman · 2024-09-10T17:02:57Z

+1 from me as I desperately need GitLab support. Currently there’s no way to provide a GITLAB_TOKEN for repos that require authentication when using remote taskfiles.

pbitty · 2024-09-10T17:25:29Z

taskfile/node_remote.go

+	// NOTE: Uses the directory of the entrypoint (Taskfile), not the current working directory
+	// This means that files are included relative to one another
+	entrypointDir := filepath.Dir(r.Dir())
+	return filepathext.SmartJoin(entrypointDir, path), nil


I'm not sure if this is correct. I am asking about it in #1757 (comment).

This type implements `error` as a pointer. If the type passed in is not a pointer to a pointer, `errors.As()` panics.

Without this, FileNode things that `github.com/org/repo` is a file location and it attempts to map it relative to itself. Something tells me we should integrate both node types into some kind of shared type, but for now this works.

pbitty · 2024-09-12T16:08:23Z

I don't love the way this PR turned out. I see two areas that are awkward right now:

How the FileNode and RemoteNode integrate with each other

Each node needs to be able to handle relative include paths in its own includes via ResolveEntrypoint. This requires that the FileNode be aware of remote nodes, and vice-verse. I have to investigate this a bit.

One idea is to have a single node type that handles file and remote. Another is to refactor this more and separate the path resolution aspect from the download aspect - although I'm not sure how much they can be separate since they depend on each other.

The sequence between node loading and caching

Currently I am storing remote node data in a temp directory until the user accepts it, then we move it to the cache. This means the node's "source" data points to the temp directory and we must reload it if we end up using the cached version (here). I'm trying to think if there's simpler logic there.

Caching behaviour in general

With the ability to download directories, this opens the door to larger downloads. The behaviour of automatically downloading every time feels a bit heavy. I wonder if we could have an option to change the default so that once a remote include is cached, we will always use the cache. For people using versioned refs (eg. v1.2.3) this shouldn't be an issue since the ref content is expected to be immutable.

In this case two options I see are:
a) Decide to change the default to always use the cache if it's there and rely on --download to force a refresh
b) Add an env var (or CLI arg or both) that allows us to invert the behaviour

c-ameron · 2024-09-16T13:21:07Z

Thanks for your work on this! I would definitely be keen for an option to use the cache by default. To me it ties in nicely to this other feature request I created #1402

pbitty added 8 commits August 27, 2024 14:17

Add remote taskfile HTTP tests

1488165

Add test data

101162f

This data will be used by tests in the following commit, which will clone this branch to test git remotes.

Add git remote tests

1940bdb

Disable remote caching tests

e1d5aa4

Add go-getter implementation

fa07a9b

Add caching support for remote directories

86f986f

TEMP commit: applying caching refactor from go-task#1771

4e7a2f3

Make source location consistent between cached and uncached nodes

695ff73

pbitty marked this pull request as draft September 9, 2024 19:33

pbitty added 2 commits September 10, 2024 12:40

Merge branch 'main' of github.com:go-task/task into go-getter-support

a775d21

Incorporate changes from c77c8a4

3152780

pbitty mentioned this pull request Sep 10, 2024

Remote Taskfiles experiment #1317

Open

15 tasks

pbitty commented Sep 10, 2024

View reviewed changes

pbitty added 3 commits September 11, 2024 15:14

Fix error type detection

a48b35f

This type implements `error` as a pointer. If the type passed in is not a pointer to a pointer, `errors.As()` panics.

Add remote include detection to FileNode

00f64d4

Without this, FileNode things that `github.com/org/repo` is a file location and it attempts to map it relative to itself. Something tells me we should integrate both node types into some kind of shared type, but for now this works.

Comment and tidy-up

70048b5

blackjid mentioned this pull request Sep 24, 2024

feat(remote): support include git remote #1652

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Support multiple providers for Remote Taskfiles and support directories #1774

feat: Support multiple providers for Remote Taskfiles and support directories #1774

pbitty commented Aug 27, 2024 •

edited

Loading

pbitty commented Aug 27, 2024

pbitty commented Sep 9, 2024

mgbowman commented Sep 10, 2024

pbitty Sep 10, 2024

pbitty commented Sep 12, 2024

c-ameron commented Sep 16, 2024

feat: Support multiple providers for Remote Taskfiles and support directories #1774

Are you sure you want to change the base?

feat: Support multiple providers for Remote Taskfiles and support directories #1774

Conversation

pbitty commented Aug 27, 2024 • edited Loading

pbitty commented Aug 27, 2024

pbitty commented Sep 9, 2024

mgbowman commented Sep 10, 2024

pbitty Sep 10, 2024

Choose a reason for hiding this comment

pbitty commented Sep 12, 2024

How the FileNode and RemoteNode integrate with each other

The sequence between node loading and caching

Caching behaviour in general

c-ameron commented Sep 16, 2024

pbitty commented Aug 27, 2024 •

edited

Loading