Native git support: lsRefs(), sparseCheckout(), GitPathControl #1764
## Description

Adds a Directory resource type to enable loading file trees from git repositories, local hard drive, deeply nested zip archives etc:

```json
{
  "steps": [
    {
      "step": "installPlugin",
      "pluginData": {
        "resource": "git:directory",
        "repositoryUrl": "https://github.com/WordPress/wordpress-playground.git",
        "ref": "HEAD",
        "path": "packages/docs"
      }
    },
    {
      "step": "installPlugin",
      "pluginData": {
        "resource": "literal:directory",
        "name": "hello-world",
        "files": {
          "README.md": "Hello, World!",
          "index.php": "<?php\n/**\n* Plugin Name: Hello World\n* Description: A simple plugin that says hello world.\n*/"
        }
      }
    }
  ]
}
```

## Motivation

This PR opens the door to:

* Seamless Git integration. Import path mapping from git to Playground is now just a few steps referencing specific git directories. No more custom logic required!
* Blueprint-based site imports and exports without any Playground webapp-specific logic.
* Runtime-specific "resource overrides", e.g. `--resource-override=GUTENBERG:./gutenberg.zip` in CLI to test a Blueprint with my local version of Gutenberg. The same logic would be used by the Blueprints builder to use files selected via `<input type="file">` controls.

### Schema

Every step can declare which kinds of resources it accepts (file-based resources vs directory-based resources).

Using a single `pluginData` property in the `installPlugin` step means fewer choices for the developer. It also makes local resource overrides easy, e.g. we could tell Playground CLI to load a local Gutenberg directory instead of a remote Gutenberg zip. This wouldn't be as easy had we used separate options for passing ZIP-based and directory-based resources.

On one hand, `pluginData` is less informative than `pluginZipFile`. On the other, the name accommodates non-zip resources such as directories.
## Developer notes about specific API changes introduced in this PR

This PR introduces a new `literal:directory` resource that can be used in Blueprints as follows:

```json
{
  "steps": [
    {
      "step": "installPlugin",
      "pluginData": {
        "resource": "literal:directory",
        "name": "hello-world",
        "files": {
          "README.md": "Hello, World!",
          "index.php": "<?php\n/**\n* Plugin Name: Hello World\n* Description: A simple plugin that says hello world.\n*/"
        }
      }
    }
  ]
}
```

Or via the JS API:

```ts
await installTheme(php, {
  themeData: {
    name: 'test-theme',
    files: {
      'index.php': `/**\n * Theme Name: Test Theme`,
    },
  },
  ifAlreadyInstalled: 'overwrite',
  options: {
    activate: false,
  },
});
```

It also introduces a new `writeFiles` step:

```ts
{
  "steps": [
    {
      "step": "writeFiles",
      "writeToPath": "/wordpress/wp-content/plugins/my-plugin",
      "filesTree": {
        "name": "my-plugin",
        "files": {
          "index.php": "<?php echo '<a>Hello World!</a>'; ?>",
          "public": {
            "style.css": "a { color: red; }"
          }
        }
      }
    }
  ]
}
```

Specific changes:

* Adds a `Resource<Directory>` resource type that provides a `name: string` and `files: FileTree`.
* Renames `pluginZipFile` to `pluginData` in the `installPlugin` step
* Renames `themeZipFile` to `themeData` in the `installTheme` step
* Adds a new `writeFiles` step for writing entire directory trees
* Adds a new `literal:directory` resource type where an entire file tree can be specified inline
* Adds a new `git:directory` resource type that throws an error for now, but will load arbitrary directories from git repositories once #1764 lands

## Remaining work

- [x] Discuss the scope and the ideas
- [x] Add unit tests
- [x] Update the documentation
- [x] Adjust the `installPlugin` and `installTheme` steps for compatibility with their former signatures. Ensure the existing packages consuming those functions from the `@wp-playground/blueprints` package will continue to work.
- [x] Confirm we can safely omit streaming from the system design at this point without setting ourselves up for a grand refactor a few months down the road.
  * I think we can! Streaming support could be an addition to the system, not a change in how the system works. For example, there could be a new `DirectoryStream` resource type producing an `AsyncDirectoryIterator` with streamable `File` or `Blob` objects as its leaves. It would work nicely with remote APIs or the ZIP streaming plumbing in `@php-wasm/stream-compression`. Any existing code expecting a `DirectoryResource` should be relatively easily adaptable to use these async iterators instead.

## Follow-up work

- [ ] Include actual git support once the [Git sparse checkout PR](#1764) lands
- [ ] Ship a Playground CORS proxy to enable using git checkout in the webapp
- [ ] Once we have a usable `git:directory` resource, expand the developer notes from this PR and other related PRs and write a post on https://make.wp.org/playground

## Tangent – Streaming and a shorthand URL notation

Without streaming, the entire directory must be loaded into memory. Our git sparse checkout implementation buffers everything anyway, but we will want to stream-read directory resources in the future. For example:

```js
{
  "steps": [
    // Stream plugin files directly from a doubly zipped Git artifact
    {
      "step": "installPlugin",
      "pluginData": {
        "resource": "zip:github-artifact",
        "zipFile": {
          "resource": "url",
          "url": "https://github.com/WordPress/gutenberg/pr/54713/artifacts/build.zip"
        }
      }
    }
  ]
}
```

That's extremely verbose, I'd love to explore a shorthand notation.
One idea would be to make it a valid [URI](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) shaped after the [data URL syntax](https://developer.mozilla.org/en-US/docs/Web/URI/Schemes/data):

```js
const dataUri = `data:text/html,%3Cscript%3Ealert%28%27hi%27%29%3B%3C%2Fscript%3E`;
const githubArtifactUri = `zip-github-artifact+url:https://github.com/WordPress/gutenberg/pr/54713/artifacts/build.zip`;
const gitResourceUri = `git:branch=HEAD;path=src,https://github.com/WordPress/hello-dolly.git`;
```

It wouldn't allow easy composition of the resources, e.g. a directory inside a zip sourced from a GitHub repo. Maybe that's for the best, though, since such a string would be extremely dense and difficult for humans to read. The object-based syntax might still be the most convenient way to declare those.
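To make the shorthand idea concrete, here is a minimal sketch of how the `git:` form could be expanded back into the object-based syntax. Everything here is an assumption for illustration: the function name `parseGitShorthand`, the output field names, and the defaults are hypothetical, and the shorthand itself is only a proposal in this discussion.

```ts
// Hypothetical shape of the expanded resource reference (an assumption,
// loosely modelled on the object-based syntax used in the Blueprints above).
interface GitDirectoryReference {
  resource: 'git:directory';
  url: string;
  ref: string;
  path: string;
}

// Expands `git:key=value;key=value,<repository URL>` into an object.
// The parameters come before the first comma, as in the data URL syntax.
function parseGitShorthand(uri: string): GitDirectoryReference {
  const prefix = 'git:';
  if (!uri.startsWith(prefix)) {
    throw new Error('Not a git shorthand URI');
  }
  const rest = uri.slice(prefix.length);
  const comma = rest.indexOf(',');
  if (comma === -1) {
    throw new Error('Missing "," between parameters and repository URL');
  }
  const params = new Map<string, string>();
  for (const pair of rest.slice(0, comma).split(';')) {
    const eq = pair.indexOf('=');
    if (eq > 0) {
      params.set(pair.slice(0, eq), pair.slice(eq + 1));
    }
  }
  return {
    resource: 'git:directory',
    url: rest.slice(comma + 1),
    // Defaults below are assumptions, not something the proposal specifies.
    ref: params.get('branch') ?? 'HEAD',
    path: params.get('path') ?? '',
  };
}
```

The sketch also shows why composition is hard in this notation: there is no obvious place to nest a second resource inside the string.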
🥁 This took three months to build and merge, but we got there!
Related to #1787, Follows up on #1793

Implements GitDirectoryResource to enable loading files directly from git repositories as follows:

```ts
{
  "landingPage": "/guides/for-plugin-developers.md",
  "steps": [
    {
      "step": "writeFiles",
      "writeToPath": "/wordpress/guides",
      "filesTree": {
        "resource": "git:directory",
        "url": "https://github.com/WordPress/wordpress-playground.git",
        "ref": "trunk",
        "path": "packages/docs/site/docs/main/guides"
      }
    }
  ]
}
```

## Implementation details

Uses git client functions merged in #1764 to sparse checkout the requested files. It also leans on the PHP CORS proxy which is now started as a part of the `npm run dev` command.

The CORS proxy URL is configurable per `compileBlueprint()` call so that each Playground runtime may choose to either use it or not. For example, it wouldn't be very useful in the CLI version of Playground.

## Testing plan

Go to `http://localhost:5400/website-server/#{%20%22landingPage%22:%20%22/guides/for-plugin-developers.md%22,%20%22steps%22:%20[%20{%20%22step%22:%20%22writeFiles%22,%20%22writeToPath%22:%20%22/wordpress/guides%22,%20%22filesTree%22:%20{%20%22resource%22:%20%22git:directory%22,%20%22url%22:%20%22https://github.com/WordPress/wordpress-playground.git%22,%20%22ref%22:%20%22trunk%22,%20%22path%22:%20%22packages/docs/site/docs/main/guides%22%20}%20}%20]%20}` and confirm Playground loads a markdown file.
@adamziel after pulling Should we update our local dev instructions or is there something we can automate to make this code available to new developers?
@bgrgicak oh, good point that existing clones won't work without pulling the submodules. You can do that with
Motivation
Related to #1787
Adds a set of TypeScript functions that support the native git protocol and can power a sparse checkout feature. This is the basis for a faster, more user-friendly git integration. No more guessing repository paths. Just provide the repo URL, browse the files, and tell Playground which directories are plugins, themes, etc.
Technically, this PR performs git sparse checkout using just JavaScript and a generic CORS proxy.
This PR doesn't provide any user-facing feature yet. However, it paves the way to features like:
Notable points of this PR
* `sparseCheckout()`, `lsRefs()`, and `listFiles()` functions from the `@wp-playground/storage` package. I'm not yet sure whether we need a dedicated `@wp-playground/git` package or not.
* `isomorphic-git` as a git submodule in the `/isomorphic-git` path. We can't rely on the published npm package because it doesn't export the internal APIs we need to use here.
* `@wp-playground/components`. They're not used anywhere on the website yet and I'd rather keep them moving with the project than isolate them in a PR until they're perfect. We'll need some accessibility and mobile testing before using them in the webapp, though.

How does it even work?
Let me quote my own article:
Running a Git Client in the browser
The good news was isomorphic-git, wasm-git, and a few other projects were already running Git in the browser. The bad news was none of them supported fetching a subset of files via sparse checkout. You’d still have to download 20MB of data even if you only wanted 100KB.
However, everything the desktop Git client does, including sparse checkouts, can be done via HTTP by requesting URLs like https://github.com/WordPress/wordpress-playground.git.
Git documentation was… less than helpful, but eventually it worked! A few hours later I was running Git commands by sending GET and POST requests to the repository URLs.
Fetching a hash of the branch
The first command I needed was ls-refs to get the SHA1 hash of the right git branch. Here’s how you can get it with fetch() for the HEAD branch of the WordPress/wordpress-playground repo:
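As a minimal sketch of what such a request can look like, assuming Git protocol version 2 over smart HTTP: the body is a sequence of pkt-lines, each prefixed with its own length as four hex digits. The helper names `pktLine` and `buildLsRefsRequest` are hypothetical and are not the functions shipped in this PR.

```ts
// Encodes one pkt-line: a 4-digit hex length (counting the 4 digits
// themselves) followed by the payload.
function pktLine(payload: string): string {
  const byteLength = new TextEncoder().encode(payload).length + 4;
  return byteLength.toString(16).padStart(4, '0') + payload;
}

const DELIMITER_PKT = '0001'; // separates the command from its arguments
const FLUSH_PKT = '0000'; // ends the request

interface GitHttpRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds (but does not send) an ls-refs request for refs starting with
// `refPrefix`, e.g. "HEAD" or "refs/heads/trunk".
function buildLsRefsRequest(repoUrl: string, refPrefix: string): GitHttpRequest {
  const body = [
    pktLine('command=ls-refs\n'),
    DELIMITER_PKT,
    pktLine('peel\n'),
    pktLine(`ref-prefix ${refPrefix}\n`),
    FLUSH_PKT,
  ].join('');
  return {
    url: `${repoUrl}/git-upload-pack`,
    init: {
      method: 'POST',
      headers: {
        Accept: 'application/x-git-upload-pack-result',
        'Content-Type': 'application/x-git-upload-pack-request',
        // Opts in to protocol v2; without it the server speaks the older protocol.
        'Git-Protocol': 'version=2',
      },
      body,
    },
  };
}
```

The result can be passed straight to `fetch(request.url, request.init)`. Building the request separately from sending it keeps the protocol part easy to inspect and test.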
I won’t go into details of the Git protocol – the point is with a few special headers and lines you can be a Git client. If you paste that fetch() in your devtools while on GitHub.com, it would return a response similar to this:
Good! That’s our commit hash.
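Extracting that hash from the raw response is mostly a matter of splitting pkt-lines. A minimal sketch, assuming a protocol v2 `ls-refs` response made of `<hash> <ref name>` lines with optional trailing attributes; `parsePktLines` and `parseLsRefsResponse` are hypothetical names, not the PR's API.

```ts
// Splits a pkt-line stream into payloads, skipping the special
// packets 0000 (flush), 0001 (delimiter) and 0002 (response end).
// Assumes an ASCII response, so string offsets equal byte offsets.
function parsePktLines(text: string): string[] {
  const payloads: string[] = [];
  let offset = 0;
  while (offset + 4 <= text.length) {
    const length = parseInt(text.slice(offset, offset + 4), 16);
    if (Number.isNaN(length)) {
      throw new Error(`Invalid pkt-line length at offset ${offset}`);
    }
    if (length < 4) {
      offset += 4;
      continue;
    }
    payloads.push(text.slice(offset + 4, offset + length));
    offset += length;
  }
  return payloads;
}

// Maps ref names to commit hashes, ignoring attributes such as
// "symref-target:..." that may follow the ref name.
function parseLsRefsResponse(text: string): Record<string, string> {
  const refs: Record<string, string> = {};
  for (const line of parsePktLines(text)) {
    const [oid, name] = line.trim().split(' ');
    if (oid && name) {
      refs[name] = oid;
    }
  }
  return refs;
}
```

For example, `parseLsRefsResponse(await response.text())['HEAD']` would give the commit hash to use in the next request.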
Fetching a list of objects at a specific commit
With this, we can fetch the list of objects in that branch:
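A rough sketch of the body of such a request, again assuming protocol version 2. Asking for `filter blob:none` and `deepen 1` is my assumption about how to get commits and trees without file contents or history, and it only works on servers that advertise the `filter` capability. `buildFetchTreesBody` is a hypothetical name.

```ts
// Encodes one pkt-line: 4 hex digits of total length, then the payload.
function pktLine(payload: string): string {
  const byteLength = new TextEncoder().encode(payload).length + 4;
  return byteLength.toString(16).padStart(4, '0') + payload;
}

// Builds the body of a protocol v2 `fetch` command that asks for a single
// commit, without history (deepen 1) and without blobs (filter blob:none).
function buildFetchTreesBody(commitHash: string): string {
  return [
    pktLine('command=fetch\n'),
    '0001', // delimiter packet: arguments follow
    pktLine(`want ${commitHash}\n`),
    pktLine('filter blob:none\n'),
    pktLine('deepen 1\n'),
    pktLine('no-progress\n'),
    pktLine('done\n'),
    '0000', // flush packet: end of request
  ].join('');
}
```

The body would be POSTed to the same `/git-upload-pack` endpoint with the same headers as the `ls-refs` request.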
And here’s the response:
The binary data after PACK is a compressed list of all objects the repository had at commit 950f5c8239b6e78e9051ec5e845bac5aa863c4cb. It is not a list of files that were committed in 950f5c. It’s all files.
The pack format is a binary blob. It’s similar to ZIP in that it encodes a series of objects, each stored as a binary header followed by binary data. Here’s an approximate visual to help grok the idea:
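To illustrate the "binary header followed by binary data" layout, here is a small sketch that reads the packfile header and one object header, based on the publicly documented pack format. It does not inflate the object data or resolve deltas, and the function names are hypothetical.

```ts
const PACK_OBJECT_TYPES: Record<number, string> = {
  1: 'commit',
  2: 'tree',
  3: 'blob',
  4: 'tag',
  6: 'ofs_delta',
  7: 'ref_delta',
};

// A packfile starts with "PACK", a 4-byte big-endian version number,
// and a 4-byte big-endian object count.
function parsePackHeader(bytes: Uint8Array): { version: number; objectCount: number } {
  const signature = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (signature !== 'PACK') {
    throw new Error('Not a packfile');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { version: view.getUint32(4), objectCount: view.getUint32(8) };
}

// Each object starts with a variable-length header. The first byte holds a
// continuation bit, a 3-bit type, and the low 4 bits of the uncompressed
// size. Every following byte contributes 7 more size bits.
function parseObjectHeader(
  bytes: Uint8Array,
  offset: number
): { type: string; size: number; dataOffset: number } {
  let byte = bytes[offset++];
  const type = PACK_OBJECT_TYPES[(byte >> 4) & 0b111] ?? 'unknown';
  let size = byte & 0b1111;
  let shift = 4;
  while (byte & 0x80) {
    byte = bytes[offset++];
    size += (byte & 0x7f) * 2 ** shift;
    shift += 7;
  }
  // For non-delta objects, zlib-compressed data starts at `dataOffset`.
  return { type, size, dataOffset: offset };
}
```

In a real packfile the first object header starts at byte offset 12, right after the packfile header.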
The decoding is tedious so I used the decoder provided by the isomorphic-git package:
The parsed index object provides information about all the objects encoded in the received packfile. Let’s peek inside:
Each object has a type and some data. The decoder stored some objects in the offsetCache, and kept track of others in form of a hash => offset in packfile mapping.
Let’s read the details of the commit from our parsed index:
It’s the object type, the hash, and the uncompressed object bytes which, in this case, provide us commit details in a specific microformat. From here, we can get the tree hash and look for its details in the same index we’ve already downloaded:
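The commit "microformat" is a block of `key value` header lines, a blank line, and then the commit message. A minimal hand-rolled sketch of pulling the tree hash out of it follows; `parseCommit` is a hypothetical helper, not the isomorphic-git decoder.

```ts
interface ParsedCommit {
  tree: string;
  parents: string[];
  message: string;
}

// Parses the uncompressed text of a commit object.
function parseCommit(text: string): ParsedCommit {
  const separator = text.indexOf('\n\n');
  const headerText = separator === -1 ? text : text.slice(0, separator);
  const message = separator === -1 ? '' : text.slice(separator + 2);

  let tree = '';
  const parents: string[] = [];
  for (const line of headerText.split('\n')) {
    const space = line.indexOf(' ');
    const key = line.slice(0, space);
    const value = line.slice(space + 1);
    if (key === 'tree') {
      tree = value;
    } else if (key === 'parent') {
      parents.push(value);
    }
    // author, committer, gpgsig etc. are ignored in this sketch.
  }
  return { tree, parents, message };
}
```

The `tree` value is the hash to look up next in the already downloaded index.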
The contents of the tree object is a list of files in the repository. Just like with commit, tree details are encoded in their own microformat. Luckily, isomorphic-git ships relevant decoders:
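For reference, the tree format itself is simple enough to decode by hand: each entry is `<mode> <name>`, a NUL byte, and a 20-byte binary SHA-1. The sketch below is a hand-written illustration of that layout, not isomorphic-git's decoder, and `parseTree` is a hypothetical name.

```ts
interface TreeEntry {
  mode: string;
  path: string;
  oid: string;
  type: 'tree' | 'blob' | 'commit';
}

// Decodes the uncompressed bytes of a tree object (SHA-1 repositories).
function parseTree(bytes: Uint8Array): TreeEntry[] {
  const decoder = new TextDecoder();
  const entries: TreeEntry[] = [];
  let cursor = 0;
  while (cursor < bytes.length) {
    const space = bytes.indexOf(0x20, cursor);
    if (space === -1) throw new Error('Malformed tree entry: no mode');
    const nul = bytes.indexOf(0x00, space);
    if (nul === -1) throw new Error('Malformed tree entry: no name terminator');

    const mode = decoder.decode(bytes.subarray(cursor, space));
    const path = decoder.decode(bytes.subarray(space + 1, nul));
    const oid = Array.from(bytes.subarray(nul + 1, nul + 21), (b) =>
      b.toString(16).padStart(2, '0')
    ).join('');

    // 40000 marks a directory, 160000 a submodule commit, the rest are files.
    const type = mode === '40000' ? 'tree' : mode === '160000' ? 'commit' : 'blob';
    entries.push({ mode, path, oid, type });
    cursor = nul + 21;
  }
  return entries;
}
```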
Yay! That’s the list of files and directories in the repository root with their hashes! From here we can recursively retrieve the ones relevant for our sparse checkout.
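The recursive step can be sketched as a walk over already decoded trees that descends only into directories that contain, or are contained in, a requested path. This is an illustration under assumptions: the `TreeIndex` map and `collectBlobs` function are hypothetical and stand in for the parsed packfile index.

```ts
type IndexedEntry = { path: string; oid: string; type: 'tree' | 'blob' };
// Hypothetical lookup of decoded tree objects by their hash.
type TreeIndex = Map<string, IndexedEntry[]>;

// Returns a map of full file path => blob hash for every file that matches
// one of `wantedPaths` (a file path or a directory path).
function collectBlobs(
  trees: TreeIndex,
  rootTreeOid: string,
  wantedPaths: string[]
): Map<string, string> {
  const result = new Map<string, string>();

  const isSelected = (path: string) =>
    wantedPaths.some((wanted) => path === wanted || path.startsWith(wanted + '/'));
  const leadsToSelection = (path: string) =>
    wantedPaths.some((wanted) => wanted.startsWith(path + '/'));

  const walk = (treeOid: string, prefix: string) => {
    for (const entry of trees.get(treeOid) ?? []) {
      const fullPath = prefix ? `${prefix}/${entry.path}` : entry.path;
      if (entry.type === 'tree') {
        // Descend only when the directory is wanted or is on the way to a wanted path.
        if (isSelected(fullPath) || leadsToSelection(fullPath)) {
          walk(entry.oid, fullPath);
        }
      } else if (isSelected(fullPath)) {
        result.set(fullPath, entry.oid);
      }
    }
  };

  walk(rootTreeOid, '');
  return result;
}
```

The resulting hashes are the objects to request in the final fetch.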
Fetching full files from specific paths
We’re finally ready to check out a few particular paths. Let’s ask for a blob at readme.txt and a tree at docs/tools:
The response is another index, but this time each blob comes with binary contents. Some decoding and recursive processing later, we finally get this:
Yay! It took some effort, but it was worth it!
CORS proxy and other notes
You’ll still need to run a CORS proxy. The fetch() examples above will work if you try them in devtools on github.com, but you won’t be able to just use them on your site. Git API typically does not expose the Access-Control-* headers required by the browser to run these requests.
So we need a server after all. Was this a failure, then? No! A CORS proxy is cheaper, simpler, and safer to maintain than a Git service. Also, it can fetch all the files in 3 fetch() requests instead of two requests per file like the GitHub REST API requires.
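As a small illustration of how a client might route its Git requests through such a proxy: the sketch below assumes a hypothetical proxy that takes the target address in a `url` query parameter. That parameter name, the example proxy address, and the `withCorsProxy` helper are assumptions, not the interface of the proxy used in this project.

```ts
// Rewrites a Git endpoint URL so the request goes through a CORS proxy.
// Passing no proxy URL leaves the target untouched, which is what a
// non-browser runtime would want.
function withCorsProxy(targetUrl: string, corsProxyUrl?: string): string {
  if (!corsProxyUrl) {
    return targetUrl;
  }
  // Assumption: the proxy reads the target from a `url` query parameter.
  return `${corsProxyUrl}?url=${encodeURIComponent(targetUrl)}`;
}
```

Keeping the proxy optional reflects the point above: only browsers need it, because only browsers enforce the `Access-Control-*` checks.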
Try it yourself
I’ve shared a functional demo that includes a CORS proxy in this repository on GitHub: https://github.com/adamziel/git-sparse-checkout-in-js
Testing instructions
* `nx dev playground-components` in the first one
* `nx start playground-php-cors-proxy` in the second one to start the PHP CORS proxy