Creating a Sleet "virtual" registry #187

Open
JustinGrote opened this issue Jan 30, 2024 · 4 comments

@JustinGrote

I'm working on a way to speed up the PowerShell Gallery, as it currently takes 500ms per individual request because it is basically one giant SQL server.

Initially I built a NuGet v2 -> v3 caching bridge (https://github.com/JustinGrote/pwshgallery), but it's becoming more and more clear to me that a static feed would help immensely with both hosting and caching scale.

What I would like to do is this:

  1. Get all the v2 metadata for 300k packages on the gallery.
  2. Build "virtual" packages that are just the metadata, with a content link to the Azure CDN for the actual NuGet package.
  3. Generate the Registrations/PackageIndexes/etc.

What I don't want to do is download all 300K packages to do the push, which seems to be what is currently required. I just want the manifest data, with the content URLs pointing to the existing Azure CDN links for the packages (that part is plenty fast).

Based on what I read in the code (which I find really well structured, nicely done!), it seems that what I effectively want to do is create a new ISleetFileSystem for both the source and the destination.

The source:

  • Would read from my v2 Packages data and produce the package manifest information rather than reading directly from nupkg files.

The destination:

  • Would publish the feed files but skip the nupkg publish step.

And then do incremental updates from there.

Do I have that about right? Is it feasible or is there a better approach?

End Goal
I can run Sleet (either programmatically or via the CLI) against a big ol' NuGet v2 Packages feed output, convert it to the NuGet v3 manifest format, create the manifests, and then publish those either locally or to an Azure storage account, but without publishing the packages themselves; instead, all the content URLs point to the Azure CDN.

@emgarten
Owner

emgarten commented Jan 30, 2024

In theory this should be possible with some extra code that calls the NuGet v2 library to read the v2 feed, plus a special ISleetFileSystem (or whatever else you need in order to change the nupkg URL).

The difficulty with changing the nupkg URL is that, for PackageBaseAddress in a v3 feed, the client constructs the URL itself rather than reading it from the registration file. (There are two ways to get the nupkg URL.) https://emgarten.com/posts/understanding-nuget-v3-feeds

You could solve this with a 301 redirect, but then it isn't only static files.
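
For reference, the PackageBaseAddress (flat container) URL is something the client builds itself from the service index entry plus the lowercased id and version, roughly like this sketch of the documented convention (not Sleet code):

```csharp
// Sketch: how a v3 client builds the flat-container nupkg URL it downloads from.
// baseAddress is the PackageBaseAddress/3.0.0 resource URL from the service index.
static string GetFlatContainerNupkgUrl(string baseAddress, string id, string normalizedVersion)
{
    var lowerId = id.ToLowerInvariant();
    var lowerVersion = normalizedVersion.ToLowerInvariant();
    return $"{baseAddress.TrimEnd('/')}/{lowerId}/{lowerVersion}/{lowerId}.{lowerVersion}.nupkg";
}
```

Because the client constructs this itself, pointing the registration file at the CDN doesn't by itself change where the nupkg gets downloaded from.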

I would first create a test feed with sleet, manually edit it to change the url and remove the nupkg, and see if it works.

If you were to build a way to read a v2 feed, it would look something like this:

Use the NuGet v2 library to access the ps gallery through the IPackageRepository interface (easy to do) and iterate through all packages.

https://github.com/NuGet/NuGet2/blob/875e7a4576d46785a8f2990b466e5add0229a726/src/Core/Repositories/IPackageRepository.cs#L27

It will return metadata that looks similar to the nuspec: https://github.com/NuGet/NuGet2/blob/2.14/src/Core/Packages/IPackageMetadata.cs#L6

Create a nuspec from that data (might even be possible through the v2 library), then give it to sleet as a PackageInput that has a nupkg with only a nuspec in it.
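
Roughly, that could look like the sketch below, using NuGet.Core's Manifest.Create/Save helpers and a plain ZipArchive for the nuspec-only nupkg (just an illustration; the Sleet PackageInput wiring is left out):

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using NuGet; // NuGet.Core (v2) library

// Read the PowerShell Gallery v2 feed through IPackageRepository.
var repo = PackageRepositoryFactory.Default.CreateRepository(
    "https://www.powershellgallery.com/api/v2");

foreach (IPackage package in repo.GetPackages().Take(10)) // sample a few packages
{
    // IPackage implements IPackageMetadata, so it can be turned back into nuspec data.
    Manifest manifest = Manifest.Create(package);

    // Build a "virtual" nupkg that contains only the nuspec.
    var nupkgPath = Path.Combine(Path.GetTempPath(), $"{package.Id}.{package.Version}.nupkg");
    using (var zip = ZipFile.Open(nupkgPath, ZipArchiveMode.Create))
    {
        var entry = zip.CreateEntry($"{package.Id}.nuspec");
        using var stream = entry.Open();
        manifest.Save(stream);
    }

    // nupkgPath can now be handed to Sleet as a PackageInput for the push.
}
```

The resulting files could then go through a normal sleet push like any other nupkg.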

Alternatively, you might want to try using the v2 library to download all packages, and push the entire nupkg to the new feed. I'm not sure how large ps packages are, but it might not be as bad as you think.

@JustinGrote
Author

Create a nuspec from that data (might even be possible through the v2 library), then give it to sleet as a PackageInput that has a nupkg with only a nuspec in it.

Yes, this was basically my plan.

Alternatively, you might want to try using the v2 library to download all packages, and push the entire nupkg to the new feed. I'm not sure how large ps packages are, but it might not be as bad as you think.

I do have the compute power to do so; I was hoping to streamline the incrementals, but maybe it's best to just load this onto a giant Azure VM and grab them in parallel. Once the initial heavy lifting is done, the incrementals can be a lot more straightforward.
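
If I go the brute-force route, something like this sketch would do the bulk pull (the parallelism and target path here are arbitrary placeholders):

```csharp
using System.IO;
using System.Threading.Tasks;
using NuGet; // NuGet.Core (v2) library

var repo = PackageRepositoryFactory.Default.CreateRepository(
    "https://www.powershellgallery.com/api/v2");

// Pull every nupkg down in parallel; degree of parallelism and target path are arbitrary.
Parallel.ForEach(
    repo.GetPackages(),
    new ParallelOptions { MaxDegreeOfParallelism = 16 },
    package =>
    {
        var path = Path.Combine(@"D:\packages", $"{package.Id}.{package.Version}.nupkg");
        using var source = package.GetStream(); // downloads the package content
        using var target = File.Create(path);
        source.CopyTo(target);
    });
```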

@emgarten
Owner

incrementals can be a lot more straightforward

You should be able to order on publish time to find only the most recent changes.

From what I recall from doing this in the past, some fields can be edited on the gallery, and that will make a package look new when only the title changed; watch out for that.

@JustinGrote
Author

@emgarten thanks. As I recall, I can track by CreatedDate rather than ModifiedDate, since packages are meant to be immutable on the gallery, but that's a good case to test for.
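
For the incremental poll I'm picturing something like this sketch against the v2 OData endpoint (the Created property name and filter syntax are my assumption from the v2 schema, so that needs verifying):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch: fetch only packages created since the last sync from the v2 OData feed.
static async Task<string> GetPackagesCreatedSinceAsync(DateTimeOffset since)
{
    var url = "https://www.powershellgallery.com/api/v2/Packages()" +
              $"?$filter=Created gt datetime'{since:yyyy-MM-ddTHH:mm:ss}'" +
              "&$orderby=Created";
    using var http = new HttpClient();
    return await http.GetStringAsync(url); // Atom XML to turn into new PackageInputs
}
```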
