Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(publish): Allow to remove components from a published repository #1350

Conversation

cfiehe
Copy link

@cfiehe cfiehe commented Sep 27, 2024

Fixes #1349

Description of the Change

This commit introduces the new remove command for a published repository. It allows to remove one or multiple components from a published repository without needing to recreate it.

Checklist

  • functional test added/updated (if change is functional)
  • man page updated (if applicable)
  • bash completion updated (if applicable)
  • documentation updated
  • author name in AUTHORS
  • swagger documentation

@cfiehe
Copy link
Author

cfiehe commented Sep 27, 2024

The change is incomplete. Documentation, man page and bash completion must be added. The current implementation should be a basis for a discussion. Maybe someone has a better idea to achieve the described result without the need to introduce a new subcommand for the publish group. But if the extension is ok from a functional perspective, I will try to add the missing changes.

@cfiehe cfiehe force-pushed the feature/allow_remove_component_from_published_repo branch 3 times, most recently from 76cbb98 to 1df616c Compare September 27, 2024 21:02
api/router.go Show resolved Hide resolved
@neolynx
Copy link
Member

neolynx commented Sep 30, 2024

I see the need for removing a component. But I wonder if it needs to be a new command/api.

Could this also be part of the api.PUT("/publish/:prefix/:distribution", apiPublishUpdateSwitch) ? it would "modify" a publish, i.e. adding and also removing a component, if it is not specified ? Would this break things ?

@cfiehe
Copy link
Author

cfiehe commented Sep 30, 2024

From

I see the need for removing a component. But I wonder if it needs to be a new command/api.

Could this also be part of the api.PUT("/publish/:prefix/:distribution", apiPublishUpdateSwitch) ? it would "modify" a publish, i.e. adding and also removing a component, if it is not specified ? Would this break things ?

When we use the PUT approach, we must do an extra remote GET call for retrieving all components that are part of the published repository. Because there is currently no way to get the resource representation of a single published repository, we must retrieve the whole list and find the published repository, or more precisely the components of the published repository. In addition, there is currently no Components attribute in the JSON representation of a published repository. So we can only do a PUT on the published repository by specifying the Snapshots attribute and all of the sources which should be part of the published repository. In that case, we can only remove components of published repositories, which are based on snapshots and not of local repositories. I do not know, why no Sources attribute has been used here, which would also work for local repositories and snapshots.

What do you think? Is the extra GET call ok for you?

In that case, we should add - in compliance with the cli - something like

api.GET("/publish/:prefix/:distribution", apiPublishShow)

which allows to get the resource representation of a single published repository and modify the apiPublishUpdateSwitch method accordingly.

Should we live with the limitation that components can only removed from published repositories, which are based on snapshots or should we add a new Sources attribute that would also be usable in case of local repositories?

Maybe, it is possible to ensure backward compatibility in case of snapshots, so that both the deprecated attribute Snapshots and the new attribute Sources can be used. When both are specified, it is an error.

@neolynx
Copy link
Member

neolynx commented Sep 30, 2024

I would be great if we could come up with more solid API concept here ;-) just brainstorming..

maybe we can add a DELETE /publish/:prefix/component, send component to delete and distribution as json and avoid conflicting with "/publish/:prefix/:distribution", apiPublishDrop somehow ?

I don't think it's great to have an extra GET for the components to specify, except the one to be deleted...

Another question: after deleting a component from a publish, would republishing the same snapshot not re-add said component ?

@cfiehe
Copy link
Author

cfiehe commented Oct 1, 2024

The main problem with the DELETE is that it is a grey area, because it is unclear if a DELETE request can/should have a body:
https://stackoverflow.com/questions/299628/is-an-entity-body-allowed-for-an-http-delete-request

I try to avoid is whenever possible in order to minimize compatibility issues.

I like resource identifiers coming from the domain. The only problem, we currently have with /publish/:prefix/:distribution is that prefix and distribution may contain slashes. The W3C recommendation is that slashes must imply a hierarchical structure. That is not the case if prefix and distribution contain slashes. A feasible approach is to encode these slashes using %2F: https://www.w3.org/Addressing/URL/4_URI_Recommentations.html

By making a component a first class resource entity we still have the problem, that the DELETE may have optional parameters like MultiDist, SkipCleanup or Signing which can/must be put into a query (camelcase or snakcase?) when we do not want a body:

DELETE /api/publish/:prefix/:distribution/:component/?multi_dist=1&skip_contents=1

In case of signing, we are getting into trouble, because everything must be put into the query. That does not scale well and makes enhancements more difficult.

Another option would be to use PATCH for a partial resource update. But when we closely stick to the PATCH definition, we must use Json pointers in order to remove something from the Sources list. This is only possible when we use the index of the element in the list. From my point of view, this identifier is too unstable and the index is unknown to the client. This approach would also require a GET request to fetch all sources and find the corresponding index. When Sources would be a map instead of a list with the component names as keys, the PATCH approach would work, because the key is known to the client:

PATCH /api/publish/:prefix/:distribution
{
    "op":"remove",
    "path":"/Sources/<component>"
}

@neolynx neolynx self-assigned this Oct 1, 2024
@cfiehe
Copy link
Author

cfiehe commented Oct 1, 2024

What do you think, if we extend the struct PublishedRepo by the additional settings MultiDist and Signing passed by the initial POST request for creating a published repository?

I think we should obey the following rules in the REST API - just a proposal:

  • GET and DELETE messages MUST not have a body.
  • PATCH commands MUST only refer to properties of the JSON resource representation.
  • PUT commands MUST contain the whole resource representation and are never used for partial updates of a resource.

I am not sure, if this API for a published repository will work in every case:

POST      /api/publish
GET       /api/publish
GET       /api/publish/:prefix/distributions/:distribution
DELETE    /api/publish/:prefix/distributions/:distribution
GET       /api/publish/:prefix/distributions/:distribution/signing
GET       /api/publish/:prefix/distributions/:distribution/sources
PATCH/PUT /api/publish/:prefix/distributions/:distribution/signing
PATCH/PUT /api/publish/:prefix/distributions/:distribution/sources

Optional:

PATCH/PUT /api/publish/:prefix/distributions/:distribution/sources/:component
DELETE    /api/publish/:prefix/distributions/:distribution/sources/:component

The last three commands cover the following cases:

  • Adds/Removes/Updates multiple components in an atomic manner, so that costly operations like contents file creation are executed only once. We must fetch the Sources list at first, otherwise there is no way to make an index-based list modification if we stick to a list for modelling Sources.
  • Updates a single component with a new source.
  • Deletes a single component

When operations may have optional flags like -force-overwrite and -skip-cleanup, we must pass them as query parameters.

Optional:
ETags seem to be an interesting approach to detect, whether a resource was modified concurrently. Concurrent modifications can be a problem, e.g. when you remove something from a list by its index. This approach only works when the resource was not modified in the meantime otherwise you may drop the wrong component via JSON Patch. In that case the If-Match: <etag_value> would be helpful to make sure that you operate on a known resource state.

@cfiehe
Copy link
Author

cfiehe commented Oct 1, 2024

The question is, is there also a minimal-invasive way to extend the current API with full backward compatibility. Maybe this would be a compromise:

PATCH/PUT /api/publish/:prefix/:distribution/sources

I am not sure, if we can risk to use another resource representation for the sources sub resource and use a map instead of a list.

What do you think?

That is the only way to get rid of the extra fetch request when modifying the sources resource via PATCH. Nevertheless, we must add MultiDist and Signing to the struct PublishedRepo and make it possible to update them via PUT on /api/publish/:prefix/:distribution. We cannot add them to the PATCH request body, when modifying a source and passing them via query does not feel right.

@neolynx
Copy link
Member

neolynx commented Oct 1, 2024

Thanks a log for the analysis !

Concerning the potential slashes in the distribution field, there is already a convention in aptly to replace them with underscores, see https://github.com/aptly-dev/aptly/blob/master/api/publish.go#L47. It is used for prefix only I think, maybe it should be used for distribution fields too. This allows us to expand the api and be backward compatible..

I am not a fan of PATCH in general, as they require "operations"... but it does have a body, we could pass signing and other info ?

I agree we should not add a body to DELETE, and I think i see the problem: to publish we need to pass signing information, which would require a body. So DELETE cannot be used to modify and resigns an existing publish ?

Did I understand that right, the problem you wanna solve is to not republish big snapshots (or in worst case recreate them) but rather just change the components and/or sources ? Could you show a use case for better understanding ?

@cfiehe
Copy link
Author

cfiehe commented Oct 1, 2024

Thanks a log for the analysis !

Concerning the potential slashes in the distribution field, there is already a convention in aptly to replace them with underscores, see https://github.com/aptly-dev/aptly/blob/master/api/publish.go#L47. It is used for prefix only I think, maybe it should be used for distribution fields too. This allows us to expand the api and be backward compatible..

Yes, I think that is a feasible solution 😃.

I am not a fan of PATCH in general, as they require "operations"... but it does have a body, we could pass signing and other info ?

Well, the problem with PATCH is that we should follow the JSON patch specification and that limits our freedom in what we put into the message body: https://datatracker.ietf.org/doc/html/rfc6902. Everything must be modeled in an operational style and we can only transform attributes of the JSON representation of the resource. Currently, the Signing attribute is not part of the JSON representation of a published repository and we can only refer to the following attributes:

"Architectures"
"Distribution"
"Label"
"Origin"
"Suite"
"Codename"
"NotAutomatic"
"ButAutomaticUpgrades"
"Prefix"
"Path"
"SourceKind"
"Sources"
"Storage"
"SkipContents"
"AcquireByHash"

We can only use the operations add, remove, replace, move, copy, and test on the JSON representation of a published repository. By following the standard, no signing information can be passed using the PATCH approach. As an example, imagine that we want to replace the label value of a published repository. We could do that via PATCH using:

PATCH /api/publish/:prefix/:distribution

with the message body:

{ "op": "replace", "path": "/Label", "value": "ubuntu" }

If we want to delete a published component, we must use the index of the right component tuple (Component, Name) in the Sources list, for example:

{ "op": "remove", "path": "/Sources/2 }

The problem is that the index is not known in advance and that we must ensure a stable order of the Sources attribute, otherwise the wrong component may get removed. We can only use patch when the signing information is already present, so that it is just used, when the patch operation is being executed. That was the idea of storing the information into the database.

I agree we should not add a body to DELETE, and I think i see the problem: to publish we need to pass signing information, which would require a body. So DELETE cannot be used to modify and resigns an existing publish ?

Exactly, otherwise we have to add a message body to the DELETE request, which contains the signing information. The next problem with DELETE is that we can normally only delete a single resource/component, but the remove cli allows the deletion of multiple components in a single call, so that expensive operations are executed only once.

Did I understand that right, the problem you wanna solve is to not republish big snapshots (or in worst case recreate them) but rather just change the components and/or sources ? Could you show a use case for better understanding ?

Yes, that’s right. I have "inherited" 😉 a pretty old self-built APT repo. The problem is that it only contains a single distribution, which contains nearly 100 components. Each component groups the packages that belong to a specific device version, so we have e.g. a component called 1.0 or 2.2.1. From time to time, a new version in the form of a component with updated packages has to be added. When a version e.g. gets deprecated, it should be possible to drop the corresponding component from the repository without the need to redefine the whole repository with all of its components. It is a hard constraint to stick to the predefined repository structure. Aptly seems to be a good replacement, when we find a way to add and remove components of a published repository 😃.

@cfiehe
Copy link
Author

cfiehe commented Oct 1, 2024

I found that interesting: https://stackoverflow.com/a/21863914 (3. point)

That is the reason, why I have decided to model a dedicated remove resource that gets created via POST allowing to drop multiple components at once without the need

  • to use an additional fetch call to get all components,
  • to introduce PATCH,
  • to add a message body to DELETE or
  • to pass a pretty long component list as query parameter.

@cfiehe
Copy link
Author

cfiehe commented Oct 1, 2024

Using PATCH would also mean to add a new go library to Aptly that does the resource transformation and in that case, we should use PATCH on other resources as well. However, that would nearly mean a complete redefinition of Aptly’s REST API. I think that is too much work for too little benefit and we will quickly loose backward compatibility.

@cfiehe
Copy link
Author

cfiehe commented Oct 1, 2024

From my point of view, we should try to be pragmatic. It is not a one way at all. We can discuss,

  • if we can both live with the new remove resource
  • if we add the signing and multi dist information to the database
  • if we declare the Snapshots attribute on PUT /api/publish/:prefix/:distribution as deprecated and replace it by a new one called Sources that can also be used to change components of a local repository via the REST API

Maybe this approach is not perfect, but we do not violate any RESTful principles and can provide an efficient REST API.

@neolynx neolynx force-pushed the feature/allow_remove_component_from_published_repo branch from 1df616c to 65d7367 Compare October 2, 2024 14:30
@neolynx
Copy link
Member

neolynx commented Oct 2, 2024

Agree, let's be pragmatic. I just wanted to fully understand :)

But I still have some questions...

  • if we can both live with the new remove resource

this would remove a snapshot from a publish ? or a component ?
all fields before the remove should then be escaped (with parseEscapedPath) to change _ into /

  • if we add the signing and multi dist information to the database

signing information is secret, we cannot add this to the database, if you want to publish/republish, signing info needs to be provided. the multidist could be remembered in the database, so it does not need to be specified.

  • if we declare the Snapshots attribute on PUT /api/publish/:prefix/:distribution as deprecated and replace it by a new one called Sources that can also be used to change components of a local repository via the REST API

Why this change? What is a Source, compared to Snapshot ?
I think for backward compatibility, the Snapshots argument needs to stay, but will be ignored if other fields are present, i.e. Sources.

I rebased your branch on master, where we have swagger documentation now, so we can document the APIs inline.

@cfiehe
Copy link
Author

cfiehe commented Oct 2, 2024

No problem, ask everything that is currently unclear.

this would remove a snapshot from a publish ? or a component ? all fields before the remove should then be escaped (with parseEscapedPath) to change _ into /

The remove operation (API and CLI) drops a component from a published repository. It works on published repositories, which are based on snapshots, but also on those, which are based on local repositories. You are right; we must definitely unescape the prefix and the distribution (with parseEscapedPath) at the very beginning of the each API method.

signing information is secret, we cannot add this to the database, if you want to publish/republish, signing info needs to be provided. the multidist could be remembered in the database, so it does not need to be specified.

Ok, when we are “not forced” to use PATCH, we do not have to store the signing information in the database. The user can pass them when calling a command that requires signing information. Having the MultiDist information in the database would be nice. It is not mandatory, but it would make (re-)publishing more comfortable.

Why this change? What is a Source, compared to Snapshot ?
I think for backward compatibility, the Snapshots argument needs to stay, but will be ignored if other fields are present, i.e. Sources.

I am working on an additional pull request that will allow adding components to a published repository that is based on local repos. I kept the Snapshots attribute for reasons of backward compatibility and added the new Sources attribute that can be used for published snapshots and for published local repos as well. In case of published snapshots, the Sources attribute gets considered first and - if unset - the Snapshots attribute is checked. Actually, there is no difference between Sources and Snapshots. But for switching components of published local repos, we need a more general name and Sources is used for that throughout the whole Publish API.

api/publish.go Outdated Show resolved Hide resolved
"github.com/smira/flag"
)

func aptlyPublishRemove(cmd *commander.Command, args []string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update man page?

@neolynx neolynx added the needs review Ready for review & merge label Oct 3, 2024
@neolynx neolynx assigned cfiehe and unassigned neolynx Oct 3, 2024
This commit introduces the new 'remove' command for a published repository. It allows to remove one or multiple components from a published repository without needing to recreate it.

Signed-off-by: Christoph Fiehe <[email protected]>
@neolynx neolynx force-pushed the feature/allow_remove_component_from_published_repo branch from 65d7367 to e8adf4f Compare October 3, 2024 21:25
@neolynx
Copy link
Member

neolynx commented Oct 3, 2024

I was able to rebase on master and force push :-)

the other branch I could not push, no permissions

@neolynx
Copy link
Member

neolynx commented Oct 4, 2024

I pushed a commit with swagger documentation. Maybe you can improve the description and also update the others :-)

@neolynx neolynx force-pushed the feature/allow_remove_component_from_published_repo branch from b667374 to a89d7db Compare October 4, 2024 18:50
@neolynx neolynx added the WIP label Oct 5, 2024
@neolynx neolynx force-pushed the feature/allow_remove_component_from_published_repo branch from a89d7db to c0b096e Compare October 7, 2024 11:11
@neolynx
Copy link
Member

neolynx commented Oct 7, 2024

fixed a typo in the doc and pushed

@neolynx neolynx removed the needs review Ready for review & merge label Oct 8, 2024
@neolynx
Copy link
Member

neolynx commented Oct 9, 2024

shall we close this in favor of #1366 ?

@neolynx
Copy link
Member

neolynx commented Oct 14, 2024

closing in favor of #1366

@neolynx neolynx closed this Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow to remove a component from a published repository
2 participants