
[RHCLOUD-35640] Update custom default group logic #1245

Open: lpichler wants to merge 19 commits into base: master
Conversation

@lpichler (Contributor) commented Oct 17, 2024

This PR adds replication for:

  • creation of the custom default group:
    • adding roles to the default group
    • removing roles from the default group
  • removal of custom groups

Link(s) to Jira

Description of Intent of Change(s)

The what, why and how.

Local Testing

  • add roles to the default group:

POST http://localhost:8000/api/rbac/v1/groups/<default_group_uuid>/roles/
Content-Type: application/json

{"roles":["d414435a-00fc-4dd5-82dd-c5c345ad1338"]}

  • remove roles from the default group:

DELETE http://localhost:8000/api/rbac/v1/groups/<custom_default_group_uuid>/roles/?roles=d414435a-00fc-4dd5-82dd-c5c345ad1338

  • remove a custom group:

DELETE http://localhost:8000/api/rbac/v1/groups/<custom_default_group_uuid>/
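For repeated local testing, the calls above could be scripted. This is a hypothetical helper sketch (not part of the PR) that builds the same three requests with only the standard library; the group uuid placeholders remain whatever your local environment assigns:

```python
# Sketch: build the local-testing requests programmatically.
# Sending them is a separate, explicit step (urlopen).
import json
import urllib.request

BASE = "http://localhost:8000/api/rbac/v1/groups"
ROLE_UUID = "d414435a-00fc-4dd5-82dd-c5c345ad1338"  # example role uuid from above


def add_roles_request(group_uuid: str, role_uuids: list) -> urllib.request.Request:
    """POST /groups/<uuid>/roles/ with a JSON body listing role uuids."""
    body = json.dumps({"roles": role_uuids}).encode()
    return urllib.request.Request(
        f"{BASE}/{group_uuid}/roles/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def remove_roles_request(group_uuid: str, role_uuids: list) -> urllib.request.Request:
    """DELETE /groups/<uuid>/roles/?roles=<comma-separated uuids>."""
    url = f"{BASE}/{group_uuid}/roles/?roles={','.join(role_uuids)}"
    return urllib.request.Request(url, method="DELETE")


def delete_group_request(group_uuid: str) -> urllib.request.Request:
    """DELETE /groups/<uuid>/ to remove a custom group."""
    return urllib.request.Request(f"{BASE}/{group_uuid}/", method="DELETE")


# Example send (requires the local server to be running):
# urllib.request.urlopen(add_roles_request("<default_group_uuid>", [ROLE_UUID]))
```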

Checklist

  • if API spec changes are required, is the spec updated?
  • are there any pre/post merge actions required? if so, document here.
  • are these changes covered by unit tests?
  • if warranted, are documentation changes accounted for?
  • does this require migration changes?
    • if yes, are they backwards compatible?
  • is there known, direct impact to dependent teams/components?
    • if yes, how will this be handled?

Secure Coding Practices Checklist Link

Secure Coding Practices Checklist

  • Input Validation
  • Output Encoding
  • Authentication and Password Management
  • Session Management
  • Access Control
  • Cryptographic Practices
  • Error Handling and Logging
  • Data Protection
  • Communication Security
  • System Configuration
  • Database Security
  • File Management
  • Memory Management
  • General Coding Practices

@lpichler lpichler changed the title [RHCLOUD-35640] Update custom default group logic [WIP] [RHCLOUD-35640] Update custom default group logic Oct 17, 2024
group_uuid = uuid4()
if settings.PRINCIPAL_CLEANUP_UPDATE_ENABLED_UMB and settings.REPLICATION_TO_RELATION_ENABLED:
    tenant_bootstrap_service = V2TenantBootstrapService(OutboxReplicator())
    bootstrapped_tenant = tenant_bootstrap_service.bootstrap_tenant(tenant)
lpichler (Contributor, Author) commented:

@alechenninger can we always count on it that whenever a tenant is created, the default and root workspaces are also created?

Contributor commented:

If it's after my CJI job runs, I think yes. But right now it is not guaranteed that a tenant has those workspaces created.
However, it is okay either way. They are guaranteed to coexist with the tenant mapping: if the tenant mapping exists, so do the default/root workspaces. If not, we create them.

Collaborator commented:

Right, by definition a tenant is not considered "bootstrapped" (in v2) if it doesn't have the built-in workspaces. That's part of the process. So bootstrap_tenant must create the built-in workspaces as well as the mapping.

    # This assumes that Tenant Mapping exist

Not sure what is meant by the comment exactly – but my interpretation is that it may be misleading. The method below does not assume the TenantMapping exists; it creates it if it doesn't.

lpichler (Contributor, Author) commented Oct 23, 2024:

This method is called only from the API endpoint, and in that situation the TenantMapping has to have been created already.
But maybe in some edge cases that is not true?

I can remove the comment. My main concern was whether the tenant could be created while the default workspace is not, or whether the tenant mapping could be created while the default workspace is not. If the tenant mapping is not created but the default workspace already exists, this method will fail – but it looks like that is not possible.

alechenninger (Collaborator) commented Oct 23, 2024:

This method is called only from the API endpoint, and in that situation the TenantMapping has to have been created already. But maybe in some edge cases that is not true?

Okay, yeah, I see the issue. We need to update the settings check above to check V2_BOOTSTRAP_TENANT, I think. As of the latest doc, I think this would only happen during migration steps. There is a window where a default group may attempt to be created for an existing tenant that is not yet bootstrapped.

We could keep this as is, and then to your point I think it should never happen (at least after replication is enabled). The user import job will take care of race conditions in that case (it will detect default groups added before bootstrapping). But I think it is a little safer to bootstrap the tenant here in that case, to ensure we use the uuid from the tenant mapping. And we know we cannot get into trouble there: because of the 1-1 unique constraint on the tenant mapping, any concurrent transaction creating one (e.g. the user import) will fail.

My main concern was whether the tenant could be created while the default workspace is not, or whether the tenant mapping could be created while the default workspace is not. If the tenant mapping is not created but the default workspace already exists, this method will fail – but it looks like that is not possible.

Yes, correct – we should never have workspaces without the TenantMapping. That's why the bootstrap service was created: to consolidate the logic that enforces those kinds of invariants.
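The invariant being discussed can be illustrated with a minimal get-or-create sketch. The class, field, and method names below are simplified hypothetical stand-ins, not the actual bootstrap service code:

```python
# Sketch: bootstrap_tenant creates the mapping and the built-in
# workspaces together, idempotently, so neither can exist alone.
from dataclasses import dataclass


@dataclass
class BootstrappedTenant:
    mapping: dict        # stand-in for TenantMapping
    workspaces: tuple    # built-in workspaces created with it


class TenantBootstrapService:
    def __init__(self):
        self._bootstrapped = {}  # tenant_name -> BootstrappedTenant

    def bootstrap_tenant(self, tenant_name: str) -> BootstrappedTenant:
        """Get-or-create: return the existing record, or create the
        mapping and built-in workspaces in one step."""
        if tenant_name not in self._bootstrapped:
            self._bootstrapped[tenant_name] = BootstrappedTenant(
                mapping={"default_group_uuid": f"{tenant_name}-default-group"},
                workspaces=("root", "default"),
            )
        return self._bootstrapped[tenant_name]
```

Calling `bootstrap_tenant` twice for the same tenant returns the same record, which is the property that makes it safe to call from the API endpoint even when the tenant was already bootstrapped.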

@lpichler lpichler force-pushed the update_custom_default_group_logic branch from c9e05da to 3dad051 on October 17, 2024 12:39

rbac/management/group/definer.py (outdated, resolved)
rbac/management/relation_replicator/relation_replicator.py (outdated, resolved)
rbac/management/group/definer.py (outdated, resolved)
@lpichler (Contributor, Author) commented:

/retest

@lpichler lpichler force-pushed the update_custom_default_group_logic branch 6 times, most recently from 89aa42a to 66742b0 on October 22, 2024 09:31
@lpichler lpichler changed the title [WIP] [RHCLOUD-35640] Update custom default group logic [RHCLOUD-35640] Update custom default group logic Oct 22, 2024
@lpichler lpichler force-pushed the update_custom_default_group_logic branch 2 times, most recently from 3727059 to 192d992 on October 22, 2024 12:05
@lpichler lpichler force-pushed the update_custom_default_group_logic branch from 192d992 to 10f693c on October 22, 2024 12:22
@lpichler (Contributor, Author) commented:

/retest



 @transaction.atomic
-def add_roles(group, roles_or_role_ids, tenant, user=None):
+def add_roles(group, roles_or_role_ids, tenant, user=None, relations=None):
Collaborator commented:

Main thing I am wondering about is if we can avoid passing relations here all the way through.

It is a bit awkward right now. At the very least I would rename the parameter. But maybe we can have this return relations, and then have an outer method consolidate them into a single replication event. I will take a closer look tomorrow.

Sorry for the short comment – need to run now and just wanted to jot this down for you to think on :-). Catch up more tomorrow :-).

lpichler (Contributor, Author) commented:

Main thing I am wondering about is if we can avoid passing relations here all the way through.

Yes, definitely, I agree! I wanted to know whether the changes are okay with regard to the algorithm and relation generation. Anyway, I will rework it to avoid passing relations.

alechenninger (Collaborator) commented Oct 24, 2024:

So as I was looking at this I think I have a suggestion for this, which I think would also make the logic easier to understand/validate.

First, it involves using two outbox events. One event will separate out the default access, replicating that default access as new role bindings. The other will update those role bindings to add or remove roles.

I do not think it is worth the complexity to combine those outbox events. Using two separate events does two things: (1) the additional access or removed access will be "more" eventually consistent with RBAC, by one additional event, and (2) it adds potentially redundant work. As for (1), it is already eventually consistent. Because both events are committed in one transaction, they will still be correctly ordered with respect to concurrent changes (e.g. adding/removing users from the group which is itself eventually consistent for default groups anyway). So, it actually doesn't change the semantics in any way to combine those events. To any perspective, the result is the same: access is the default, and then it changes at some point in the future, regardless of whether that happens with one replication event or two.

As for (2) (performance)... I'm just thinking it's premature optimization to think about that much otherwise. The worst case scenario is that you remove all access that was just added. So you'd start by adding all role bindings, and then you'd remove them. Realistically, if you had default access for 30 roles, that means adding ~90 tuples and then removing them. In the grand scheme of things, probably not a big deal. And if it's a problem, we can always improve it later.

So given that, here is what I would suggest:

  1. Change the RelationApiDualWriteGroupHandler, but not quite as much as in this PR currently. Maybe just keep the separate replicate piece, or let it take multiple roles at once. So change replicate_added_role to accept multiple roles, and then we just add them all together in one method call. Or separate add_role from replicate_added_role and then keep the separate replicate() method. (But we don't need the custom group param or other stuff.)
  2. In clone_default_group_in_public_schema, replicate an event that (a) removes default access and (b) re-adds the roles. I think this should be able to reuse the old RelationApiDualWriteGroupHandler code and call replicate_added_roles with all the roles (or whatever, based on the above), similar to what add_roles used to do. I would probably add an argument or method to the group handler for removing default access, and then remove the default_bindings_from_mapping method from the bootstrap service. The logic there doesn't really have to be kept in the bootstrap service, for a few reasons:
     • The group handler already knows that default access goes to the default workspace, and it already looks up the default workspace.
     • It already has the group uuid and the mapping for the default role binding uuid.
     • The only missing piece then is the platform default role uuid. We could consider just not removing this part, since strictly speaking all we have to remove is the binding relationship. Regardless, the platform default role uuid is already repeated by the seeding process, so it's something that should really be exposed outside of the bootstrap service in a reusable place anyway.
  3. Then, adjust add_roles and remove_roles only to use the modified methods per 1 above. (It would be an unrelated improvement to have a single outbox event for those – I hadn't realized, or had forgotten, that it was doing multiple.)

One easy way to improve the performance, too, would be to mint a ReplicationEvent and optionally allow passing that into the add/remove roles methods. Then, if present, "merge" it with the one generated by that method. This would decouple it from the logic and keep the "redundant" work purely in memory confined to a single CPU.

So in summary, that would look something like this (pseudo-ish code; adjust as needed per any different API choices you make):

def clone_default_group_in_public_schema(group, tenant) -> Optional[Group]:
    """Clone the default group for a tenant into the public schema."""
    bootstrapped_tenant = None
    if settings.V2_TENANT_BOOTSTRAP:
        tenant_bootstrap_service = V2TenantBootstrapService(OutboxReplicator())
        bootstrapped_tenant = tenant_bootstrap_service.bootstrap_tenant(tenant)
        group_uuid = bootstrapped_tenant.mapping.default_group_uuid
    else:
        group_uuid = uuid4()

    public_tenant = Tenant.objects.get(tenant_name="public")
    # ... all the same group and policy set up stuff
    dual_write_handler = RelationApiDualWriteGroupHandler(group, ReplicationEventType.CUSTOMIZE_DEFAULT_GROUP)
    dual_write_handler.replicate_added_roles(public_default_roles, remove_default_access_from=bootstrapped_tenant.mapping)
    return group

And then inside replicate_added_roles, if the mapping argument was provided we would either just remove a tuple from this group to the default role binding, or additionally remove the default role and default group subject as well. But if that's a pain or the logic is messy it's really optional–it's just an optimization to avoid extra tuples lying around. But they won't hurt I think.

And then add/remove_roles are mostly untouched I think (aside from any updates to dual write handler interface).

WDYT?
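The "mint a ReplicationEvent and merge it" idea mentioned above could be sketched roughly like this. The names, the tuple shape, and the `replicate_added_roles` signature are hypothetical illustrations, not the actual handler API:

```python
# Sketch: an optional incoming event absorbs the relations a method
# would replicate, so the caller can emit a single outbox event.
from dataclasses import dataclass, field


@dataclass
class ReplicationEvent:
    event_type: str
    add: list = field(default_factory=list)     # relation tuples to add
    remove: list = field(default_factory=list)  # relation tuples to remove

    def merge(self, other: "ReplicationEvent") -> "ReplicationEvent":
        """Fold another event's relations into this one, in order."""
        self.add.extend(other.add)
        self.remove.extend(other.remove)
        return self


def replicate_added_roles(group_uuid, role_uuids, event=None):
    """Build the relations for the added roles; merge into `event` if given."""
    local = ReplicationEvent(
        "CUSTOMIZE_DEFAULT_GROUP",
        add=[(group_uuid, "binding", r) for r in role_uuids],
    )
    if event is not None:
        return event.merge(local)  # caller replicates the merged event once
    return local                   # caller replicates this event directly
```

This keeps the "redundant" add-then-remove work purely in memory, as suggested, while the decision of how many outbox rows to write stays with the outermost caller.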

@lpichler lpichler force-pushed the update_custom_default_group_logic branch from 10f693c to ed35176 Compare October 24, 2024 12:30
Comment on lines +138 to +139
roles = Role.objects.filter(policies__group=custom_group)
system_roles = roles.filter(tenant=Tenant.objects.get(tenant_name="public"))
Collaborator commented:

Curious – do you need the additional filter on tenant?

Comment on lines +282 to +285
bootstrap_service = V2TenantBootstrapService(replicator=NoopReplicator())
bootstrapped_tenant = bootstrap_service.bootstrap_tenant(self.group.tenant)
relations_to_add = bootstrap_service.default_bindings_from_mapping(bootstrapped_tenant)
self.group_relations_to_add.extend(relations_to_add)
alechenninger (Collaborator) commented Oct 25, 2024:

I think this logic is technically correct and working, but I wonder if it should really go through the bootstrap service. I kind of touched on this in the other comment: while that service does rely on this logic, it's a little awkward to make it the main source for something relatively simple. I'm curious what it would look like if we instead queried for the tenant's mapping here and created the relationships to remove inline.

We can again consider only adding/removing the binding tuple, which would simplify this, if needed. Removing the other tuples is not required, it's only a database storage optimization, which is probably a minor concern. So it might be more worth it to leave them and simplify the code and maintenance of tuples.

@@ -353,6 +353,7 @@
V2_MIGRATION_RESOURCE_EXCLUDE_LIST = ENVIRONMENT.get_value("V2_MIGRATION_RESOURCE_EXCLUDE_LIST", default="").split(",")
V2_BOOTSTRAP_TENANT = ENVIRONMENT.bool("V2_BOOTSTRAP_TENANT", default=False)
V1_BOOTSTRAP_ADD_USER_ID = ENVIRONMENT.bool("V1_BOOTSTRAP_ADD_USER_ID", default=False)
LOG_REPLICATION = ENVIRONMENT.bool("LOG_REPLICATION", default=False)
alechenninger (Collaborator) commented Oct 25, 2024:

We could consider an "aggregate replicator" that passes the event to multiple implementations, and then optionally use it to combine the logging replicator with another impl, as an elegant way to add logging to any replicator. I'm not sure it's worth it, though – and if it is, maybe it can be its own small PR.

Another option would be to add normal logging statements to the OutboxReplicator and configure that with normal logging configuration.
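The aggregate-replicator idea is essentially the composite pattern, and could be sketched like this. The interface is hypothetical; a simple recording implementation stands in for the real replicators (e.g. OutboxReplicator):

```python
# Sketch: fan one replication event out to several replicator
# implementations, so logging can be layered onto any real one.
import logging

logger = logging.getLogger(__name__)


class RecordingReplicator:
    """Stand-in replicator that records (and could log) every event."""

    def __init__(self):
        self.seen = []

    def replicate(self, event):
        self.seen.append(event)
        logger.debug("replicated: %r", event)


class AggregateReplicator:
    """Passes each event to every configured replicator, in order."""

    def __init__(self, *replicators):
        self._replicators = replicators

    def replicate(self, event):
        for replicator in self._replicators:
            replicator.replicate(event)
```

With this shape, `AggregateReplicator(LoggingReplicator(), OutboxReplicator())` would add logging without touching the outbox implementation, which is the "elegant way" described above; the alternative in the comment (plain logging statements inside OutboxReplicator) avoids the extra class at the cost of coupling.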
