feat: Synchronize Ironic provision_state to Nautobot #163

Merged: cardoe merged 2 commits into main from react-to-ironic-node-update on Jul 22, 2024

skrobul
Collaborator

@skrobul skrobul commented Jul 17, 2024

This PR adds a workflow and a related sensor that listens to the RabbitMQ messages from Ironic (enabled for dev in https://github.com/RSS-Engineering/undercloud-deploy/pull/94). Every time a node in Ironic is updated, its status is synchronized back to Nautobot.

While this works as is, I'm not entirely comfortable with storing the provision_state in the status field in Nautobot. I would appreciate some thoughts and feedback on this.

Here is what I was debating:

Storing provision_state in the Device.status

Pros: easy to implement and immediately visible in various places in the UI. Does not require schema changes. The history of changes is tracked automatically.
Cons: bloats the number of possible statuses (Ironic has 25 of them). Will still require Ansible changes to prepopulate the Status choices.

Storing provision_state in a custom field

Pros: relatively easy to implement and controlled separately from the Device Status (a good thing if that's what we want, for example to include billing-specific statuses or to avoid having Ironic override manually set statuses). No need to prepopulate, as the custom field will likely be just a simple string.

Cons: may be more difficult to present in the UI, and controlled separately from the Device Status (a bad thing if we have to keep them in sync).

If you have any opinions on this, please do share them.
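
For illustration, the custom field option could look roughly like the sketch below. It only builds a PATCH request against Nautobot's device endpoint and does not send it. The field name `ironic_provision_state`, the helper name, and the example host are assumptions made for this sketch and are not taken from the PR.

```python
import json
import urllib.request


def build_custom_field_patch(base_url, device_id, token, provision_state,
                             field_name="ironic_provision_state"):
    """Build (without sending) a PATCH request that sets one custom field.

    field_name is a hypothetical custom field key, not one confirmed by the PR.
    """
    url = f"{base_url.rstrip('/')}/api/dcim/devices/{device_id}/"
    # Only the custom field is sent, so the Device Status is left untouched.
    body = json.dumps({"custom_fields": {field_name: provision_state}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


# Example host and device id are placeholders.
req = build_custom_field_patch(
    "https://nautobot.example", "1234", "secret-token", "deploying"
)
```

Because the request body contains only `custom_fields`, a manually set Status would not be overwritten, which is the separation described in the pros above.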

@skrobul skrobul requested a review from a team July 17, 2024 10:49
This commit includes a Workflow and Sensor that react to Events
submitted by Ironic through the OpenStack AMQP EventSource.
Each `ironic.node.update.end` event triggers a workflow that synchronizes
the node's provision_state to Nautobot.
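
As a rough sketch of the first step such a workflow needs, the function below pulls the node UUID and provision_state out of a notification for the event named in the commit message. The message layout (an optional `oslo.message` JSON envelope and an `ironic_object.data` payload key) is an assumption about how the notification arrives, and the function name is hypothetical; the actual code in this PR may differ.

```python
import json

NODE_UPDATE_END = "ironic.node.update.end"


def extract_provision_state(raw_message):
    """Return (node_uuid, provision_state) for node update events, else None."""
    body = json.loads(raw_message)
    # Assumption: the notification may be wrapped as a JSON string under the
    # "oslo.message" key; otherwise the body itself is the notification.
    inner = body.get("oslo.message", body)
    if isinstance(inner, str):
        inner = json.loads(inner)
    if inner.get("event_type") != NODE_UPDATE_END:
        return None
    # Assumption: node fields live under payload["ironic_object.data"].
    data = inner.get("payload", {}).get("ironic_object.data", {})
    uuid = data.get("uuid")
    state = data.get("provision_state")
    if uuid is None or state is None:
        return None
    return uuid, state
```

Events of any other type return `None`, so the caller can skip them without touching Nautobot.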
@skrobul skrobul force-pushed the react-to-ironic-node-update branch from bab33ef to 50ad761 Compare July 17, 2024 10:51
@khackworth

I'll add my comments here:

The primary use case for tracking the Ironic provisioning state is missing here, so I'm working from my own assumptions. There may be other use cases, so feel free to add more details or thoughts.

  • Make it easier to review state changes and event timelines (logging) outside of OpenStack, back in the SoT, for whatever review is needed. Why is a device taking a long time in DHCP? Why does a device hang in a given state, or cycle through other states? Metrics like this might make sense in Nautobot, or they may belong elsewhere.
  • Eventually, Nautobot needs to account for some umbrella states that are less granular than the Ironic list. I see a few states that we should care about initially:
    • Active - A device that has been provisioned and is in use by a tenant/project
    • Provisioning - A device that is actively being provisioned
    • Inventory - A device that is idle and pending deployment to some customer solution
    • Planned - A device that is being populated into Nautobot but not yet active in Ironic (new cabinet, new hardware, etc)
    • Quarantine - A device that should be isolated for whatever reason
    • Decom - A device that is being turned down or recycled
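
To make the umbrella idea concrete, here is one possible grouping written as a lookup table. The grouping is purely illustrative and was not agreed anywhere in this thread; only a handful of Ironic provision states are listed, and Planned and Decom are left out on the assumption that they would be set by people rather than derived from Ironic.

```python
# Hypothetical mapping from a few Ironic provision states to the umbrella
# states listed above. The grouping is an assumption, not a decision.
UMBRELLA_BY_PROVISION_STATE = {
    "active": "Active",
    "deploying": "Provisioning",
    "wait call-back": "Provisioning",
    "available": "Inventory",
    "deploy failed": "Quarantine",
    "clean failed": "Quarantine",
    "error": "Quarantine",
}


def umbrella_status(provision_state):
    """Return the umbrella state, or None when the state is not mapped."""
    return UMBRELLA_BY_PROVISION_STATE.get(provision_state)
```

Returning `None` for unmapped states lets a caller leave the existing Nautobot status alone instead of guessing.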

So, I'll see your two options, veto one, and raise a 3rd one.

Storing in device.status

This option should NOT be done. This field is going to be used by other systems, and making it change dynamically to reflect every state that Ironic reports doesn't give us what we need.

Storing provision_state in a custom field

I'm ok with this option. It would be more standardized and dedicated to this data.

Storing provision_state in the "notes" field

I'm also OK with this: just append a note to the device with the new state. That gives a rolling, timestamped log of "events".

@skrobul
Collaborator Author

skrobul commented Jul 18, 2024

Thanks for the feedback, @khackworth. I went ahead and updated the code to use a custom field. The reason for choosing this over the notes is that it can be looked up quite easily both programmatically and in the UI, and it is timestamped too.

@skrobul skrobul force-pushed the react-to-ironic-node-update branch from 9821b61 to 7b5afe9 Compare July 18, 2024 05:50
@skrobul skrobul force-pushed the react-to-ironic-node-update branch from 7b5afe9 to 99fd595 Compare July 18, 2024 11:15
@cardoe
Contributor

cardoe commented Jul 20, 2024

So I don't think we should be storing the Ironic state 1:1. What matters is more like: this machine is available, this machine is actively used, this machine is broken.

@cardoe
Contributor

cardoe commented Jul 20, 2024

Review of state changes and metrics doesn't belong in Nautobot; it belongs in tools for gathering metrics. There are also OpenStack action logs that can be used for details on the state machine changes.

@skrobul
Collaborator Author

skrobul commented Jul 22, 2024

> Review of state changes and metrics doesn't belong in Nautobot; it belongs in tools for gathering metrics. There are also OpenStack action logs that can be used for details on the state machine changes.

Which tools are those? Are they accessible by all relevant parties (i.e. DCOPS)?

> So I don't think we should be storing the Ironic state 1:1. What matters is more like: this machine is available, this machine is actively used, this machine is broken.

The "machine is actively used/broken" state is tracked by the Status field. The provision_state is just additional context for operators.
I have opened https://rackspace.atlassian.net/browse/PUC-408 to discuss whether we should map some of Ironic's state transitions to transitions of the Status field.

@cardoe
Contributor

cardoe commented Jul 22, 2024

> > Review of state changes and metrics doesn't belong in Nautobot; it belongs in tools for gathering metrics. There are also OpenStack action logs that can be used for details on the state machine changes.
>
> Which tools are those? Are they accessible by all relevant parties (i.e. DCOPS)?

They would need to be, once we get those running. Nautobot just seems like the wrong place to show the state machine steps.

> > So I don't think we should be storing the Ironic state 1:1. What matters is more like: this machine is available, this machine is actively used, this machine is broken.
>
> The "machine is actively used/broken" state is tracked by the Status field. The provision_state is just additional context for operators. I have opened https://rackspace.atlassian.net/browse/PUC-408 to discuss whether we should map some of Ironic's state transitions to transitions of the Status field.

Sounds good.

@cardoe cardoe merged commit 14945ff into main Jul 22, 2024
8 checks passed
@cardoe cardoe deleted the react-to-ironic-node-update branch July 22, 2024 16:28

3 participants