bug: the metav1.MicroTime was not being set #65

Merged: 1 commit, Feb 19, 2024

@vsoch vsoch commented Feb 19, 2024

Problem: I noticed in testing that the timestamp only had granularity down to the second.

Solution: It appears that when we create the PodGroup from the reconciler watch, the metadata (beyond name and namespace) does not stick. I am not sure why, but the labels are still retrievable from the pods (via the mutating webhook) afterward. So instead, we get the size and creation timestamp on the first hit in reconcile, which (given how that works) should still roughly honor the creation order. I did try adding the timestamp to a label, but it got hairy really quickly (it kept me up about 3 hours longer than I intended!).

The good news is that I now see the microseconds in the Schedule Start Time, so we should be almost ready to test this on a GCP cluster. I also had lots of time waiting for the containers to rebuild, so I made a diagram of how it currently works. I have some concerns about the internal state of fluxion (my kind cluster stopped working after some hours and I do not know why), but we can address them later. For now we mostly need to see whether any jobs are being forgotten, etc.

@vsoch vsoch marked this pull request as ready for review February 19, 2024 10:43

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@vsoch vsoch force-pushed the tweak-timestamps-add-docs branch from 7b0f356 to 2c5ec2b Compare February 19, 2024 14:05
@vsoch vsoch merged commit 8205335 into fluence-controller Feb 19, 2024
7 checks passed