Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds loop delay monitor and logger #138

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

freider
Copy link
Contributor

@freider freider commented Mar 27, 2024

Adds a simple background task that checks every second if there has been an event loop delay of more than a second.

Defaults to capturing any delays of more than a second
Copy link

@modal-pr-review-automation modal-pr-review-automation bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved 👍. This diff qualified for automatic approval and doesn't need follow up review.

@freider freider requested a review from mwaskom March 27, 2024 13:04
Copy link
Contributor

@mwaskom mwaskom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

t0 = time.monotonic()
await asyncio.sleep(self.loop_delay_monitor_period)
duration = time.monotonic() - t0
delay = duration - self.loop_delay_monitor_period
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could our estimate of the true time the event loop was blocked be off by up to loop_delay_monitor_period? In other words, is the right way to think about it that another task could start blocking immediately after we sleep, or immediately before we try to wake, and we wouldn't be able to tell the difference the way we're calculating / reporting the delay here?

(This is mostly a question for my own understanding — it's probably not critical to report metrics exactly here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, technically the event loop could have been blocked for up to loop_delay_monitor_period while this coroutine is sleeping and we wouldn't detect it unless the blockage overlapped with our expected return time. So worst case we could miss event loop blockage by loop_delay_monitor_period + loop_delay_monitor_threshold seconds (since we don't report unless we pass the threshold either).

But for Modal's purposes the most critical event loop blockage would be anything in the 10+ second magnitude (where heartbeats start to time out), so this should be fine (and we can adjust the period and threshold)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to be pedantic we could say “by up to” and give the upper bound but agree a second margin of error isn’t a huge deal.

Comment on lines 111 to 112
self.loop_delay_monitor_period = 1.0 # seconds
self.loop_delay_monitor_threshold = 1.0 # seconds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be public attributes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the idea, so that whatever library that sets up the synchronicity loop can adjust it. But now that I think about it, it's probably better to add them to the synchronizer constructor as optional arguments and keep them private, since changing the period after the event loop thread has started won't have any effect...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, yeah I saw you mutating them in the client so that clarified the intention, but agree that constructor params might make more sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants