
Refactor daemon loop #36

Open
2 of 7 tasks
AndyRae opened this issue Jan 22, 2025 · 2 comments
Comments

@AndyRae
Member

AndyRae commented Jan 22, 2025

Is this the right issue type?

  • Yes, I'm planning work for this project team.

Summary

The main daemon while loop should be broken out into small, testable functions and cleaned up.

Acceptance Criteria

  • Code is clean, modular, and testable
  • Tests are written for the logic
  • Graceful degradation for retries

Tasks

  • Extract code to functions
  • Write unit tests

Confirm creation

  • This issue is ready
@elementechemlyn
Contributor

Should the main loop catch network exceptions? A temporary network problem will cause the loop to end. This possibly isn't a problem if/when the container is set to restart on failure but should the code be as robust as we can make it without relying on that?

@AndyRae
Member Author

AndyRae commented Jan 30, 2025

> Should the main loop catch network exceptions? A temporary network problem will cause the loop to end. This possibly isn't a problem if/when the container is set to restart on failure but should the code be as robust as we can make it without relying on that?

Yes, in my view. We currently rely on container restarts to manage the service, which feels unnecessary and fragile, and just adds complexity to observability / debugging.

The best outcome is graceful degradation: catch the exception, log it, and double the wait before each retry, up to a maximum of 1 minute (which can be overridden in config).
If a request succeeds, the retry backoff is reset.

I had something like this in mind:

import logging
import time

import requests

MAX_RETRY_BACKOFF = 60  # seconds; could be overridden from config

logger = logging.getLogger(__name__)

def fetch_task(client: TaskApiClient, polling_endpoint: str) -> requests.Response:
    """Poll the task endpoint, retrying network errors with exponential backoff."""
    attempt = 0
    while True:
        try:
            response = client.get(endpoint=polling_endpoint)
            response.raise_for_status()
            return response  # success: the next call starts again from attempt 0
        except requests.RequestException as e:
            wait_time = min(2 ** attempt, MAX_RETRY_BACKOFF)
            logger.warning("Network error fetching task: %s. Retrying in %ss ...", e, wait_time)
            time.sleep(wait_time)
            attempt += 1
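Since the acceptance criteria ask for tests of this logic, here is a hedged sketch of what a unit test could look like. It assumes `MAX_RETRY_BACKOFF` is 60 seconds, stands in a `unittest.mock.Mock` for the real `TaskApiClient`, and patches `time.sleep` so the test runs instantly while recording the waits; the retry loop is inlined so the sketch is self-contained.

```python
import logging
import time
from unittest import mock

import requests

MAX_RETRY_BACKOFF = 60  # assumed cap, in seconds
logger = logging.getLogger(__name__)

def fetch_task(client, polling_endpoint) -> requests.Response:
    """Inlined copy of the proposed retry loop, so this sketch is self-contained."""
    attempt = 0
    while True:
        try:
            response = client.get(endpoint=polling_endpoint)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            wait_time = min(2 ** attempt, MAX_RETRY_BACKOFF)
            logger.warning("Network error fetching task: %s. Retrying in %ss", e, wait_time)
            time.sleep(wait_time)
            attempt += 1

def test_retries_with_exponential_backoff():
    # Fake client: two network failures, then a successful response.
    ok = mock.Mock()
    ok.raise_for_status.return_value = None
    client = mock.Mock()
    client.get.side_effect = [
        requests.ConnectionError("down"),
        requests.ConnectionError("still down"),
        ok,
    ]
    # Patch time.sleep so the test runs instantly while recording the waits.
    with mock.patch("time.sleep") as fake_sleep:
        result = fetch_task(client, "/tasks")
    assert result is ok
    # Two failures produce waits of 1s then 2s before the third, successful call.
    assert [c.args[0] for c in fake_sleep.call_args_list] == [1, 2]

test_retries_with_exponential_backoff()
```

Mocking `time.sleep` keeps the test fast and lets it assert the exact backoff sequence rather than just that the call eventually succeeds.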
