
DEVOPS-7803 restore log stream #2972

Open · wants to merge 1 commit into master

Conversation

@DCkQ6 commented Jul 3, 2024

Change Overview

Unfortunately, there is a hardcoded timeout in the kubelet server. If we want to support long-lasting phases, we need to work around it.

This PR introduces re-establishing the log stream connection while the pod is still running.
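
For illustration, below is a minimal sketch of the reconnect-and-resume idea using client-go. The type and helper names (restartingLogReader, open, podIsRunning) are assumptions made for this sketch, not the exact code in this PR; the actual change lives in pkg/kube/pod_controller.go and tracks the time of the last successful read so the stream can be re-opened from that point.

// Sketch only: illustrative names and structure, not the exact implementation in this PR.
package kube

import (
	"context"
	"io"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type restartingLogReader struct {
	ctx          context.Context
	cli          kubernetes.Interface
	namespace    string
	podName      string
	container    string
	reader       io.ReadCloser
	lastReadTime metav1.Time
}

// open (re)establishes the log stream, resuming from lastReadTime when it is set.
func (r *restartingLogReader) open() error {
	opts := &corev1.PodLogOptions{Container: r.container, Follow: true}
	if !r.lastReadTime.IsZero() {
		opts.SinceTime = &r.lastReadTime // resume from the last successful read
	}
	stream, err := r.cli.CoreV1().Pods(r.namespace).GetLogs(r.podName, opts).Stream(r.ctx)
	if err != nil {
		return err
	}
	r.reader = stream
	return nil
}

// Read reads from the current stream and, if the kubelet closed it while the
// pod is still running, re-opens the stream so the caller can keep reading.
func (r *restartingLogReader) Read(p []byte) (int, error) {
	n, err := r.reader.Read(p)
	if n > 0 {
		r.lastReadTime = metav1.Now() // remember when data was last seen
	}
	if err == io.EOF && r.podIsRunning() {
		r.reader.Close()
		if openErr := r.open(); openErr != nil {
			return n, openErr
		}
		if n > 0 {
			return n, nil
		}
		return r.Read(p) // continue on the re-opened stream
	}
	return n, err
}

func (r *restartingLogReader) podIsRunning() bool {
	pod, err := r.cli.CoreV1().Pods(r.namespace).Get(r.ctx, r.podName, metav1.GetOptions{})
	return err == nil && pod.Status.Phase == corev1.PodRunning
}

Because PodLogOptions.SinceTime has one-second granularity, the last line read before a reconnect can be returned again on the new stream; this matches the duplicated-last-line behavior discussed in the review below.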

Pull request type

Please check the type of change your PR introduces:

  • 🚧 Work in Progress
  • 🌈 Refactoring (no functional changes, no api changes)
  • 🐹 Trivial/Minor
  • 🐛 Bugfix
  • 🌻 Feature
  • 🗺️ Documentation
  • 🤖 Test

Issues

Test Plan

I created a phase executing the KubeTask function with a long-running command (sleep), and verified that the logs were read until the end of its execution and that the output was gathered correctly.

  • 💪 Manual
  • ⚡ Unit test
  • 💚 E2E

Re-establish connection while the pod is still running
@DCkQ6 mentioned this pull request Jul 3, 2024

@hairyhum (Contributor) left a comment:

This approach is better, but can we have some automated test to prove that we don't lose logs?


func (s *restoreLogStreamReader) Read(p []byte) (n int, err error) {
	n, err = s.reader.Read(p)
	defer func() { s.lastReadTime = metav1.Now() }()

@hairyhum (Contributor) commented:

This looks like we take the timestamp after the stream closes. My main concern in #2903 (comment) was that if a log line arrived right when the stream closed, we might lose some records.
Did you have a way to reproduce the failure so we can have at least empirical proof that this approach works?

@DCkQ6 (Author) replied:

Yes, there is a way to reproduce it, and I did. Unfortunately, each test takes 4 hours, since I simply run an action from a blueprint that uses a long sleep and an echo.

In manual tests, this approach always resulted in a duplicated last line of the log, regardless of how frequently the log was written. The last line was repeated even when it was the only line written in the previous 4 hours.

@hairyhum (Contributor) replied:

Right. Should we then add some deduplication for that case (if we can)?

@DCkQ6 (Author) replied:

Deduplication would add considerable overhead for what we are trying to achieve. The problem manifests only after 4 hours, and a single duplicated line for tasks that run longer than 4 hours does not, in my opinion, justify adding deduplication logic. That logic would require keeping the last read line in memory and comparing it against the stream after reconnecting. Additionally, we should not assume that our users' logs are newline-delimited, which would complicate even this simple comparison.

Therefore, in my opinion, we should accept the single duplicated line that occurs on these rare occasions.
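
For reference, below is a rough sketch of what such line-based deduplication could look like. dedupFirstLine is a hypothetical helper, not code from this PR, and it assumes newline-delimited logs, which, as noted above, we cannot rely on in general.

package logdedup

import (
	"bufio"
	"fmt"
	"io"
)

// dedupFirstLine copies lines from the resumed log stream to out, skipping
// the very first line when it is identical to the last line seen before the
// reconnect. Hypothetical sketch; assumes newline-delimited logs.
func dedupFirstLine(prevLastLine string, resumed io.Reader, out io.Writer) error {
	sc := bufio.NewScanner(resumed)
	first := true
	for sc.Scan() {
		line := sc.Text()
		if first {
			first = false
			if line == prevLastLine {
				continue // drop the line re-sent because SinceTime is inclusive
			}
		}
		if _, err := fmt.Fprintln(out, line); err != nil {
			return err
		}
	}
	return sc.Err()
}

Even this minimal version must keep the previous last line in memory and depends on stable line boundaries, which is exactly the overhead and assumption the comment above argues against.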

pkg/kube/pod_controller.go (resolved review thread)
@DCkQ6 requested a review from hairyhum on August 21, 2024, 09:50
@hairyhum requested a review from e-sumin and removed the request for hairyhum on August 21, 2024, 17:54

Successfully merging this pull request may close these issues.

[BUG] Problem with long running phase