Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e logging: Hard to follow workload events #1597

Open
nirs opened this issue Oct 22, 2024 · 1 comment · May be fixed by #1628
Open

e2e logging: Hard to follow workload events #1597

nirs opened this issue Oct 22, 2024 · 1 comment · May be fixed by #1628
Labels
bug Something isn't working test Testing related issue

Comments

@nirs
Copy link
Member

nirs commented Oct 22, 2024

The current logs make following single workload events hard:

Example logs

2024-10-22T16:15:37.201+0530	INFO	dractions/actions.go:115	enter Failover appset-deploy-cephfs-busybox
2024-10-22T16:15:37.273+0530	INFO	dractions/retry.go:67	drpc appset-deploy-cephfs-busybox is ready
2024-10-22T16:15:37.432+0530	INFO	dractions/actions.go:187	update drpc appset-deploy-cephfs-busybox failover to rdr-dr1
2024-10-22T16:15:42.462+0530	INFO	dractions/retry.go:124	drpc appset-deploy-rbd-busybox phase is FailedOver
2024-10-22T16:15:47.312+0530	INFO	dractions/retry.go:124	drpc subscr-deploy-rbd-busybox phase is FailedOver
2024-10-22T16:16:12.663+0530	INFO	dractions/retry.go:124	drpc appset-deploy-cephfs-busybox phase is FailedOver
2024-10-22T16:18:12.705+0530	INFO	dractions/retry.go:67	drpc appset-deploy-rbd-busybox is ready

We need this format:

timestamp level test-id message

Example logs converted to wanted format:

2024-10-22T16:15:37.201+0530 INFO  [appset-cephfs] enter Failover
2024-10-22T16:15:37.273+0530 INFO  [appset-cephfs] drpc is ready
2024-10-22T16:15:37.432+0530 INFO  [appset-cephfs] starting failover to rdr-dr1
2024-10-22T16:15:42.462+0530 INFO  [appset-rbd] drpc phase is FailedOver
2024-10-22T16:15:47.312+0530 INFO  [subscr-rbd] drpc phase is FailedOver
2024-10-22T16:16:12.663+0530 INFO  [appset-cephfs] drpc  phase is FailedOver
2024-10-22T16:18:12.705+0530 INFO  [appset-rbd] drpc is ready

Notes:

  • Short workload names without repeated parts (deploy, busybox)
  • aligned with spaces, not tabs for more compact logs
@nirs nirs added bug Something isn't working test Testing related issue labels Oct 22, 2024
@nirs
Copy link
Member Author

nirs commented Oct 28, 2024

@raghavendra-talur what do you think?

nirs added a commit to nirs/ramen that referenced this issue Oct 30, 2024
We used to log everything using the global logger. This makes the log
messy and hard to follow, since we run 5 dr flows concurrently. This
will be worse as we add new workloads (e.g. flattened pvcs, cloned pvcs,
volsync with block pvc).

We use now per-test logger created using the name of the workload (e.g.
disapp-deploy-rbd-busybox). Every log message from test flow start with
this name. The rest of the log does not specify resources names since we
use the same name for all resources (drpc, placement, subscription,
etc).

Clean up log messages, replacing noisy text with simple description of
the event.

More work is needed, we still log too much uninteresting details, and do
no not log what we wait for, but this is already much easier to follow.

Example log with this change:

    === RUN   TestSuites/Exhaustive/Deploy-rbd/Subscr/Enable
    2024-10-31T00:37:41.913+0200	INFO	subscr-deploy-rbd-busybox	Protecting workload
    2024-10-31T00:37:41.913+0200	INFO	subscr-deploy-cephfs-busybox	Protecting workload
    2024-10-31T00:37:41.915+0200	INFO	appset-deploy-rbd-busybox	Workload running on dr2
    2024-10-31T00:37:41.915+0200	INFO	appset-deploy-rbd-busybox	Annotating placement
    2024-10-31T00:37:41.917+0200	INFO	subscr-deploy-rbd-busybox	Workload running on dr2
    2024-10-31T00:37:41.917+0200	INFO	subscr-deploy-rbd-busybox	Annotating placement
    2024-10-31T00:37:41.918+0200	INFO	subscr-deploy-cephfs-busybox	Workload running on dr1
    2024-10-31T00:37:41.918+0200	INFO	subscr-deploy-cephfs-busybox	Annotating placement
    2024-10-31T00:37:41.926+0200	INFO	appset-deploy-rbd-busybox	Creating drpc
    2024-10-31T00:37:41.930+0200	INFO	subscr-deploy-rbd-busybox	Creating drpc
    2024-10-31T00:37:41.932+0200	INFO	subscr-deploy-cephfs-busybox	Creating drpc
    2024-10-31T00:38:52.038+0200	INFO	appset-deploy-cephfs-busybox	drpc is ready

Fixes: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
@nirs nirs linked a pull request Oct 30, 2024 that will close this issue
nirs added a commit to nirs/ramen that referenced this issue Oct 30, 2024
We used to log everything using the global logger. This makes the log
messy and hard to follow, since we run 5 dr flows concurrently. This
will be worse as we add new workloads (e.g. flattened pvcs, cloned pvcs,
volsync with block pvc).

We use now per-test logger created using the name of the workload (e.g.
disapp-deploy-rbd-busybox). Every log message from test flow start with
this name. The rest of the log does not specify resources names since we
use the same name for all resources (drpc, placement, subscription,
etc).

Clean up log messages, replacing noisy text with simple description of
the event.

More work is needed, we still log too much uninteresting details, and do
no not log what we wait for, but this is already much easier to follow.

Example log with this change:

    === RUN   TestSuites/Exhaustive/Deploy-rbd/Subscr/Enable
    2024-10-31T00:37:41.913+0200	INFO	subscr-deploy-rbd-busybox	Protecting workload
    2024-10-31T00:37:41.913+0200	INFO	subscr-deploy-cephfs-busybox	Protecting workload
    2024-10-31T00:37:41.915+0200	INFO	appset-deploy-rbd-busybox	Workload running on dr2
    2024-10-31T00:37:41.915+0200	INFO	appset-deploy-rbd-busybox	Annotating placement
    2024-10-31T00:37:41.917+0200	INFO	subscr-deploy-rbd-busybox	Workload running on dr2
    2024-10-31T00:37:41.917+0200	INFO	subscr-deploy-rbd-busybox	Annotating placement
    2024-10-31T00:37:41.918+0200	INFO	subscr-deploy-cephfs-busybox	Workload running on dr1
    2024-10-31T00:37:41.918+0200	INFO	subscr-deploy-cephfs-busybox	Annotating placement
    2024-10-31T00:37:41.926+0200	INFO	appset-deploy-rbd-busybox	Creating drpc
    2024-10-31T00:37:41.930+0200	INFO	subscr-deploy-rbd-busybox	Creating drpc
    2024-10-31T00:37:41.932+0200	INFO	subscr-deploy-cephfs-busybox	Creating drpc
    2024-10-31T00:38:52.038+0200	INFO	appset-deploy-cephfs-busybox	drpc is ready

Fixes: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/ramen that referenced this issue Oct 31, 2024
This is the recommended way based on logr docs, probably to make it
easier to use, since you don't need to check if the logger is nil, and
zero value of a logger just discards data. This is also the way logr is
used in ramen, so there is no reason to use it differently in the e2e
code.

Part-of: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/ramen that referenced this issue Oct 31, 2024
This info is usually not helpful and harmful by taking spaces needed for
important info in the logs. When logging error we have file:lineno info
in the traceback.

Example log with this change:

    === RUN   TestSuites/Exhaustive/Deploy-cephfs#01/Appset/Enable
    2024-10-30T16:46:14.331+0200	INFO	enter EnableProtection appset-deploy-rbd-busybox
    2024-10-30T16:46:14.331+0200	INFO	enter EnableProtection appset-deploy-cephfs-busybox
    2024-10-30T16:46:16.995+0200	INFO	deployment busybox is ready
    2024-10-30T16:46:16.995+0200	INFO	workload Deploy-rbd is ready
    2024-10-30T16:46:16.995+0200	INFO	disapp-deploy-rbd-busybox is deployed

Part-of: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/ramen that referenced this issue Oct 31, 2024
We used to log everything using the global logger. This makes the log
messy and hard to follow, since we run 5 dr flows concurrently. This
will be worse as we add new workloads (e.g. flattened pvcs, cloned pvcs,
volsync with block pvc).

We use now per-test logger created using the name of the workload (e.g.
disapp-deploy-rbd-busybox). Every log message from test flow start with
this name. The rest of the log does not specify resources names since we
use the same name for all resources (drpc, placement, subscription,
etc).

Clean up log messages, replacing noisy text with simple description of
the event.

More work is needed, we still log too much uninteresting details, and do
no not log what we wait for, but this is already much easier to follow.

Example log with this change:

    === RUN   TestSuites/Exhaustive/Deploy-rbd/Subscr/Enable
    2024-10-31T00:37:41.913+0200	INFO	subscr-deploy-rbd-busybox	Protecting workload
    2024-10-31T00:37:41.913+0200	INFO	subscr-deploy-cephfs-busybox	Protecting workload
    2024-10-31T00:37:41.915+0200	INFO	appset-deploy-rbd-busybox	Workload running on dr2
    2024-10-31T00:37:41.915+0200	INFO	appset-deploy-rbd-busybox	Annotating placement
    2024-10-31T00:37:41.917+0200	INFO	subscr-deploy-rbd-busybox	Workload running on dr2
    2024-10-31T00:37:41.917+0200	INFO	subscr-deploy-rbd-busybox	Annotating placement
    2024-10-31T00:37:41.918+0200	INFO	subscr-deploy-cephfs-busybox	Workload running on dr1
    2024-10-31T00:37:41.918+0200	INFO	subscr-deploy-cephfs-busybox	Annotating placement
    2024-10-31T00:37:41.926+0200	INFO	appset-deploy-rbd-busybox	Creating drpc
    2024-10-31T00:37:41.930+0200	INFO	subscr-deploy-rbd-busybox	Creating drpc
    2024-10-31T00:37:41.932+0200	INFO	subscr-deploy-cephfs-busybox	Creating drpc
    2024-10-31T00:38:52.038+0200	INFO	appset-deploy-cephfs-busybox	drpc is ready

Part-of: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/ramen that referenced this issue Oct 31, 2024
Adding the same text to all workload is not helpful. Shorter names
are important to make it easy to scan the logs.

Currently with test everything with deployment using busybox image. When
we will add new workload we can consider changing the names, but we need
just one word to make the workload unique.

I want to add vm based tests later, but these will run with a special
environment, so they can use different test suite.

Example logs with this change:

    2024-10-31T19:39:11.391+0200	INFO	subscr-rbd	Protecting workload
    2024-10-31T19:39:11.467+0200	INFO	appset-rbd	Workload running on dr1
    2024-10-31T19:39:11.467+0200	INFO	appset-rbd	Annotating placement
    2024-10-31T19:39:11.467+0200	INFO	appset-cephfs	Workload running on dr2
    2024-10-31T19:39:11.467+0200	INFO	appset-cephfs	Annotating placement
    2024-10-31T19:39:11.467+0200	INFO	subscr-rbd	Workload running on dr1
    2024-10-31T19:39:11.467+0200	INFO	subscr-rbd	Annotating placement
    2024-10-31T19:39:11.485+0200	INFO	appset-cephfs	Creating drpc
    2024-10-31T19:39:11.571+0200	INFO	appset-rbd	Creating drpc
    2024-10-31T19:39:11.767+0200	INFO	subscr-rbd	Creating drpc
    2024-10-31T19:40:37.560+0200	INFO	disapp-rbd	drpc is ready

Fixes: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/ramen that referenced this issue Nov 4, 2024
This is the recommended way based on logr docs, probably to make it
easier to use, since you don't need to check if the logger is nil, and
zero value of a logger just discards data. This is also the way logr is
used in ramen, so there is no reason to use it differently in the e2e
code.

Part-of: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/ramen that referenced this issue Nov 4, 2024
This info is usually not helpful and harmful by taking spaces needed for
important info in the logs. When logging error we have file:lineno info
in the traceback.

Example log with this change:

    === RUN   TestSuites/Exhaustive/Deploy-cephfs#01/Appset/Enable
    2024-10-30T16:46:14.331+0200	INFO	enter EnableProtection appset-deploy-rbd-busybox
    2024-10-30T16:46:14.331+0200	INFO	enter EnableProtection appset-deploy-cephfs-busybox
    2024-10-30T16:46:16.995+0200	INFO	deployment busybox is ready
    2024-10-30T16:46:16.995+0200	INFO	workload Deploy-rbd is ready
    2024-10-30T16:46:16.995+0200	INFO	disapp-deploy-rbd-busybox is deployed

Part-of: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/ramen that referenced this issue Nov 4, 2024
We used to log everything using the global logger. This makes the log
messy and hard to follow, since we run 5 dr flows concurrently. This
will be worse as we add new workloads (e.g. flattened pvcs, cloned pvcs,
volsync with block pvc).

We use now per-test logger created using the name of the workload (e.g.
disapp-deploy-rbd-busybox). Every log message from test flow start with
this name. The rest of the log does not specify resources names since we
use the same name for all resources (drpc, placement, subscription,
etc).

Clean up log messages, replacing noisy text with simple description of
the event.

More work is needed, we still log too much uninteresting details, and do
no not log what we wait for, but this is already much easier to follow.

Example log with this change:

    === RUN   TestSuites/Exhaustive/Deploy-rbd/Subscr/Enable
    2024-10-31T00:37:41.913+0200	INFO	subscr-deploy-rbd-busybox	Protecting workload
    2024-10-31T00:37:41.913+0200	INFO	subscr-deploy-cephfs-busybox	Protecting workload
    2024-10-31T00:37:41.915+0200	INFO	appset-deploy-rbd-busybox	Workload running on dr2
    2024-10-31T00:37:41.915+0200	INFO	appset-deploy-rbd-busybox	Annotating placement
    2024-10-31T00:37:41.917+0200	INFO	subscr-deploy-rbd-busybox	Workload running on dr2
    2024-10-31T00:37:41.917+0200	INFO	subscr-deploy-rbd-busybox	Annotating placement
    2024-10-31T00:37:41.918+0200	INFO	subscr-deploy-cephfs-busybox	Workload running on dr1
    2024-10-31T00:37:41.918+0200	INFO	subscr-deploy-cephfs-busybox	Annotating placement
    2024-10-31T00:37:41.926+0200	INFO	appset-deploy-rbd-busybox	Creating drpc
    2024-10-31T00:37:41.930+0200	INFO	subscr-deploy-rbd-busybox	Creating drpc
    2024-10-31T00:37:41.932+0200	INFO	subscr-deploy-cephfs-busybox	Creating drpc
    2024-10-31T00:38:52.038+0200	INFO	appset-deploy-cephfs-busybox	drpc is ready

Part-of: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/ramen that referenced this issue Nov 4, 2024
Adding the same text to all workload is not helpful. Shorter names
are important to make it easy to scan the logs.

Currently with test everything with deployment using busybox image. When
we will add new workload we can consider changing the names, but we need
just one word to make the workload unique.

I want to add vm based tests later, but these will run with a special
environment, so they can use different test suite.

Example logs with this change:

    2024-10-31T19:39:11.391+0200	INFO	subscr-rbd	Protecting workload
    2024-10-31T19:39:11.467+0200	INFO	appset-rbd	Workload running on dr1
    2024-10-31T19:39:11.467+0200	INFO	appset-rbd	Annotating placement
    2024-10-31T19:39:11.467+0200	INFO	appset-cephfs	Workload running on dr2
    2024-10-31T19:39:11.467+0200	INFO	appset-cephfs	Annotating placement
    2024-10-31T19:39:11.467+0200	INFO	subscr-rbd	Workload running on dr1
    2024-10-31T19:39:11.467+0200	INFO	subscr-rbd	Annotating placement
    2024-10-31T19:39:11.485+0200	INFO	appset-cephfs	Creating drpc
    2024-10-31T19:39:11.571+0200	INFO	appset-rbd	Creating drpc
    2024-10-31T19:39:11.767+0200	INFO	subscr-rbd	Creating drpc
    2024-10-31T19:40:37.560+0200	INFO	disapp-rbd	drpc is ready

Fixes: RamenDR#1597
Signed-off-by: Nir Soffer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working test Testing related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant