refactor: Add the ability to hide log output unless there was an error #69

jshearer · 2022-09-14T17:14:45Z

This is a short-term solution for #64, where the longer term solution would be emitting structured logs and filtering elsewhere

This change is

This is a short-term solution for #64, where the longer term solution would be emitting structured logs and filtering elsewhere

mdibaiee · 2022-09-15T17:27:45Z

@jshearer would it be possible to set a log_level and so only error logs come out using that? I think most flow components support --log.level 🤔

jgraettinger

Few things:

This is the docker pull invocation which is contributing to noise in the "Publish" operation logs, rather than the ones in the control-plane agent. The discover operation has logs, but they're not shown in the UI.
docker pull does have a --quiet flag. Perhaps all we need to do here is add it (both in the agent's invocations, and in the runtimes?).
I agree with @mdibaiee that --log.level level is the right way to squelch output for commands invoked by the agent. All commands already explicitly thread through a --log.level. At this stage, after quickly surveying values sent to build steps, I think those levels are where we want them.

Reviewable status: 0 of 5 files reviewed, all discussions resolved (waiting on @psFried)

jshearer · 2022-09-15T20:17:36Z

So I think this may have been a miscommunication on my part. The purpose of this change is to prevent a bunch of noise from getting logged and presented to the user if there was no error. Ideally we'd present a special UI component showing that various steps are pending/successful/failed, but prior to that, the idea here is to pluck some of the low hanging fruit in terms of log noise.

So with that being said, I'm not sure how --log.level would help? The idea is to filter these log messages (unless there's an error) in the default mode of operation of this service (i.e running these actions via the UI, where specifying things like --log.level isn't possible).

@jgraettinger WRT that link you sent, I think @psFried showed me that. My understanding is that the flow (heh) is thus:

User clicks "publish" in the web UI, which... I think creates an async task record, which gets picked up by the control-plane agent
Agent then shells out to Go, capturing stdout/stderr,
Go does a bunch of stuff, including shelling out to Docker
Agent writes stdout/stderr to supabase logs using log token etc
Agent writes task status, which the UI picks up and presents accordingly

Ultimately we'd have structured logging and always log the output like we do now, just flagged as either "debug" if the command is successful, or "error" if the command failed. Then clientside (in the UI), we could just filter for logs at "info" or above, or even better we could filter for logs by specific component names, which would be used to present error logs next to the specific subtask that failed.

That being said, my intent with this PR was to change the behavior of step 4 above as a short-term mitigation to log noise. Instead of always writing Go's stdout/stderr to the task logs, the idea is to wait until the invocation is done, and if it exits successfully, then just log a generic Task 'foo' was successful. The reason I parameterized this, as opposed to always changing this logic, is that there are certain logs that we probably always want to show. Ideally we'd do this filtering in the web UI, but without structured logging, I put it here instead.

jgraettinger · 2022-09-15T20:40:25Z

Aside from npm output and docker pull, which is truly not helpful if they are successful... the rest of these logs are actually useful, informational, and/or important for debugging. We do want them logged and captured in the DB, even if only so that we can reference them for debugging.

As a practical matter (and having fixed npm and docker pull) the remaining logs aren't too verbose, though we should keep an eye on it over time.

We should seek to improve UX by hiding them in the UI by default, to be surfaced if the user chooses to drill down into them, but we shouldn't not collect them altogether.

psFried

I think I may have spotted an issue with the log forwarding, but LMK what you think.

psFried · 2022-09-15T17:57:32Z

crates/agent/src/jobs.rs

@@ -43,14 +43,15 @@ pub async fn run(
    name: &str,
    logs_tx: &logs::Tx,
    logs_token: Uuid,
+    only_log_on_error: bool,


totally optional, but my suggestion would be to use an enum instead of a bool for function parameters like this, because OutputLogs::OnError is more readable than true at callsites.

psFried · 2022-09-16T15:54:03Z

crates/agent/src/jobs.rs

+    // or if only_log_on_error wasn't set
+    if !only_log_on_error || (only_log_on_error && !status.success()) {
+        logs::write_line(logs_tx.clone(), name.to_string(), logs_token, &format!("Error while running command: {}", name)).await.unwrap();
+        futures::try_join!(stdin, stdout)?;


I think this could potentially cause a deadlock if the process outputs enough data to fill the stdout buffer. The child process may never exit if it's unable to write it's output, and nothing will be consuming that output until we try to join here.

I don't see where the stderr future is ever polled

I think the solution here will require actively reading both stdout and stderr into growable in-memory buffers, and then writing them to the logs in a separate step if the command fails.

Ah okay, that makes sense

psFried · 2022-09-16T16:09:15Z

@jgraettinger @mdibaiee I'm not sure I understand the suggestion to use --log.level or --quiet. My understanding was that we want to suppress these logs conditionally, based on whether there was an error. But those flags would suppress the logs regardless of whether an error occurred.

We should seek to improve UX by hiding them in the UI by default, to be surfaced if the user chooses to drill down into them, but we shouldn't not collect them altogether.

I couldn't agree more, though I don't think the intent was for this to be done in this PR. The intent of this PR was to do something minimal to squelch the most egregious noise in the (hopefully) common case where the operation was successful. The way I understand it, to collect the logs unconditionally and display them conditionally will require capturing structured logs and filtering them in a query. I believe the intent was to implement that later, having addressed the biggest pain point with this workaround.

jshearer · 2022-09-20T12:53:24Z

Closing per our conversation in favor of --quiet for the time being

refactor: Add the ability to hide log output unless there was an error

e819159

This is a short-term solution for #64, where the longer term solution would be emitting structured logs and filtering elsewhere

jshearer requested review from jgraettinger and psFried September 14, 2022 17:14

jgraettinger reviewed Sep 15, 2022

View reviewed changes

psFried suggested changes Sep 16, 2022

View reviewed changes

jshearer closed this Sep 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Add the ability to hide log output unless there was an error #69

refactor: Add the ability to hide log output unless there was an error #69

jshearer commented Sep 14, 2022 •

edited by jgraettinger

Loading

mdibaiee commented Sep 15, 2022

jgraettinger left a comment

jshearer commented Sep 15, 2022 •

edited

Loading

jgraettinger commented Sep 15, 2022

psFried left a comment

psFried Sep 15, 2022

psFried Sep 16, 2022

jshearer Sep 16, 2022

psFried commented Sep 16, 2022

jshearer commented Sep 20, 2022

refactor: Add the ability to hide log output unless there was an error #69

refactor: Add the ability to hide log output unless there was an error #69

Conversation

jshearer commented Sep 14, 2022 • edited by jgraettinger Loading

mdibaiee commented Sep 15, 2022

jgraettinger left a comment

Choose a reason for hiding this comment

jshearer commented Sep 15, 2022 • edited Loading

jgraettinger commented Sep 15, 2022

psFried left a comment

Choose a reason for hiding this comment

psFried Sep 15, 2022

Choose a reason for hiding this comment

psFried Sep 16, 2022

Choose a reason for hiding this comment

jshearer Sep 16, 2022

Choose a reason for hiding this comment

psFried commented Sep 16, 2022

jshearer commented Sep 20, 2022

jshearer commented Sep 14, 2022 •

edited by jgraettinger

Loading

jshearer commented Sep 15, 2022 •

edited

Loading