Skip to content
This repository has been archived by the owner on Sep 26, 2022. It is now read-only.

refactor: Add the ability to hide log output unless there was an error #69

Closed
wants to merge 1 commit into from

Conversation

jshearer
Copy link
Contributor

@jshearer jshearer commented Sep 14, 2022

This is a short-term solution for #64, where the longer term solution would be emitting structured logs and filtering elsewhere


This change is Reviewable

This is a short-term solution for #64, where the longer term solution would be emitting structured logs and filtering elsewhere
@mdibaiee
Copy link
Member

@jshearer would it be possible to set a log_level and so only error logs come out using that? I think most flow components support --log.level 🤔

Copy link
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few things:

  • This is the docker pull invocation which is contributing to noise in the "Publish" operation logs, rather than the ones in the control-plane agent. The discover operation has logs, but they're not shown in the UI.
  • docker pull does have a --quiet flag. Perhaps all we need to do here is add it (both in the agent's invocations, and in the runtimes?).
  • I agree with @mdibaiee that --log.level level is the right way to squelch output for commands invoked by the agent. All commands already explicitly thread through a --log.level. At this stage, after quickly surveying values sent to build steps, I think those levels are where we want them.

Reviewable status: 0 of 5 files reviewed, all discussions resolved (waiting on @psFried)

@jshearer
Copy link
Contributor Author

jshearer commented Sep 15, 2022

So I think this may have been a miscommunication on my part. The purpose of this change is to prevent a bunch of noise from getting logged and presented to the user if there was no error. Ideally we'd present a special UI component showing that various steps are pending/successful/failed, but prior to that, the idea here is to pluck some of the low hanging fruit in terms of log noise.

So with that being said, I'm not sure how --log.level would help? The idea is to filter these log messages (unless there's an error) in the default mode of operation of this service (i.e running these actions via the UI, where specifying things like --log.level isn't possible).

@jgraettinger WRT that link you sent, I think @psFried showed me that. My understanding is that the flow (heh) is thus:

  1. User clicks "publish" in the web UI, which... I think creates an async task record, which gets picked up by the control-plane agent
  2. Agent then shells out to Go, capturing stdout/stderr,
  3. Go does a bunch of stuff, including shelling out to Docker
  4. Agent writes stdout/stderr to supabase logs using log token etc
  5. Agent writes task status, which the UI picks up and presents accordingly

Ultimately we'd have structured logging and always log the output like we do now, just flagged as either "debug" if the command is successful, or "error" if the command failed. Then clientside (in the UI), we could just filter for logs at "info" or above, or even better we could filter for logs by specific component names, which would be used to present error logs next to the specific subtask that failed.

That being said, my intent with this PR was to change the behavior of step 4 above as a short-term mitigation to log noise. Instead of always writing Go's stdout/stderr to the task logs, the idea is to wait until the invocation is done, and if it exits successfully, then just log a generic Task 'foo' was successful. The reason I parameterized this, as opposed to always changing this logic, is that there are certain logs that we probably always want to show. Ideally we'd do this filtering in the web UI, but without structured logging, I put it here instead.

@jgraettinger
Copy link
Member

Aside from npm output and docker pull, which is truly not helpful if they are successful... the rest of these logs are actually useful, informational, and/or important for debugging. We do want them logged and captured in the DB, even if only so that we can reference them for debugging.

As a practical matter (and having fixed npm and docker pull) the remaining logs aren't too verbose, though we should keep an eye on it over time.

We should seek to improve UX by hiding them in the UI by default, to be surfaced if the user chooses to drill down into them, but we shouldn't not collect them altogether.

Copy link
Member

@psFried psFried left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I may have spotted an issue with the log forwarding, but LMK what you think.

@@ -43,14 +43,15 @@ pub async fn run(
name: &str,
logs_tx: &logs::Tx,
logs_token: Uuid,
only_log_on_error: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally optional, but my suggestion would be to use an enum instead of a bool for function parameters like this, because OutputLogs::OnError is more readable than true at callsites.

// or if only_log_on_error wasn't set
if !only_log_on_error || (only_log_on_error && !status.success()) {
logs::write_line(logs_tx.clone(), name.to_string(), logs_token, &format!("Error while running command: {}", name)).await.unwrap();
futures::try_join!(stdin, stdout)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think this could potentially cause a deadlock if the process outputs enough data to fill the stdout buffer. The child process may never exit if it's unable to write it's output, and nothing will be consuming that output until we try to join here.
  • I don't see where the stderr future is ever polled

I think the solution here will require actively reading both stdout and stderr into growable in-memory buffers, and then writing them to the logs in a separate step if the command fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, that makes sense

@psFried
Copy link
Member

psFried commented Sep 16, 2022

@jgraettinger @mdibaiee I'm not sure I understand the suggestion to use --log.level or --quiet. My understanding was that we want to suppress these logs conditionally, based on whether there was an error. But those flags would suppress the logs regardless of whether an error occurred.

We should seek to improve UX by hiding them in the UI by default, to be surfaced if the user chooses to drill down into them, but we shouldn't not collect them altogether.

I couldn't agree more, though I don't think the intent was for this to be done in this PR. The intent of this PR was to do something minimal to squelch the most egregious noise in the (hopefully) common case where the operation was successful. The way I understand it, to collect the logs unconditionally and display them conditionally will require capturing structured logs and filtering them in a query. I believe the intent was to implement that later, having addressed the biggest pain point with this workaround.

@jshearer
Copy link
Contributor Author

Closing per our conversation in favor of --quiet for the time being

@jshearer jshearer closed this Sep 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants