This repository has been archived by the owner on Sep 26, 2022. It is now read-only.

Pare down logs output from publication and tests #64

Open
jgraettinger opened this issue Aug 30, 2022 · 2 comments
@jgraettinger (Member)

No description provided.

@jshearer (Contributor) commented Sep 6, 2022

So @jgraettinger, @psFried and I talked about this today, and here is what we ended up coming up with. As a preface, we could absolutely decide not to do this, just squelch most of the NPM output, and move on -- this solution is more an answer to "how should this flow look/work?", and what would be worth doing in order to allow that to happen.

NOTE: This is written in an authoritative voice, like a decision being logged. My intent is to put it out there before starting work in order to get feedback, so don't treat it as set in stone.


Leading the UX of discover, publish, materialize, etc. with a box of streaming logs sends the message that we're not confident things will work, so we preemptively give you the raw system logs "in case something goes wrong". Logs should remain an option for developers/power users, but the default UX should be better.

One benefit of streaming logs is that they give the user a feeling of movement. Given that these operations frequently take tens of seconds to minutes to run, it's important to convey to users that something is happening. Rather than logs, we want to present a checklist of sub-tasks that must complete before the operation is finished, presenting any errors/failures alongside the sub-task during which they occurred. Roughly, these sub-tasks are:

  • Validating YAML/specs
  • Calling the connector validate RPC
  • Generating TypeScript
  • Compiling TypeScript
  • Running the TypeScript linter
  • Spinning up a temporary data plane and running tasks/tests
  • Applying captures/materializations to their connectors
  • Activating into the data plane (updating journals, shards, etc.)
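The checklist above could be modeled as a small set of sub-task identifiers with a per-step status, so the frontend renders a step list instead of raw logs. A minimal dependency-free sketch; all names here are illustrative, not existing flow APIs:

```rust
// Hypothetical model of the publication checklist; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SubTask {
    ValidateSpecs,
    ConnectorValidate,
    GenerateTypescript,
    CompileTypescript,
    LintTypescript,
    RunTests,
    ApplyConnectors,
    Activate,
}

#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    Running,
    Done,
    Failed(String), // error surfaced alongside the sub-task it occurred during
}

// The frontend renders one row per sub-task and updates it as
// structured status events arrive over the log stream.
fn initial_checklist() -> Vec<(SubTask, Status)> {
    use SubTask::*;
    [ValidateSpecs, ConnectorValidate, GenerateTypescript, CompileTypescript,
     LintTypescript, RunTests, ApplyConnectors, Activate]
        .into_iter()
        .map(|t| (t, Status::Pending))
        .collect()
}

fn main() {
    let mut checklist = initial_checklist();
    checklist[0].1 = Status::Done;
    // A failure is attached to the step it occurred during:
    checklist[3].1 = Status::Failed("tsc exited with status 2".into());
    for (task, status) in &checklist {
        println!("{:?}: {:?}", task, status);
    }
}
```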

This is all well and good, but how do we communicate this structured information? We already have a streaming communication mechanism in the form of logs, so we should use that. Ideally we'd convert logs to structured JSON and just use that structure to communicate these semantic state updates, but for various reasons that's not immediately tenable (something something Rust calling Go calling Rust, maybe @jgraettinger can give a better explanation).

The solution we came to is to include the control-plane-agent's logs (captured via Rust's tracing crate, which supports structured logging) as JSON log lines in the existing log stream for each task. Then all we need to do is add specific log statements with associated metadata when each of the above sub-tasks starts/ends/errors, and the frontend can consume them. This is additionally beneficial because it lets future work consolidate the whole spec application process into Rust and take advantage of this existing logging infrastructure.
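As a hypothetical illustration, a mixed stream might interleave plain NPM output with JSON status lines shaped something like this (field names are illustrative, not a committed schema):

```json
{"ts":"2022-09-06T17:03:11Z","level":"info","step":"compile_typescript","event":"started","message":"compiling typescript modules"}
{"ts":"2022-09-06T17:03:19Z","level":"error","step":"compile_typescript","event":"failed","error":"tsc exited with status 2"}
```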

One issue to consider is how we make sure that only the logs for a specific task get sent to that task's log_lines. The solution is the tracing crate's Span feature (the link describes usage in async contexts via #[instrument]): any log lines emitted inside a task-specific Span get sent in JSON form to that task's log_lines.
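In tracing terms, the Span carries the task identity and a subscriber routes events to that task's buffer. A dependency-free sketch of just the routing idea (hypothetical names; the real mechanism would be tracing's #[instrument] plus a subscriber, not this hand-rolled router):

```rust
use std::collections::HashMap;

// Hypothetical router: lines logged within a task's "span" are appended
// to that task's own log_lines buffer, never another task's.
struct LogRouter {
    log_lines: HashMap<String, Vec<String>>,
}

impl LogRouter {
    fn new() -> Self {
        LogRouter { log_lines: HashMap::new() }
    }

    // Stand-in for entering a task-scoped Span: every line emitted by
    // `f` is tagged with and routed to this task's log_lines.
    fn in_task_span<F: FnOnce(&mut dyn FnMut(&str))>(&mut self, task: &str, f: F) {
        let lines = self.log_lines.entry(task.to_string()).or_default();
        let mut emit = |line: &str| {
            lines.push(format!("{{\"task\":\"{}\",\"msg\":\"{}\"}}", task, line));
        };
        f(&mut emit);
    }
}

fn main() {
    let mut router = LogRouter::new();
    router.in_task_span("acmeCo/capture", |log| {
        log("validate started");
        log("validate finished");
    });
    router.in_task_span("acmeCo/materialize", |log| log("apply started"));
    // Each task sees only its own lines.
    assert_eq!(router.log_lines["acmeCo/capture"].len(), 2);
    assert_eq!(router.log_lines["acmeCo/materialize"].len(), 1);
}
```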

The other issue to consider is how the frontend will consume this semi-structured logging. One possibility was to capture all logs as JSON (with the unstructured ones shoved into something like {"body": "...", "unstructured": true}). The other is to continue writing unstructured logs as plain text lines, and only write JSON lines for logs coming from a structured (i.e. tracing) source. Filtering for JSON log lines is fairly trivial (does the line start with { and end with }?), and from there we can use the well-known structure of tracing to filter by log level, component, additional metadata, etc.

jshearer added a commit to estuary/flow that referenced this issue Sep 12, 2022
This is a pragmatic first step towards estuary/animated-carnival#64. Ultimately we'll have structured logging and can filter logs after the fact, but in the meantime this does a lot to clean up the log output.
jshearer added a commit that referenced this issue Sep 14, 2022
This is a short-term solution for #64, where the longer term solution would be emitting structured logs and filtering elsewhere
jshearer added a commit to estuary/flow that referenced this issue Sep 21, 2022
* refactor: Only output NPM logs if there was an error

This is a pragmatic first step towards estuary/animated-carnival#64. Ultimately we'll have structured logging and can filter logs after the fact, but in the meantime this does a lot to clean up the log output.

* refactor: Quiet docker pull output

* refactor: review comments
@jgraettinger (Member, Author)

Can we close this as "we've done what we're going to do right now"?
