Replies: 7 comments 9 replies
-
Overall, I'm a big fan of the approach! A few scattered opinions:
I think in the managed service they'd be named

Not to get too inception-y, but I wonder a bit about AuthZ of the individual log / metric documents (where I'm not allowed to even know metrics of some other team's stuff). Definitely a "not now," but there's a "... but" there. Document-level and even document-location-level access control are big question marks to me. Maybe the former is just a special case of the latter?
-
Here's another crack at collection schemas, taking Johnny's feedback into account.
My proposal for the tenant name is that we hard-code the tenant name of
An alternative that I've been thinking about is partition-level AuthZ. If Flow's AuthZ model allowed for permissions on a per-partition basis, then it would allow for logs and metrics to be permissioned on a per-collection basis (user A can only view the
-
Yeah, agreed, and this is why I had initially suggested adding
-
One other question on metrics: do we actually want to represent `type: COUNT` metrics as deltas, rather than absolutes? Prometheus chose lifetime counts because it's 1) pull-based, and 2) at-least-once. We're push-based, exactly-once, and have a built-in notion of reductions. If the collection is keyed on the full metric name, then a simple materialization is already a total count. Keeping metrics as deltas also makes it easier to express roll-ups when aggregating across metrics, because each delta tells you exactly how much count mass to add to the roll-up, whereas a total count requires that you keep a register.
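To make the trade-off concrete, here's a toy sketch (plain Python, nothing Flow-specific; the function names and event shapes are made up for illustration) contrasting roll-ups over delta counters versus lifetime totals:

```python
# Hypothetical sketch: rolling up a COUNT metric across series.
# Deltas need no per-series state; lifetime totals need a register.
from collections import defaultdict


def rollup_deltas(events):
    """Each event carries exactly how much count mass to add, so a
    roll-up across series is a plain, stateless sum."""
    total = 0
    for _series, delta in events:
        total += delta
    return total


def rollup_totals(events):
    """With lifetime totals, the roll-up must keep a register: the last
    seen total per series, to recover each event's contribution."""
    last = defaultdict(int)
    total = 0
    for series, count in events:
        total += count - last[series]
        last[series] = count
    return total


# The same history, expressed both ways:
as_deltas = [("a", 2), ("b", 5), ("a", 3)]
as_totals = [("a", 2), ("b", 5), ("a", 5)]  # cumulative per series
assert rollup_deltas(as_deltas) == 10
assert rollup_totals(as_totals) == 10
```

The register in `rollup_totals` is exactly the extra state the comment above alludes to; with deltas, a keyed collection plus a sum reduction gets the same answer for free.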
-
Update on my thinking regarding error reporting: the initial idea was for stats to have an

If stats are fundamentally a record of only committed transactions, then a trivial materialization of stats documents gives you correct statistics about the data that is actually in your collection. If we keep
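The "trivial materialization" point can be illustrated with a toy sketch (plain Python, not Flow's actual stats schema; the field names `task`, `docsWritten`, and `bytesWritten` are invented here): if every stats document describes exactly one committed transaction, then materializing stats is just a per-task sum, and the totals necessarily agree with the data that actually committed.

```python
# Hypothetical sketch: summing committed-transaction stats per task.
def materialize_stats(stats_docs):
    """Reduce stats documents into per-task totals."""
    totals = {}
    for doc in stats_docs:
        agg = totals.setdefault(doc["task"], {"docsWritten": 0, "bytesWritten": 0})
        agg["docsWritten"] += doc["docsWritten"]
        agg["bytesWritten"] += doc["bytesWritten"]
    return totals


stats = [
    {"task": "acme/capture", "docsWritten": 10, "bytesWritten": 2048},
    {"task": "acme/capture", "docsWritten": 5, "bytesWritten": 1024},
]
assert materialize_stats(stats)["acme/capture"] == {
    "docsWritten": 15,
    "bytesWritten": 3072,
}
```

If stats instead mixed in failed or retried transactions, this simple sum would over-count, which is the motivation for scoping stats to committed transactions only.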
-
Logs are fundamentally scoped to a task. For captures and materializations, it's pretty easy to capture the stderr of the driver and associate it with the specific task. But for derivations this isn't possible, because there's a single nodejs process that's shared by all derivations with the same commons id. That's a bit of a bummer, because I'd really like to have any

One idea would be to just forget about capturing logs from nodejs and focus on capturing them from deno instead, whenever we get around to supporting it. My hunch is that there won't be a compelling reason to support both long-term anyway, and that deno will simply replace the node runtime altogether. If that turns out to be the case, then this option would seem like a pretty decent idea.

Another option is to spawn separate nodejs processes for each derivation task. This might be worthwhile if we don't expect to add deno support anytime soon.

My own disposition at this point is to plan on supporting deno and see where that gets us. I'm curious if anyone else has thoughts on this.
-
The initial logging PR is now merged. With that, the Flow runtime now publishes logs into a separate collection per tenant. The tenant name is the first path component of a task, so a task named

Whenever a task term is initialized, the runtime will create a

The takeaway for developers is that you should always use the
-
The basic idea is for Flow to expose collection-related metrics and metadata as a Flow collection, rather than only via prometheus metrics. This meta-collection would be created and have documents added to it automatically. The next thing we need to decide is what information should be provided, and how it should be organized into automatic collection(s). What follows is an initial proposal to serve as a starting point for the conversation.
Here's a proposed JSON schema of the collection:
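(The schema attachment didn't survive this export. A rough sketch consistent with the fields discussed below — with property types and enum values guessed for illustration — might look something like:)

```json
{
  "type": "object",
  "properties": {
    "eventType": { "type": "string" },
    "taskType": { "type": "string", "enum": ["capture", "derivation", "materialization"] },
    "name": { "type": "string" },
    "shard": { "type": "string" },
    "ts": { "type": "string", "format": "date-time" }
  },
  "required": ["eventType", "taskType", "name", "shard", "ts"]
}
```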
There are a few different ideas worth discussing:

- This schema models things as if all the events were in a single collection that's logically partitioned on `[/eventType, /taskType, /name]`. The key of the collection would include those fields, and also `/shard` and `/ts`. The thought here is that a logically partitioned collection is essentially as easy to work with as separate collections, and that it feels reasonable to cram everything into one schema. It might not feel so reasonable if we think of a lot more fields we'd want to add, though.
- The shard id is essentially decomposed into separate fields for taskType, name, and the key/rclock ranges. The thought there is that it would be common for users to want to read events related to only captures, for example, and representing them as separate fields would make that easier because we could logically partition on `taskType` and `name`. But doing things that way requires the shard key ranges to be represented separately so that you can disambiguate events that come from different shards, and having `"shard": "00000000-00000000"` feels maybe a little weird. Perhaps making shard more of an opaque id would be better?
- One potential source of bloat here would be error types. I think we can keep this pretty minimal, though. The idea is that the only errors that are really actionable are ones that arise from some code that the user potentially wrote (connectors or derivations), or schema validation errors. Everything else seems like our problem, and so probably not something we should expose to the user.
- In terms of how all this gets wired up, I like the idea of there being essentially one meta-collection per tenant that gets created automatically. But I think this might get a little weird because I don't know what we'd name it. If we hard-code a name like `flow/metrics`, then it violates the assumption that collections exist in a global namespace. Do we need to add a top-level field to our yaml for the user to specify a tenant name, so we can use `<tenant-name>/flow/metrics`?

Edited to update the schema since I realized the `read` and `write` fields didn't actually make any sense 🙃