-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storage/eager ast #7125
Storage/eager ast #7125
Conversation
Signed-off-by: Ashutosh Narkar <[email protected]>
Signed-off-by: Ashutosh Narkar <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
Signed-off-by: Johan Fylling <[email protected]>
* Handle stored AST values in `bundle.ReadBundleEtagFromStore()` * Adding tests Signed-off-by: Johan Fylling <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look fine @johanfylling. The test coverage is solid! Few comments 👇 . I had a high level comment about how we write data to the store. Currently this is decided by how the data is to be read. Did you consider always writing data as AST to the store and then returning based on the user-provided setting? In this way we don't have two paths in the store implementation.
Secondly, we should add some docs that highlight the pros and cons of this feature.
bundle/store.go
Outdated
@@ -997,6 +1030,7 @@ func LegacyReadRevisionFromStore(ctx context.Context, store storage.Store, txn s | |||
|
|||
// ActivateLegacy calls Activate for the bundles but will also write their manifest to the older unnamed store location. | |||
// Deprecated: Use Activate with named bundles instead. | |||
// FIXME: Test this with AST-read toggled on? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either address this or we could remove the legacy activation if we think enough time has passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing legacy activation feels like it should be it's own thing, tracked separately. I'll add some tests.
cmd/flags.go
Outdated
@@ -165,6 +165,11 @@ func addV1CompatibleFlag(fs *pflag.FlagSet, v1Compatible *bool, value bool) { | |||
fs.BoolVar(v1Compatible, "v1-compatible", value, "opt-in to OPA features and behaviors that are enabled by default in OPA v1.0") | |||
} | |||
|
|||
func addReadAstValuesFromStoreFlag(fs *pflag.FlagSet, readAstValuesFromStore *bool, value bool) { | |||
// FIXME: naming? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good name to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern I have here is that without the right context, --read-ast-values
tells you almost nothing about what this flag does.
Changing it to --read-ast-values-from-store
would give some more context; but it's almost misleading, since reading AST values is more of a side effect of what we're actually doing: eagerly converting stored data to AST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--read-ast-values-from-store
seems better and if we add a good description that could help.
eagerly converting stored data to AST.
If we always stored AST then it wouldn't, no? But that has other consequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a UX perspective, the naming here requires insight into OPA internals to understand. Many sysadmin types hardly know what an AST is, but they could know that some policies evaluate slowly and want to fix that.
Perhaps we can consider a name that reflects the effect (faster eval at the cost of higher memory usage) rather than on how that is accomplished? Could even leave room for future optimizations to that end that are unrelated to the AST handling at all.
The implementation details can of course still be covered in the docs for anyone curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something like --optimize-inmem-store-for-read-speed
?
It's a bit of a mouthful. We could drop the inmem
part, but this feature doesn't apply to disk stores (but this we can also clarify in the description). We could drop the read
part; but writing, such as bundle updates, will actually be slower. We could swap speed
for perf
/performance
, which most will probably equate to processing speed anyways; but it's a bit too broad, as initial memory footprint/perf will be worse (but likely more stable over time).
v0Compatible bool | ||
v1Compatible bool | ||
traceVarValues bool | ||
ReadAstValuesFromStore bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some tests for the cli package for eval at least?
rego/rego.go
Outdated
@@ -579,6 +579,7 @@ type Rego struct { | |||
compiler *ast.Compiler | |||
store storage.Store | |||
ownStore bool | |||
ownStoreReadAst bool // FIXME: Alternative to a new option, we could add a WritableStore() method that assigns store and sets ownStore to true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any drawback of adding a new option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. having a WritableStore()
option would give us some more flexibility, as the API user could then hand any store to the rego
instance and let it write to it. But the current solution gives us a simple on-toggle that should be easier to understand. So I'm good with leaving this as-is.
runtime/runtime.go
Outdated
@@ -240,6 +240,9 @@ type Params struct { | |||
|
|||
// CipherSuites specifies the list of enabled TLS 1.0–1.2 cipher suites | |||
CipherSuites *[]uint16 | |||
|
|||
// FIXME: Document this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
obj.Foreach(func(k *ast.Term, v *ast.Term) { | ||
if k.Equal(key) { | ||
return | ||
} | ||
items = append(items, [2]*ast.Term{k, v}) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be very expensive for a large object (and array below) but I don't think we have a way around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's unfortunate. Unless we find some other way of doing this, this is an up-front cost we need to eat on delta bundle updates. A cost that compounds with the eager json-to-AST conversion. If this turns out to be a too large cost, we might need to consider another approach completely; such as lazy conversion, but storing values for re-use (but that has its own drawbacks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be good to document so that users understand the consequences of doing this.
storage/inmem/ast_test.go
Outdated
expected: `{"a": 1, "b": 42, "c": 3}`, | ||
}, | ||
{ | ||
// new keys can be added to objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the difference between this test case and set object key
:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! This looks like a copy-paste mistake. Will update the test case to be an actual addition.
|
||
// returnASTValuesOnRead, if true, means that the store will eagerly convert data to AST values, | ||
// and return them on Read. | ||
// FIXME: naming(?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems ok to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. To reiterate my previous point. Since what we're actually doing is eagerly converting to AST on write, this is a bit of a misnomer. We could consider this to be an implementation detail, but then we're kinda hiding the actual purpose behind this feature, which is to eliminate conversion cost on read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returnASTValuesOnRead
is doing what it says so it seems like an implementation detail.
@@ -327,11 +345,45 @@ func (h *handle) Unregister(_ context.Context, txn storage.Transaction) { | |||
} | |||
|
|||
func (db *store) runOnCommitTriggers(ctx context.Context, txn storage.Transaction, event storage.TriggerEvent) { | |||
if db.returnASTValuesOnRead && len(db.triggers) > 0 { | |||
// FIXME: Not very performant for large data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this a caveat we should point out about using this feature so users are aware of this. There seems to be no way to escape the conversion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah .. an alternative would be to not do the conversion here, and push that responsibility to the consumer. Then conversion would only happen when the consumer is interested in the value. Or the consumer could even be updated to deal with AST values in addition to raw data. That'd be a breach in interface contract, though, and a pretty gnarly breaking change that is only detected at runtime, not compiletime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo making the consequences more clear seems better that making a breaking change in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always optimize.
if s.returnASTValuesOnRead { | ||
s.data = ast.NewObject() | ||
} else { | ||
s.data = map[string]interface{}{} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting..we are controlling how data is written to the store based on returnASTValuesOnRead
. Did you consider always writing AST values to the store and return AST/Go values based on returnASTValuesOnRead
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always converting data to AST values would have a huge impact on OPA:s memory footprint and on bundle update performance. If these weren't concerns, then I think there is an argument for completely dropping the old raw data option altogether, as topdown
is agnostic to the data type. Unfortunately, I think this feature is something only for users that either have very little data (at which point the performance boost is likely pretty small anyways), or have the ability to sacrifice these things to improve eval performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having an option to do lazy conversion, but storing converted values for re-use (which I've mentioned in other places), might be a middle ground that might be enough of the best of both worlds that it could be enabled by default. This is however an unsubstantiated claim that I haven't thought too hard about.
Signed-off-by: Johan Fylling <[email protected]>
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
@johanfylling thanks for addressing my comments. Do you have any performance numbers you can share that show the improvements and potential drawbacks of this change? |
Here are some measurements against the test bundle we've used to evaluate this feature:
I'll collect some measurements for bundle updates later (aiming for tomorrow). |
@@ -59,9 +59,25 @@ func metadataPath(name string) storage.Path { | |||
return append(BundlesBasePath, name, "manifest", "metadata") | |||
} | |||
|
|||
func read(ctx context.Context, store storage.Store, txn storage.Transaction, path storage.Path) (interface{}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be an optimization would be to avoid the conversions in this package by adding support for AST-based logic. This probably is a big enough change that can be worked on in the future if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @johanfylling if we add some documentation about this that would be great.
Signed-off-by: Johan Fylling <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for working on this @johanfylling!
Implements: #4147
Based on PoC by @ashutosh-narkar