Storage/eager ast #7125

Merged
merged 15 commits into from
Oct 30, 2024

Conversation

johanfylling
Contributor

Implements: #4147

Based on PoC by @ashutosh-narkar

@johanfylling johanfylling marked this pull request as ready for review October 21, 2024 15:35
Member

@ashutosh-narkar ashutosh-narkar left a comment

The changes look fine @johanfylling. The test coverage is solid! A few comments 👇. I had a high-level comment about how we write data to the store. Currently this is decided by how the data is to be read. Did you consider always writing data as AST to the store and then returning AST or Go values based on the user-provided setting? That way we wouldn't have two paths in the store implementation.

Secondly, we should add some docs that highlight the pros and cons of this feature.

bundle/store.go Outdated
@@ -997,6 +1030,7 @@ func LegacyReadRevisionFromStore(ctx context.Context, store storage.Store, txn s

// ActivateLegacy calls Activate for the bundles but will also write their manifest to the older unnamed store location.
// Deprecated: Use Activate with named bundles instead.
// FIXME: Test this with AST-read toggled on?
Member

We should either address this or we could remove the legacy activation if we think enough time has passed.

Contributor Author

Removing legacy activation feels like it should be its own thing, tracked separately. I'll add some tests.

cmd/flags.go Outdated
@@ -165,6 +165,11 @@ func addV1CompatibleFlag(fs *pflag.FlagSet, v1Compatible *bool, value bool) {
fs.BoolVar(v1Compatible, "v1-compatible", value, "opt-in to OPA features and behaviors that are enabled by default in OPA v1.0")
}

func addReadAstValuesFromStoreFlag(fs *pflag.FlagSet, readAstValuesFromStore *bool, value bool) {
// FIXME: naming?
Member

Seems like a good name to me.

Contributor Author

The concern I have here is that without the right context, --read-ast-values tells you almost nothing about what this flag does.
Changing it to --read-ast-values-from-store would give some more context; but it's almost misleading, since reading AST values is more of a side effect of what we're actually doing: eagerly converting stored data to AST.

Member

--read-ast-values-from-store seems better and if we add a good description that could help.

> eagerly converting stored data to AST.

If we always stored AST then it wouldn't, no? But that has other consequences.

Member

From a UX perspective, the naming here requires insight into OPA internals to understand. Many sysadmin types hardly know what an AST is, but they could know that some policies evaluate slowly and want to fix that.

Perhaps we can consider a name that reflects the effect (faster eval at the cost of higher memory usage) rather than on how that is accomplished? Could even leave room for future optimizations to that end that are unrelated to the AST handling at all.

The implementation details can of course still be covered in the docs for anyone curious.

Contributor Author

How about something like --optimize-inmem-store-for-read-speed?
It's a bit of a mouthful. We could drop the inmem part, but this feature doesn't apply to disk stores (but this we can also clarify in the description). We could drop the read part; but writing, such as bundle updates, will actually be slower. We could swap speed for perf/performance, which most will probably equate to processing speed anyways; but it's a bit too broad, as initial memory footprint/perf will be worse (but likely more stable over time).
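
To make the naming trade-off concrete, here is a minimal sketch of an effect-oriented flag, assuming the name proposed in this thread. It uses the standard library `flag` package as a stand-in for `pflag`, and the helper name and help text are hypothetical, not what was merged.

```go
package main

import (
	"flag"
	"fmt"
)

// addReadSpeedFlag registers a flag named after its effect rather than its
// mechanism. The flag name is the proposal from this thread; the help text
// is a hypothetical wording that states the trade-offs discussed here.
func addReadSpeedFlag(fs *flag.FlagSet, target *bool, value bool) {
	fs.BoolVar(target, "optimize-inmem-store-for-read-speed", value,
		"faster policy evaluation at the cost of higher memory usage and slower "+
			"writes such as bundle updates; applies to the in-memory store only")
}

// parse builds a throwaway flag set and reports whether the flag was enabled.
func parse(args []string) (bool, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	var enabled bool
	addReadSpeedFlag(fs, &enabled, false)
	err := fs.Parse(args)
	return enabled, err
}

func main() {
	enabled, err := parse([]string{"--optimize-inmem-store-for-read-speed"})
	fmt.Println(enabled, err)
}
```

The description carries the caveats (memory, write speed, in-memory only), so the name itself does not have to explain what an AST is.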

v0Compatible bool
v1Compatible bool
traceVarValues bool
ReadAstValuesFromStore bool
Member

Maybe add some tests for the cli package for eval at least?

rego/rego.go Outdated
@@ -579,6 +579,7 @@ type Rego struct {
compiler *ast.Compiler
store storage.Store
ownStore bool
ownStoreReadAst bool // FIXME: Alternative to a new option, we could add a WritableStore() method that assigns store and sets ownStore to true.
Member

Is there any drawback of adding a new option?

Contributor Author

Not really. Having a WritableStore() option would give us some more flexibility, as the API user could then hand any store to the rego instance and let it write to it. But the current solution gives us a simple on-toggle that should be easier to understand. So I'm good with leaving this as-is.

@@ -240,6 +240,9 @@ type Params struct {

// CipherSuites specifies the list of enabled TLS 1.0–1.2 cipher suites
CipherSuites *[]uint16

// FIXME: Document this
Member

👍

Comment on lines +265 to +270
obj.Foreach(func(k *ast.Term, v *ast.Term) {
	if k.Equal(key) {
		return
	}
	items = append(items, [2]*ast.Term{k, v})
})
Member

This will be very expensive for a large object (and array below) but I don't think we have a way around it.

Contributor Author

Yeah, it's unfortunate. Unless we find some other way of doing this, this is an up-front cost we need to eat on delta bundle updates, a cost that compounds with the eager JSON-to-AST conversion. If this turns out to be too large a cost, we might need to consider another approach completely, such as lazy conversion that stores values for re-use (but that has its own drawbacks).

Member

This would be good to document so that users understand the consequences of doing this.
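
The cost discussed in this thread can be illustrated with a small stand-alone sketch. It is a simplified model rather than the code in this PR: `pair` is a hypothetical stand-in for `[2]*ast.Term`, and a plain slice stands in for the object. Removing one key means visiting and copying every other entry, so the work grows linearly with the size of the object.

```go
package main

import "fmt"

// pair is one key/value entry; a hypothetical stand-in for [2]*ast.Term.
type pair struct {
	key   string
	value int
}

// removeKey returns a new slice holding every entry except the one with the
// given key. Each surviving entry is visited and copied, so the cost is
// linear in the number of entries, even though only one entry goes away.
func removeKey(items []pair, key string) []pair {
	out := make([]pair, 0, len(items))
	for _, p := range items {
		if p.key == key {
			continue // skip the entry being removed
		}
		out = append(out, p)
	}
	return out
}

func main() {
	obj := []pair{{"a", 1}, {"b", 2}, {"c", 3}}
	fmt.Println(removeKey(obj, "b"))
}
```

For an object with many thousands of keys, every removal in a delta bundle update repeats this full copy, which is the up-front cost referred to above.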

expected: `{"a": 1, "b": 42, "c": 3}`,
},
{
// new keys can be added to objects
Member

What's the difference between this test case and set object key?

Contributor Author

Good catch! This looks like a copy-paste mistake. Will update the test case to be an actual addition.


// returnASTValuesOnRead, if true, means that the store will eagerly convert data to AST values,
// and return them on Read.
// FIXME: naming(?)
Member

Seems ok to me.

Contributor Author

Maybe. To reiterate my previous point: since what we're actually doing is eagerly converting to AST on write, this is a bit of a misnomer. We could consider this to be an implementation detail, but then we're kinda hiding the actual purpose behind this feature, which is to eliminate conversion cost on read.

Member

returnASTValuesOnRead is doing what it says so it seems like an implementation detail.

@@ -327,11 +345,45 @@ func (h *handle) Unregister(_ context.Context, txn storage.Transaction) {
}

func (db *store) runOnCommitTriggers(ctx context.Context, txn storage.Transaction, event storage.TriggerEvent) {
if db.returnASTValuesOnRead && len(db.triggers) > 0 {
// FIXME: Not very performant for large data.
Member

Maybe this is a caveat we should point out about using this feature so users are aware of it. There seems to be no way to escape the conversion here.

Contributor Author

Yeah, an alternative would be to not do the conversion here and push that responsibility to the consumer. Then conversion would only happen when the consumer is interested in the value. Or the consumer could even be updated to deal with AST values in addition to raw data. That'd be a breach of the interface contract, though, and a pretty gnarly breaking change that is only detected at runtime, not at compile time.

Member

IMO, making the consequences clearer seems better than making a breaking change in this case.

Member

We can always optimize.
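
A rough model of the trigger path discussed in this thread, under stated assumptions: `typedValue` and `toRaw` are hypothetical stand-ins for an AST value and the AST-to-Go conversion, and the `store` type here is not OPA's. The sketch shows the guard from the diff: when no triggers are registered, the conversion is skipped entirely; otherwise the value is converted once and every trigger receives plain Go data, so the existing interface contract is unchanged.

```go
package main

import "fmt"

// typedValue is a hypothetical evaluation-ready value (stand-in for an AST term).
type typedValue struct{ n float64 }

// toRaw models the typed-to-plain-Go conversion that trigger consumers expect.
func toRaw(v typedValue) interface{} { return v.n }

// store keeps typed values and notifies triggers with raw data on commit.
type store struct {
	triggers    []func(path string, raw interface{})
	conversions int // counts conversions, to make the cost visible
}

// commit converts the value back to raw data only if someone is listening.
func (s *store) commit(path string, v typedValue) {
	if len(s.triggers) == 0 {
		return // no consumers, so no conversion cost is paid
	}
	raw := toRaw(v) // converted once, shared by all triggers
	s.conversions++
	for _, t := range s.triggers {
		t(path, raw)
	}
}

func main() {
	s := &store{}
	s.commit("/a", typedValue{1}) // no triggers: nothing converted
	s.triggers = append(s.triggers, func(path string, raw interface{}) {
		fmt.Println("trigger saw", path, raw)
	})
	s.commit("/b", typedValue{42})
	fmt.Println("conversions:", s.conversions)
}
```

With large data and at least one registered trigger, the conversion on every commit is the caveat worth documenting.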

Comment on lines +51 to +56
if s.returnASTValuesOnRead {
	s.data = ast.NewObject()
} else {
	s.data = map[string]interface{}{}
}

Member

Interesting. We are controlling how data is written to the store based on returnASTValuesOnRead. Did you consider always writing AST values to the store and returning AST/Go values based on returnASTValuesOnRead?

Contributor Author

Always converting data to AST values would have a huge impact on OPA's memory footprint and on bundle update performance. If these weren't concerns, then I think there is an argument for completely dropping the old raw data option altogether, as topdown is agnostic to the data type. Unfortunately, I think this feature is something only for users that either have very little data (at which point the performance boost is likely pretty small anyways), or have the ability to sacrifice these things to improve eval performance.

Contributor Author

Having an option to do lazy conversion, but storing converted values for re-use (which I've mentioned in other places), might be a middle ground that might be enough of the best of both worlds that it could be enabled by default. This is however an unsubstantiated claim that I haven't thought too hard about.
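
The two strategies compared in this thread can be sketched side by side. This is a simplified model and not the store implementation in this PR: `Value`, `converter`, `eagerStore`, and `lazyCachingStore` are hypothetical names, and the conversion is reduced to a counted function call so the difference in when the cost is paid becomes visible.

```go
package main

import "fmt"

// Value is a hypothetical evaluation-ready representation (in OPA this role
// is played by AST values).
type Value struct{ repr string }

// converter models the expensive raw-to-Value conversion and counts calls.
type converter struct{ calls int }

func (c *converter) convert(raw interface{}) Value {
	c.calls++
	return Value{repr: fmt.Sprintf("%v", raw)}
}

// eagerStore converts on write, so reads never pay for conversion.
type eagerStore struct {
	conv *converter
	data map[string]Value
}

func newEagerStore(c *converter) *eagerStore {
	return &eagerStore{conv: c, data: map[string]Value{}}
}

func (s *eagerStore) Write(k string, raw interface{}) { s.data[k] = s.conv.convert(raw) }
func (s *eagerStore) Read(k string) Value             { return s.data[k] }

// lazyCachingStore keeps raw data, converts on first read, and caches the
// result for re-use. Note that it holds both representations once a key
// has been read.
type lazyCachingStore struct {
	conv  *converter
	raw   map[string]interface{}
	cache map[string]Value
}

func newLazyCachingStore(c *converter) *lazyCachingStore {
	return &lazyCachingStore{conv: c, raw: map[string]interface{}{}, cache: map[string]Value{}}
}

func (s *lazyCachingStore) Write(k string, raw interface{}) {
	s.raw[k] = raw
	delete(s.cache, k) // drop any stale converted value
}

func (s *lazyCachingStore) Read(k string) Value {
	if v, ok := s.cache[k]; ok {
		return v
	}
	v := s.conv.convert(s.raw[k])
	s.cache[k] = v
	return v
}

func main() {
	ec, lc := &converter{}, &converter{}
	eager, lazy := newEagerStore(ec), newLazyCachingStore(lc)
	for _, k := range []string{"a", "b", "c"} {
		eager.Write(k, 1)
		lazy.Write(k, 1)
	}
	eager.Read("a")
	lazy.Read("a")
	lazy.Read("a")
	fmt.Println("eager conversions:", ec.calls)
	fmt.Println("lazy conversions:", lc.calls)
}
```

In this model the eager store pays for every written key up front, while the lazy store pays only for keys that are actually read, at the price of a slower first read and of keeping both the raw and the converted copy in memory. Whether that balance would hold for real bundles is, as noted above, unverified.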

@ashutosh-narkar
Member

@johanfylling thanks for addressing my comments. Do you have any performance numbers you can share that show the improvements and potential drawbacks of this change?

@johanfylling
Contributor Author

Here are some measurements against the test bundle we've used to evaluate this feature:

opa bench (disabled):

opa bench data.policy.ingress -b parameter-test-cases
+-------------------------------------------------+--------------+
| samples                                         |           69 |
| ns/op                                           |     16246874 |
| B/op                                            |     16602796 |
| allocs/op                                       |       428321 |
| histogram_timer_rego_external_resolve_ns_75%    |          333 |
| histogram_timer_rego_external_resolve_ns_90%    |          375 |
| histogram_timer_rego_external_resolve_ns_95%    |          437 |
| histogram_timer_rego_external_resolve_ns_99%    |         1250 |
| histogram_timer_rego_external_resolve_ns_99.9%  |         1250 |
| histogram_timer_rego_external_resolve_ns_99.99% |         1250 |
| histogram_timer_rego_external_resolve_ns_count  |         69.0 |
| histogram_timer_rego_external_resolve_ns_max    |         1250 |
| histogram_timer_rego_external_resolve_ns_mean   |          277 |
| histogram_timer_rego_external_resolve_ns_median |          250 |
| histogram_timer_rego_external_resolve_ns_min    |          125 |
| histogram_timer_rego_external_resolve_ns_stddev |          151 |
| histogram_timer_rego_query_eval_ns_75%          |     17299104 |
| histogram_timer_rego_query_eval_ns_90%          |     19534417 |
| histogram_timer_rego_query_eval_ns_95%          |     21096479 |
| histogram_timer_rego_query_eval_ns_99%          |     23926584 |
| histogram_timer_rego_query_eval_ns_99.9%        |     23926584 |
| histogram_timer_rego_query_eval_ns_99.99%       |     23926584 |
| histogram_timer_rego_query_eval_ns_count        |         69.0 |
| histogram_timer_rego_query_eval_ns_max          |     23926584 |
| histogram_timer_rego_query_eval_ns_mean         |     16228025 |
| histogram_timer_rego_query_eval_ns_median       |     15488667 |
| histogram_timer_rego_query_eval_ns_min          |     13838958 |
| histogram_timer_rego_query_eval_ns_stddev       |      2204325 |
+-------------------------------------------------+--------------+

opa eval --profile (disabled):

opa eval --profile --format=pretty --count=10 -b ./parameter-test-cases 'data.policy.ingress'
{
  "allow": true
}
+--------------------------------+-----------+-----------+-----------------+-------------------+----------------+
|             METRIC             |    MIN    |    MAX    |      MEAN       |        90%        |      99%       |
+--------------------------------+-----------+-----------+-----------------+-------------------+----------------+
| timer_rego_data_parse_ns       | 78743874  | 87138586  | 8.16318753e+07  | 8.67692151e+07    | 8.7138586e+07  |
| timer_rego_external_resolve_ns | 291       | 8583      | 1220.7          | 7799.700000000002 | 8583           |
| timer_rego_load_bundles_ns     | 112969917 | 125805667 | 1.171694626e+08 | 1.251271128e+08   | 1.25805667e+08 |
| timer_rego_module_compile_ns   | 31520875  | 37062375  | 3.35296167e+07  | 3.69148459e+07    | 3.7062375e+07  |
| timer_rego_module_parse_ns     | 6809627   | 11737961  | 9.0055879e+06   | 1.16811895e+07    | 1.1737961e+07  |
| timer_rego_query_compile_ns    | 54209     | 124209    | 77483.4         | 123667.3          | 124209         |
| timer_rego_query_eval_ns       | 14280000  | 17962292  | 1.67852792e+07  | 1.7932017e+07     | 1.7962292e+07  |
| timer_rego_query_parse_ns      | 35916     | 68041     | 46933.3         | 68007.8           | 68041          |
+--------------------------------+-----------+-----------+-----------------+-------------------+----------------+
+------------+-------------+-------------+-------------+-------------+----------+----------+--------------+---------------------------------------------------------------------------------------+
|    MIN     |     MAX     |    MEAN     |     90%     |     99%     | NUM EVAL | NUM REDO | NUM GEN EXPR |                                       LOCATION                                        |
+------------+-------------+-------------+-------------+-------------+----------+----------+--------------+---------------------------------------------------------------------------------------+
| 9.953458ms | 12.760292ms | 11.758979ms | 12.753292ms | 12.760292ms | 1        | 1        | 1            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:69 |
| 4.141582ms | 5.108708ms  | 4.829816ms  | 5.108495ms  | 5.108708ms  | 2        | 2        | 2            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:77 |
| 44.376µs   | 69.625µs    | 56.054µs    | 69.162µs    | 69.625µs    | 2        | 2        | 2            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:46 |
| 24.793µs   | 49.083µs    | 35.699µs    | 48.737µs    | 49.083µs    | 1        | 1        | 1            | data.policy.ingress                                                                   |
| 23.125µs   | 47.625µs    | 30.862µs    | 46.57µs     | 47.625µs    | 3        | 3        | 3            | parameter-test-cases/policy/ingress/test.rego:16                                      |
| 10.542µs   | 29.459µs    | 16.929µs    | 29.033µs    | 29.459µs    | 1        | 1        | 1            | parameter-test-cases/policy/ingress/test.rego:28                                      |
| 10.042µs   | 17.333µs    | 12.737µs    | 17.312µs    | 17.333µs    | 1        | 1        | 1            | parameter-test-cases/policy/ingress/test.rego:36                                      |
| 8.041µs    | 15.333µs    | 11.162µs    | 15.174µs    | 15.333µs    | 1        | 1        | 1            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:47 |
| 7.541µs    | 13.458µs    | 9.558µs     | 13.166µs    | 13.458µs    | 2        | 2        | 2            | parameter-test-cases/global/library/authorization/app_roles/app_roles.rego:26         |
| 5.625µs    | 10.833µs    | 7.591µs     | 10.633µs    | 10.833µs    | 1        | 0        | 1            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:35 |
+------------+-------------+-------------+-------------+-------------+----------+----------+--------------+---------------------------------------------------------------------------------------+

opa bench (enabled):

opa bench data.policy.ingress -b parameter-test-cases --read-ast-values
+-------------------------------------------------+--------------+
| samples                                         |        20536 |
| ns/op                                           |        55442 |
| B/op                                            |        49699 |
| allocs/op                                       |          857 |
| histogram_timer_rego_external_resolve_ns_75%    |          125 |
| histogram_timer_rego_external_resolve_ns_90%    |          126 |
| histogram_timer_rego_external_resolve_ns_95%    |          166 |
| histogram_timer_rego_external_resolve_ns_99%    |          209 |
| histogram_timer_rego_external_resolve_ns_99.9%  |         1281 |
| histogram_timer_rego_external_resolve_ns_99.99% |         1292 |
| histogram_timer_rego_external_resolve_ns_count  |        20536 |
| histogram_timer_rego_external_resolve_ns_max    |         1292 |
| histogram_timer_rego_external_resolve_ns_mean   |          106 |
| histogram_timer_rego_external_resolve_ns_median |         84.0 |
| histogram_timer_rego_external_resolve_ns_min    |         41.0 |
| histogram_timer_rego_external_resolve_ns_stddev |         62.3 |
| histogram_timer_rego_query_eval_ns_75%          |        51959 |
| histogram_timer_rego_query_eval_ns_90%          |        66138 |
| histogram_timer_rego_query_eval_ns_95%          |        74440 |
| histogram_timer_rego_query_eval_ns_99%          |        88304 |
| histogram_timer_rego_query_eval_ns_99.9%        |       126341 |
| histogram_timer_rego_query_eval_ns_99.99%       |       126542 |
| histogram_timer_rego_query_eval_ns_count        |        20536 |
| histogram_timer_rego_query_eval_ns_max          |       126542 |
| histogram_timer_rego_query_eval_ns_mean         |        49367 |
| histogram_timer_rego_query_eval_ns_median       |        44854 |
| histogram_timer_rego_query_eval_ns_min          |        38166 |
| histogram_timer_rego_query_eval_ns_stddev       |        11179 |
+-------------------------------------------------+--------------+

opa eval --profile (enabled):

opa eval --profile --format=pretty --count=10 -b ./parameter-test-cases --read-ast-values 'data.policy.ingress'
{
  "allow": true
}
+--------------------------------+-----------+-----------+-----------------+-----------------+----------------+
|             METRIC             |    MIN    |    MAX    |      MEAN       |       90%       |      99%       |
+--------------------------------+-----------+-----------+-----------------+-----------------+----------------+
| timer_rego_data_parse_ns       | 79425122  | 87189335  | 8.11994413e+07  | 8.6604589e+07   | 8.7189335e+07  |
| timer_rego_external_resolve_ns | 250       | 542       | 341.4           | 533.5           | 542            |
| timer_rego_load_bundles_ns     | 114970667 | 136677792 | 1.190086584e+08 | 1.348652586e+08 | 1.36677792e+08 |
| timer_rego_module_compile_ns   | 30945583  | 35549292  | 3.26016542e+07  | 3.55404711e+07  | 3.5549292e+07  |
| timer_rego_module_parse_ns     | 6777125   | 11474796  | 8.177038e+06    | 1.14064044e+07  | 1.1474796e+07  |
| timer_rego_query_compile_ns    | 49791     | 71542     | 61137.5         | 71037.8         | 71542          |
| timer_rego_query_eval_ns       | 149458    | 195916    | 169574.9        | 194978.6        | 195916         |
| timer_rego_query_parse_ns      | 35834     | 61583     | 43250           | 61474.7         | 61583          |
+--------------------------------+-----------+-----------+-----------------+-----------------+----------------+
+----------+----------+----------+----------+----------+----------+----------+--------------+---------------------------------------------------------------------------------------+
|   MIN    |   MAX    |   MEAN   |   90%    |   99%    | NUM EVAL | NUM REDO | NUM GEN EXPR |                                       LOCATION                                        |
+----------+----------+----------+----------+----------+----------+----------+--------------+---------------------------------------------------------------------------------------+
| 42.167µs | 62.499µs | 49.799µs | 62.198µs | 62.499µs | 2        | 2        | 2            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:46 |
| 20.332µs | 27.542µs | 23.937µs | 27.542µs | 27.542µs | 1        | 1        | 1            | data.policy.ingress                                                                   |
| 17.5µs   | 24.416µs | 20.653µs | 24.303µs | 24.416µs | 3        | 3        | 3            | parameter-test-cases/policy/ingress/test.rego:16                                      |
| 10.876µs | 22.167µs | 14.275µs | 21.75µs  | 22.167µs | 1        | 1        | 1            | parameter-test-cases/policy/ingress/test.rego:28                                      |
| 9.292µs  | 16.625µs | 12.15µs  | 16.604µs | 16.625µs | 2        | 2        | 2            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:77 |
| 8.333µs  | 11.375µs | 9.628µs  | 11.325µs | 11.375µs | 1        | 1        | 1            | parameter-test-cases/policy/ingress/test.rego:36                                      |
| 7.458µs  | 11.333µs | 8.749µs  | 11.241µs | 11.333µs | 2        | 2        | 2            | parameter-test-cases/global/library/authorization/app_roles/app_roles.rego:26         |
| 5.125µs  | 9.958µs  | 6.791µs  | 9.745µs  | 9.958µs  | 1        | 0        | 1            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:35 |
| 3.625µs  | 6.834µs  | 4.541µs  | 6.738µs  | 6.834µs  | 1        | 1        | 1            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:47 |
| 3.292µs  | 3.958µs  | 3.612µs  | 3.945µs  | 3.958µs  | 1        | 1        | 1            | parameter-test-cases/global/library/authorization/app_roles_internal/internal.rego:68 |
+----------+----------+----------+----------+----------+----------+----------+--------------+---------------------------------------------------------------------------------------+

I'll collect some measurements for bundle updates later (aiming for tomorrow).
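
For a quick read of the `opa bench` tables above, the headline ratios can be computed directly from the reported `ns/op`, `B/op`, and `allocs/op` rows. The helper name `ratio` is just for this sketch; the numbers are copied from the two bench runs.

```go
package main

import "fmt"

// ratio reports how many times larger the "disabled" figure is than the
// "enabled" one.
func ratio(disabled, enabled float64) float64 { return disabled / enabled }

func main() {
	// Figures taken from the opa bench output (disabled vs. enabled).
	fmt.Printf("ns/op:     about %.0fx lower\n", ratio(16246874, 55442))
	fmt.Printf("B/op:      about %.0fx lower\n", ratio(16602796, 49699))
	fmt.Printf("allocs/op: about %.0fx lower\n", ratio(428321, 857))
}
```

On this test bundle that works out to roughly 293x less time, 334x fewer bytes, and 500x fewer allocations per evaluation. The `opa eval --profile` runs show mean `timer_rego_query_eval_ns` dropping by about 99x, while mean `timer_rego_load_bundles_ns` is slightly higher with the feature enabled.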

@@ -59,9 +59,25 @@ func metadataPath(name string) storage.Path {
return append(BundlesBasePath, name, "manifest", "metadata")
}

func read(ctx context.Context, store storage.Store, txn storage.Transaction, path storage.Path) (interface{}, error) {
Member

Maybe an optimization would be to avoid the conversions in this package by adding support for AST-based logic. This is probably a big enough change that it can be worked on in the future if needed.

Member

@ashutosh-narkar ashutosh-narkar left a comment

LGTM. @johanfylling if we add some documentation about this that would be great.

Signed-off-by: Johan Fylling <[email protected]>
Member

@ashutosh-narkar ashutosh-narkar left a comment

LGTM. Thanks for working on this @johanfylling!

@johanfylling johanfylling merged commit 6af5e79 into open-policy-agent:main Oct 30, 2024
28 checks passed
@johanfylling johanfylling deleted the storage/eager-ast branch October 31, 2024 13:36