Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider guaranteeing vendor list sort order #162

Open
clbarnes opened this issue Jul 24, 2024 · 4 comments
Open

Consider guaranteeing vendor list sort order #162

clbarnes opened this issue Jul 24, 2024 · 4 comments
Assignees

Comments

@clbarnes
Copy link

The vendor consent vecs will almost always be used for membership checks. If encoding is ever added to this crate, guaranteeing ascending order and non-repeats through modification makes that much easier. You'd hope that people would encode their TC strings as sorted (always true for a bitfield, but possibly not true for ranges), but they might not, so it would be good to guarantee it.

Using a BTreeSet instead of a Vec guarantees ascending order and uniqueness, even if people modify it for re-encoding later. Otherwise, a sorted vec is an improvement as it allows binary searches. An &mut self method to sort all the internal vendor lists would be minimally invasive.

@friedemannsommer
Copy link
Member

Hello @clbarnes,

There are currently no plans to implement encoding for this crate.
We’re open for contributions if you (or anyone else reading this) want(s) to implement TCString encoding for this crate.

I agree with you that bitfield lists should be sorted, but I (personally) would prefer a solution which is optional either via an option or crate feature.
With the current implementation it is not possible to pass options, be it a callback function that can be used to sort the bitfield lists or an option to signal the implementation, it should sort these lists.
So it would require a rewrite of the user-facing API as well.

@clbarnes
Copy link
Author

The sorting was a bit of an aside - switching out the vec of vendor IDs for a btreeset would coincidentally guarantee order while primarily making containment checks more efficient (order being a useful tiebreaker vs a hashset). Containment checks are the entire purpose of vendor list so it seems like a use case worth prioritising. And serde serialises them exactly the same way so no changes needed there!

I only mentioned sorting as well because in a read-only workflow as currently supported, a sorted vec is probably about as good as a btreeset because you can binary search it.

@friedemannsommer friedemannsommer self-assigned this Jul 26, 2024
@friedemannsommer
Copy link
Member

friedemannsommer commented Jul 26, 2024

The overhead for using the BTreeSet is probably also beneficial for the allowed_vendors and disclosed_vendors not just the vendors_consent and vendors_li_consent.

I will have to do some testing whether this overhead is worth placing behind an option / feature or if it should be the new default behavior.

@friedemannsommer
Copy link
Member

The overhead for using BTreeSet is roughly 39%.

baseline (without BTreeSet)
TCString V2 (core only) time:   [346.08 ns 346.62 ns 347.23 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

TCString V2 (core + disclosed vendors)
                        time:   [1.5183 µs 1.5202 µs 1.5225 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

TCString V2 (core + disclosed vendors + allowed vendors)
                        time:   [2.9701 µs 3.0044 µs 3.0392 µs]
Found 24 outliers among 100 measurements (24.00%)
  21 (21.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high severe

TCString V2 (core + publisher tc)
                        time:   [574.17 ns 574.92 ns 575.73 ns]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

TCString V2 (core + disclosed vendors + allowed vendors + publisher tc)
                        time:   [779.89 ns 780.60 ns 781.31 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
with BTreeSet
TCString V2 (core only) time:   [361.11 ns 361.55 ns 362.04 ns]
                        change: [+4.4258% +4.7797% +5.1308%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

TCString V2 (core + disclosed vendors)
                        time:   [2.2539 µs 2.2557 µs 2.2576 µs]
                        change: [+48.396% +48.672% +48.959%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

TCString V2 (core + disclosed vendors + allowed vendors)
                        time:   [5.0106 µs 5.0166 µs 5.0229 µs]
                        change: [+65.906% +67.190% +68.548%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
  1 (1.00%) high mild
  18 (18.00%) high severe

TCString V2 (core + publisher tc)
                        time:   [715.91 ns 717.07 ns 718.23 ns]
                        change: [+24.120% +24.467% +24.861%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

TCString V2 (core + disclosed vendors + allowed vendors + publisher tc)
                        time:   [1.1880 µs 1.1980 µs 1.2080 µs]
                        change: [+51.470% +52.049% +52.751%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) high mild
  10 (10.00%) high severe

So this would need to be placed behind an opt-in feature.
The code used in the latter benchmark can be found here: 09a54c9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants