Consider guaranteeing vendor list sort order #162

clbarnes · 2024-07-24T09:43:55Z

The vendor consent vecs will almost always be used for membership checks. If encoding is ever added to this crate, guaranteeing ascending order and non-repeats through modification makes that much easier. You'd hope that people would encode their TC strings as sorted (always true for a bitfield, but possibly not true for ranges), but they might not, so it would be good to guarantee it.

Using a BTreeSet instead of a Vec guarantees ascending order and uniqueness, even if people modify it for re-encoding later. Otherwise, a sorted vec is an improvement as it allows binary searches. An &mut self method to sort all the internal vendor lists would be minimally invasive.

The text was updated successfully, but these errors were encountered:

friedemannsommer · 2024-07-25T16:55:51Z

Hello @clbarnes,

There are currently no plans to implement encoding for this crate.
We’re open for contributions if you (or anyone else reading this) want(s) to implement TCString encoding for this crate.

I agree with you that bitfield lists should be sorted, but I (personally) would prefer a solution which is optional either via an option or crate feature.
With the current implementation it is not possible to pass options, be it a callback function that can be used to sort the bitfield lists or an option to signal the implementation, it should sort these lists.
So it would require a rewrite of the user-facing API as well.

clbarnes · 2024-07-25T17:04:50Z

The sorting was a bit of an aside - switching out the vec of vendor IDs for a btreeset would coincidentally guarantee order while primarily making containment checks more efficient (order being a useful tiebreaker vs a hashset). Containment checks are the entire purpose of vendor list so it seems like a use case worth prioritising. And serde serialises them exactly the same way so no changes needed there!

I only mentioned sorting as well because in a read-only workflow as currently supported, a sorted vec is probably about as good as a btreeset because you can binary search it.

friedemannsommer · 2024-07-26T11:18:56Z

The overhead for using the BTreeSet is probably also beneficial for the allowed_vendors and disclosed_vendors not just the vendors_consent and vendors_li_consent.

I will have to do some testing whether this overhead is worth placing behind an option / feature or if it should be the new default behavior.

friedemannsommer · 2024-08-22T15:05:52Z

The overhead for using BTreeSet is roughly 39%.

baseline (without BTreeSet)

TCString V2 (core only) time:   [346.08 ns 346.62 ns 347.23 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

TCString V2 (core + disclosed vendors)
                        time:   [1.5183 µs 1.5202 µs 1.5225 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

TCString V2 (core + disclosed vendors + allowed vendors)
                        time:   [2.9701 µs 3.0044 µs 3.0392 µs]
Found 24 outliers among 100 measurements (24.00%)
  21 (21.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high severe

TCString V2 (core + publisher tc)
                        time:   [574.17 ns 574.92 ns 575.73 ns]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

TCString V2 (core + disclosed vendors + allowed vendors + publisher tc)
                        time:   [779.89 ns 780.60 ns 781.31 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

with BTreeSet

TCString V2 (core only) time:   [361.11 ns 361.55 ns 362.04 ns]
                        change: [+4.4258% +4.7797% +5.1308%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

TCString V2 (core + disclosed vendors)
                        time:   [2.2539 µs 2.2557 µs 2.2576 µs]
                        change: [+48.396% +48.672% +48.959%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

TCString V2 (core + disclosed vendors + allowed vendors)
                        time:   [5.0106 µs 5.0166 µs 5.0229 µs]
                        change: [+65.906% +67.190% +68.548%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
  1 (1.00%) high mild
  18 (18.00%) high severe

TCString V2 (core + publisher tc)
                        time:   [715.91 ns 717.07 ns 718.23 ns]
                        change: [+24.120% +24.467% +24.861%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

TCString V2 (core + disclosed vendors + allowed vendors + publisher tc)
                        time:   [1.1880 µs 1.1980 µs 1.2080 µs]
                        change: [+51.470% +52.049% +52.751%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) high mild
  10 (10.00%) high severe

So this would need to be placed behind an opt-in feature.
The code used in the latter benchmark can be found here: 09a54c9

friedemannsommer self-assigned this Jul 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider guaranteeing vendor list sort order #162

Consider guaranteeing vendor list sort order #162

clbarnes commented Jul 24, 2024

friedemannsommer commented Jul 25, 2024

clbarnes commented Jul 25, 2024

friedemannsommer commented Jul 26, 2024 •

edited

Loading

friedemannsommer commented Aug 22, 2024

Consider guaranteeing vendor list sort order #162

Consider guaranteeing vendor list sort order #162

Comments

clbarnes commented Jul 24, 2024

friedemannsommer commented Jul 25, 2024

clbarnes commented Jul 25, 2024

friedemannsommer commented Jul 26, 2024 • edited Loading

friedemannsommer commented Aug 22, 2024

friedemannsommer commented Jul 26, 2024 •

edited

Loading