Add a regex benchmark for comparison #23

Merged: Breus merged 1 commit into master from add-regex-benchmark on Dec 22, 2023

Conversation

Collaborator

@gavlyukovskiy gavlyukovskiy commented Dec 20, 2023

I had a thought to check how badly regex is doing, and I'm honestly surprised it's doing quite okay (only 120-200 times slower than json-masker) 😄

Benchmark                               (jsonSize)  (maskedKeyProbability)  (obfuscationLength)   Mode  Cnt        Score  Error  Units
JsonMaskerBenchmark.baselineCountBytes         1kb                    0.01                 none  thrpt       3196601,040         ops/s
JsonMaskerBenchmark.baselineCountBytes       128kb                    0.01                 none  thrpt         23401,364         ops/s
JsonMaskerBenchmark.baselineCountBytes         2mb                    0.01                 none  thrpt          1462,554         ops/s
JsonMaskerBenchmark.jacksonString              1kb                    0.01                 none  thrpt         91196,798         ops/s
JsonMaskerBenchmark.jacksonString            128kb                    0.01                 none  thrpt           491,485         ops/s
JsonMaskerBenchmark.jacksonString              2mb                    0.01                 none  thrpt            18,792         ops/s
JsonMaskerBenchmark.jsonMaskerBytes            1kb                    0.01                 none  thrpt        983466,058         ops/s
JsonMaskerBenchmark.jsonMaskerBytes          128kb                    0.01                 none  thrpt          5229,711         ops/s
JsonMaskerBenchmark.jsonMaskerBytes            2mb                    0.01                 none  thrpt           297,844         ops/s
JsonMaskerBenchmark.jsonMaskerString           1kb                    0.01                 none  thrpt        824710,487         ops/s
JsonMaskerBenchmark.jsonMaskerString         128kb                    0.01                 none  thrpt          3697,972         ops/s
JsonMaskerBenchmark.jsonMaskerString           2mb                    0.01                 none  thrpt           279,800         ops/s
JsonMaskerBenchmark.regexList                  1kb                    0.01                 none  thrpt         40465,063         ops/s
JsonMaskerBenchmark.regexList                128kb                    0.01                 none  thrpt            42,897         ops/s
JsonMaskerBenchmark.regexList                  2mb                    0.01                 none  thrpt             2,307         ops/s
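
For readers who want a picture of what a regex-based approach like `regexList` roughly involves, here is a minimal sketch. It is not the code from this PR: the class name `RegexMaskerSketch`, its methods, and the `***` mask are all assumptions made for illustration. It compiles one pattern per key and rewrites the whole string once per pattern, and it only handles string values (no arrays or objects).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the benchmark code from this PR.
final class RegexMaskerSketch {

    // Builds one pattern per key, matching: "key" : "string value"
    // Group 1 is the key plus the opening quote, group 2 the closing quote.
    static List<Pattern> compile(List<String> keys) {
        List<Pattern> patterns = new ArrayList<>();
        for (String key : keys) {
            patterns.add(Pattern.compile(
                    "(\"" + Pattern.quote(key) + "\"\\s*:\\s*\")"
                            + "(?:[^\"\\\\]|\\\\.)*" // value body, allowing escaped characters
                            + "(\")"));
        }
        return patterns;
    }

    // Applies every pattern in turn; each pass scans and copies the full string.
    static String mask(String json, List<Pattern> patterns) {
        String result = json;
        for (Pattern pattern : patterns) {
            result = pattern.matcher(result).replaceAll("$1***$2");
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compile(List.of("password"));
        System.out.println(mask("{\"password\":\"secret\",\"name\":\"bob\"}", patterns));
    }
}
```

Because every pattern triggers a full pass over the input, the cost of a sketch like this grows with both the document size and the number of keys.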

@gavlyukovskiy marked this pull request as draft on December 20, 2023, 02:36
@gavlyukovskiy
Collaborator Author

I need to redo the benchmarks on my PC to make sure the values are consistent.

@Breus
Owner

Breus commented Dec 20, 2023

@gavlyukovskiy 120-200 times slower, and with no support for array and object masking, which are by far the most expensive parts of masking 😄

…, added parameter with character set to test how well json-masker is handling resizes

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@gavlyukovskiy marked this pull request as ready for review on December 21, 2023, 00:06
@gavlyukovskiy
Collaborator Author

gavlyukovskiy commented Dec 21, 2023

@Breus I've also added a benchmark parameter that uses different character sets, and the results are quite bad: every time we do any array resize (the value has quotes, uses unicode, or an obfuscation length is set), performance drops below regex for a large JSON :(

@Breus
Owner

Breus commented Dec 21, 2023

@gavlyukovskiy we could add an "ultra-performant" configuration which doesn't resize at all for escaped quotes and unicode characters, but instead replaces them with a mask of the same length in bytes. This could also be something we add in a later version of the library.

For the rest: the fact that obfuscation slows down the algorithm, making it slower than a regex that doesn't obfuscate, seems logical and not something to address. If you did the same with a regex or Jackson, it would quite likely be much slower than this library.
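
To make the resize trade-off discussed above concrete, here is a minimal sketch of the two strategies. It is not json-masker's implementation: the class `InPlaceMaskSketch`, both method names, and the assumption that a parser has already located the value's byte range `[from, to)` are hypothetical. The first method overwrites the value in place, so the array keeps its length. The second writes a fixed-length mask, which forces a new array and two copies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not json-masker's actual code.
final class InPlaceMaskSketch {

    // Same-length mask: overwrite bytes [from, to) with '*'.
    // No allocation and no copying, but the mask length equals the value's
    // byte length, so escapes and multi-byte characters inflate it.
    static void maskSameLength(byte[] json, int from, int to) {
        Arrays.fill(json, from, to, (byte) '*');
    }

    // Fixed-length mask: the output size differs from the input size,
    // so a new array is allocated and the prefix and suffix are copied.
    static byte[] maskFixedLength(byte[] json, int from, int to, int maskLength) {
        byte[] out = new byte[json.length - (to - from) + maskLength];
        System.arraycopy(json, 0, out, 0, from);
        Arrays.fill(out, from, from + maskLength, (byte) '*');
        System.arraycopy(json, to, out, from + maskLength, json.length - to);
        return out;
    }

    public static void main(String[] args) {
        // Value is a, an escaped quote, and a two-byte character: 5 bytes at [6, 11).
        byte[] json = "{\"k\":\"a\\\"\u00e9\"}".getBytes(StandardCharsets.UTF_8);

        byte[] resized = maskFixedLength(json, 6, 11, 3);
        System.out.println(new String(resized, StandardCharsets.UTF_8));

        maskSameLength(json, 6, 11);
        System.out.println(new String(json, StandardCharsets.UTF_8));
    }
}
```

In this example the value has three visible characters but occupies five bytes, so the same-length variant produces five asterisks. That is the cost of skipping the resize: the mask reflects the byte length rather than the character count.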

@Breus merged commit ddfd279 into master on Dec 22, 2023
3 checks passed
@Breus deleted the add-regex-benchmark branch on December 22, 2023, 11:30