Heavy hitter estimation is broken when there are multiple flows of the same size #3

Open
cppascalinux opened this issue Jan 23, 2023 · 0 comments


cppascalinux commented Jan 23, 2023

When computing the ground truth for heavy hitters, getHeavyHitter in data.h simply takes the largest $K$ flows and uses the size of the $K$-th flow as the threshold $thr$. However, there may be multiple flows of the same size as the $K$-th one, and during testing we ask the sketch to retrieve all flows of size $\ge thr$, which will definitely result in a drop in the recall rate. This effect is significant when the flow sizes are evenly distributed.

A possible fix is to include all flows of size $\ge thr$ when calculating the ground truth.
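
A minimal sketch of that fix, not the actual code in data.h: it assumes the exact flow sizes are available as a map from flow key to size. The names `FlowKey`, `FlowSizes`, `HeavyHitterTruth`, and `heavyHittersWithTies` are hypothetical and only illustrate the idea of taking $thr$ from the $K$-th largest flow and then keeping every flow of size $\ge thr$, ties included.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical types (not from data.h): flow key -> exact flow size.
using FlowKey = std::uint64_t;
using FlowSizes = std::unordered_map<FlowKey, std::uint32_t>;

struct HeavyHitterTruth {
    std::uint32_t threshold = 0;   // size of the K-th largest flow (thr)
    std::vector<FlowKey> flows;    // every flow with size >= thr
};

// Ground truth that keeps all flows tied with the K-th largest one.
HeavyHitterTruth heavyHittersWithTies(const FlowSizes& sizes, std::size_t k) {
    HeavyHitterTruth truth;
    if (k == 0 || sizes.empty()) return truth;

    // Collect the sizes so the K-th largest can be selected.
    std::vector<std::uint32_t> sorted;
    sorted.reserve(sizes.size());
    for (const auto& kv : sizes) sorted.push_back(kv.second);

    // If there are fewer than K flows, fall back to the smallest one.
    k = std::min(k, sorted.size());
    auto kth = sorted.begin() + static_cast<std::ptrdiff_t>(k - 1);
    std::nth_element(sorted.begin(), kth, sorted.end(),
                     std::greater<std::uint32_t>());
    truth.threshold = *kth;

    // Keep every flow at or above the threshold, not just the first K.
    for (const auto& kv : sizes) {
        if (kv.second >= truth.threshold) truth.flows.push_back(kv.first);
    }
    return truth;
}
```

With this, the ground-truth set can contain more than $K$ flows when several flows share the size of the $K$-th one, so it matches what the sketch is asked to retrieve.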
