Inquiry about TLSH score threshold (120) for file similarity #133

Open
yoobato opened this issue Nov 10, 2024 · 0 comments


yoobato commented Nov 10, 2024

I have a question regarding the use of TLSH for file comparison.

Related code

Lines 109-113 in src/fosslight_binary/_binary_dao.py

tlsh_diff = tlsh.diff(row, tlsh_value)
if tlsh_diff <= 120:  # MATCHED
    if (matched_tlsh_diff < 0) or (tlsh_diff < matched_tlsh_diff):
        matched_tlsh_diff = tlsh_diff
        matched_tlsh = row

I've noticed that FOSSLight treats two files as the same if their TLSH score (distance) is 120 or less.
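
To confirm my reading of that logic, here is a minimal, self-contained sketch of how I understand the selection to work: among all stored hashes, keep the one with the smallest distance, but only if that distance is at or below the threshold. The names `find_best_match`, `toy_diff`, and `THRESHOLD` are my own, not FOSSLight's, and `toy_diff` is only a stand-in so the snippet runs without the TLSH library. In the real code the distance comes from `tlsh.diff`.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

H = TypeVar("H")

# Threshold taken from the snippet above; the rationale for it is my question.
THRESHOLD = 120


def find_best_match(
    candidates: Iterable[H],
    target: H,
    diff_fn: Callable[[H, H], int],
    threshold: int = THRESHOLD,
) -> Tuple[Optional[H], int]:
    """Return (closest candidate, its distance), or (None, -1) if no
    candidate is within the threshold. Hypothetical helper that mirrors
    the loop body quoted above."""
    matched = None
    matched_diff = -1  # -1 means "nothing matched yet", as in the original
    for row in candidates:
        d = diff_fn(row, target)
        if d <= threshold:  # MATCHED
            if matched_diff < 0 or d < matched_diff:
                matched_diff = d
                matched = row
    return matched, matched_diff


def toy_diff(a: int, b: int) -> int:
    """Stand-in distance for illustration only; NOT the TLSH distance."""
    return abs(a - b)


if __name__ == "__main__":
    # Distances to 100 are 400, 30 and 10, so 90 wins with distance 10.
    print(find_best_match([500, 130, 90], 100, toy_diff))  # (90, 10)
    # Nothing within the threshold: no match.
    print(find_best_match([500], 100, toy_diff))  # (None, -1)
```

If this reading is right, the threshold only decides whether a candidate qualifies at all. The closest qualifying candidate always wins, so a looser threshold mainly affects files that have no close match.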

I'm curious about the rationale behind choosing 120 as the threshold for file similarity.
Could you please provide some insight into how this particular value was determined?

I know that a lower score means the files are more similar, but I couldn't find a standard threshold on either the TLSH web page or in the technical paper from Trend Micro.

Was it based on empirical testing, or other considerations?

Thank you for the amazing binary scanner!
