Inquiry about TLSH score threshold (120) for file similarity #133

yoobato · 2024-11-10T16:50:28Z

I have a question regarding the use of TLSH for file comparison.

Related code

Lines 109-113 in src/fosslight_binary/_binary_dao.py

tlsh_diff = tlsh.diff(row, tlsh_value)
if tlsh_diff <= 120:  # MATCHED
    if (matched_tlsh_diff < 0) or (tlsh_diff < matched_tlsh_diff):
        matched_tlsh_diff = tlsh_diff
        matched_tlsh = row

I've noticed that FOSSLight treats two files as the same if their TLSH score (distance) is 120 or less.
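To check that I am reading the logic correctly, here is a minimal sketch of the matching rule as I understand it. The names `closest_match` and `diff_fn` are my own placeholders and not part of FOSSLight; `diff_fn` stands in for `tlsh.diff`, so the sketch runs without the TLSH library installed.

```python
from typing import Callable, Iterable, Optional, Tuple

THRESHOLD = 120  # the cut-off used in _binary_dao.py


def closest_match(
    candidates: Iterable[str],
    target: str,
    diff_fn: Callable[[str, str], int],
    threshold: int = THRESHOLD,
) -> Tuple[Optional[str], int]:
    """Return (hash, distance) of the nearest candidate within the threshold.

    Returns (None, -1) when no candidate is within the threshold.
    """
    matched_hash: Optional[str] = None
    matched_diff = -1  # -1 means "nothing matched yet", as in the original code
    for row in candidates:
        d = diff_fn(row, target)
        if d <= threshold:  # close enough to count as the same file
            if matched_diff < 0 or d < matched_diff:
                matched_diff = d
                matched_hash = row
    return matched_hash, matched_diff
```

With the real library, I assume this would be called as `closest_match(rows, tlsh_value, tlsh.diff)`. Note that a distance of exactly 120 still counts as a match, since the comparison is `<=`.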

I'm curious about the rationale behind choosing 120 as the threshold for file similarity.
Could you please provide some insight into how this particular value was determined?

I know that a lower score means the files are more similar, but I couldn't find a specific standard threshold on either the TLSH web page or in the technical paper from Trend Micro.

Was it based on empirical testing, or other considerations?
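If it came from empirical testing, I imagine a threshold sweep roughly like the one below. This is only my guess at what such a test could look like, not a description of what the maintainers did. The function name `sweep` is hypothetical, and the input is assumed to be a list of `(distance, is_same_file)` pairs labelled by hand.

```python
from typing import Dict, Iterable, List, Tuple


def sweep(
    pairs: List[Tuple[int, bool]],
    thresholds: Iterable[int],
) -> Dict[int, Tuple[float, float]]:
    """Map each threshold to (true positive rate, false positive rate).

    pairs: (TLSH distance, True if the two files really are the same component).
    """
    same = [d for d, is_same in pairs if is_same]
    different = [d for d, is_same in pairs if not is_same]
    results: Dict[int, Tuple[float, float]] = {}
    for t in thresholds:
        # Share of truly-same pairs that the threshold accepts.
        tpr = sum(d <= t for d in same) / len(same) if same else 0.0
        # Share of truly-different pairs that the threshold wrongly accepts.
        fpr = sum(d <= t for d in different) / len(different) if different else 0.0
        results[t] = (tpr, fpr)
    return results
```

Running this over a labelled set for several candidate values (for example 50, 100, 120, 150) would show where the false positive rate starts to climb, which is the kind of evidence I was hoping to find for 120.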

Thank you for the amazing binary scanner!
